Preprints, Working Papers, ... Year: 2024

A second-order-like optimizer with adaptive gradient scaling for deep learning

Abstract

In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory requirements of standard deep learning methods such as AdamW or SGD with momentum. After recalling our geometric motivations, we provide extensive experiments. On image classification (CIFAR-10, ImageNet) and language modeling (GPT-2), INNAprop consistently matches or outperforms AdamW in both training speed and accuracy, with minimal hyperparameter tuning in large-scale settings. Our code is publicly available at https://github.com/innaprop/innaprop.
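For context, the "adaptive gradient scaling" ingredient referenced in the abstract is the RMSprop rescaling, which divides each gradient coordinate by a running root-mean-square of its past values. The sketch below is a minimal illustration of that rescaling alone on a toy problem; it is not the INNAprop algorithm itself, and the function name, hyperparameter values, and toy objective are illustrative, not taken from the authors' repository.

```python
import numpy as np

def rmsprop_scale(grad, v, rho=0.99, eps=1e-8):
    """RMSprop-style scaling: divide each gradient coordinate by a running
    root-mean-square of its past values.

    grad : current gradient
    v    : running average of squared gradients (same shape as grad)
    Returns the updated running average and the rescaled gradient.
    """
    v = rho * v + (1.0 - rho) * grad ** 2        # exponential moving average of grad^2
    scaled = grad / (np.sqrt(v) + eps)           # per-coordinate rescaling
    return v, scaled

# Illustrative usage on a badly conditioned quadratic f(x) = 0.5 * x^T A x
A = np.diag([1.0, 100.0])
x = np.array([1.0, 1.0])
v = np.zeros_like(x)
lr = 1e-2
for _ in range(500):
    g = A @ x                                    # gradient of the quadratic
    v, g_scaled = rmsprop_scale(g, v)
    x -= lr * g_scaled                           # plain rescaled gradient step (no INNA dynamics)
print(x)                                         # x should end up close to the minimizer (0, 0)
```

INNAprop, as described in the abstract, replaces the plain rescaled gradient step above with an INNA-type second-order-like update; the actual optimizer is available in the repository linked above.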
Main file: innaprop/innaprop_arxiv.pdf (971.87 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04724894, version 1 (07-10-2024)
Cite

Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Andrei Purica. A second-order-like optimizer with adaptive gradient scaling for deep learning. 2024. ⟨hal-04724894⟩