Numerical influence of ReLU’(0) on backpropagation

David Bertoin; Jérôme Bolte; Sébastien Gerchinovitz; Edouard Pauwels

Conference paper. Year: 2021

French title: Influence de ReLU'(0) sur la backpropagation

David Bertoin

Role: Author
PersonId : 750118
IdHAL : bertoin-david

IRT Saint Exupéry - Institut de Recherche Technologique

Institut Supérieur de l'Aéronautique et de l'Espace

Jérôme Bolte

Role: Author
PersonId : 995617

Toulouse School of Economics

Sébastien Gerchinovitz

Role: Author
PersonId : 12754
IdHAL : sebastien-gerchinovitz
IdRef : 156515776

IRT Saint Exupéry - Institut de Recherche Technologique

Institut de Mathématiques de Toulouse UMR5219

Edouard Pauwels

Role: Author
PersonId : 12830
IdHAL : edouard-pauwels
ORCID : 0000-0002-8180-075X

Argumentation, Décision, Raisonnement, Incertitude et Apprentissage

Abstract

In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence on both backpropagation and training. Yet in practice, the default 32-bit precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) at several precision levels (16, 32, and 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations in backpropagation outputs, which occur around half of the time in 32-bit precision. The effect disappears in double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. In our experiments on ImageNet, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also show that reconditioning approaches such as batch normalization or Adam tend to buffer the influence of the value of ReLU'(0). Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
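To make the last point of the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the paper or its code. The function f(x) = ReLU(x) - ReLU(-x) equals x everywhere, so its true derivative is 1, yet applying the chain rule at x = 0 returns twice the value chosen for ReLU'(0). The names `relu_grad`, `backprop_f`, and the parameter `s` are hypothetical and used only for illustration.

```python
def relu(x):
    """Standard ReLU activation."""
    return max(x, 0.0)


def relu_grad(x, s):
    """Derivative of ReLU as an autodiff rule; s is the value assigned to ReLU'(0)."""
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return s  # the only point where the choice matters


def f(x):
    """f(x) = ReLU(x) - ReLU(-x), which equals x for every real x."""
    return relu(x) - relu(-x)


def backprop_f(x, s):
    """Chain rule applied to f: relu'(x) * 1 - relu'(-x) * (-1)."""
    return relu_grad(x, s) + relu_grad(-x, s)


if __name__ == "__main__":
    # The true derivative of f is 1 everywhere, including at 0.
    for s in (0.0, 0.5, 1.0):
        print(f"ReLU'(0) = {s}: backprop derivative of f at 0 is {backprop_f(0.0, s)}")
```

Away from zero the result is always 1, so the value of `s` only matters when an activation input is exactly zero. The abstract reports that the effect disappears in double precision and is systematic at 16 bits.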

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

Impact_of_ReLU_prime.pdf (1005.21 KB)

Origin: Files produced by the author(s)

Contributor: David Bertoin

https://hal.science/hal-03265059

Submitted on: Tuesday, June 29, 2021, 11:01:31

Last modified on: Monday, March 18, 2024, 10:24:07

Dates and versions

hal-03265059 , version 1 (22-06-2021)

hal-03265059 , version 2 (29-06-2021)

hal-03265059 , version 3 (18-10-2023)

Identifiers

HAL Id : hal-03265059 , version 2

Cite

David Bertoin, Jérôme Bolte, Sébastien Gerchinovitz, Edouard Pauwels. Numerical influence of ReLU’(0) on backpropagation. Advances in Neural Information Processing Systems, Dec 2021, Paris, France. ⟨hal-03265059v2⟩

346 views

995 downloads