Numerical influence of ReLU'(0) on backpropagation

David Bertoin (ISAE-SUPAERO); Jérôme Bolte (TSE-R); Sébastien Gerchinovitz (IMT); Edouard Pauwels (IRIT-ADRIA)

arXiv:2106.12915 · cs.LG · November 6, 2023 · 1 citation

TL;DR

This study investigates how the value chosen for the derivative of ReLU at zero, ReLU'(0), affects backpropagation and training across numerical precisions. It finds significant effects at lower precisions and suggests that this value can be tuned like a hyperparameter.

Contribution

The paper demonstrates the influence of ReLU'(0) on training outcomes at various precisions and shows that common practice can be improved by tuning this parameter.

Findings

01

ReLU'(0) significantly affects backpropagation at 16-bit precision.

02

For vanilla SGD training, choosing ReLU'(0) = 0 appears to give the best training efficiency and accuracy.

03

Batch normalization and Adam have a buffering effect that reduces the influence of ReLU'(0).
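Why exactly zero pre-activations (the only inputs where ReLU'(0) is used) become far more frequent at low precision can be sketched with Python's struct module, whose 'e', 'f' and 'd' formats round a float to IEEE half, single and double precision. This is a toy emulation under stated assumptions (every intermediate result is rounded to the target format), not the paper's experimental setup; round_to and preactivation are hypothetical helper names, and the weights are picked by hand so that the exact result is 1e-4.

```python
import struct
from typing import List


def round_to(x: float, fmt: str) -> float:
    """Round a Python float to an IEEE format via struct.
    'e' = 16-bit half, 'f' = 32-bit single, 'd' = 64-bit double."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def preactivation(w: List[float], x: List[float], b: float, fmt: str) -> float:
    """Compute w . x + b, rounding inputs and every intermediate result
    to `fmt` (a crude emulation of arithmetic done in that precision)."""
    acc = round_to(b, fmt)
    for wi, xi in zip(w, x):
        prod = round_to(round_to(wi, fmt) * round_to(xi, fmt), fmt)
        acc = round_to(acc + prod, fmt)
    return acc


if __name__ == "__main__":
    # Exact value: 1.0001 * 1 - 1.0 * 1 + 0 = 1e-4 (nonzero).
    w, x, b = [1.0001, -1.0], [1.0, 1.0], 0.0
    for fmt, name in [("e", "float16"), ("f", "float32"), ("d", "float64")]:
        # float16 prints 0.0; float32 and float64 print small nonzero values.
        print(name, preactivation(w, x, b, fmt))
```

In half precision, 1.0001 rounds to 1.0 (the spacing between half-precision numbers near 1 is about 0.001), so the pre-activation cancels to exactly 0 and the backward pass then depends on the chosen ReLU'(0).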

Abstract

In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence on both backpropagation and training. Yet, in the real world, the default 32-bit precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs, which occur around half of the time in 32-bit precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. In our ImageNet experiments, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also evidence that…
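The claim that ReLU'(0) only changes backpropagation outputs when some pre-activation is exactly zero can be made concrete with a small pure-Python sketch of a one-hidden-layer function f(W) = v . relu(W x + b). It is an illustration, not the authors' code (their implementation is the repository listed under Code & Models); the names relu_backward and grad_first_layer and the parameter s (the chosen value of ReLU'(0)) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(z: Vector) -> Vector:
    """Forward ReLU: max(z_i, 0) elementwise."""
    return [max(zi, 0.0) for zi in z]


def relu_backward(z: Vector, grad_out: Vector, s: float) -> Vector:
    """Multiply the upstream gradient by ReLU'(z_i), using the
    convention ReLU'(0) = s (any s in [0, 1] is a valid choice)."""
    return [g * (1.0 if zi > 0 else (s if zi == 0 else 0.0))
            for zi, g in zip(z, grad_out)]


def grad_first_layer(W: Matrix, b: Vector, v: Vector, x: Vector, s: float) -> Matrix:
    """Gradient of f(W) = v . relu(W x + b) with respect to W,
    computed by backpropagation with ReLU'(0) = s."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    dz = relu_backward(z, v, s)  # df/dz_i = v_i * ReLU'(z_i)
    return [[dzi * xj for xj in x] for dzi in dz]  # df/dW_ij = df/dz_i * x_j


if __name__ == "__main__":
    # The first hidden unit has pre-activation 1*1 + (-1)*1 + 0 = 0 exactly.
    W, b, v, x = [[1.0, -1.0], [2.0, 0.0]], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]
    print(grad_first_layer(W, b, v, x, s=0.0))  # first row is all zeros
    print(grad_first_layer(W, b, v, x, s=1.0))  # first row is nonzero
```

Only the row of the gradient attached to the unit with a zero pre-activation changes with s; every other entry is identical, which is why the choice is harmless in exact arithmetic and matters only when rounding produces exact zeros.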

Code & Models

Repositories

AnonymousReLU/ReLU_prime (PyTorch, official)

Videos

Numerical influence of ReLU'(0) on backpropagation · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques

Methods: Dropout · Max Pooling · Convolution · Stochastic Gradient Descent · Softmax · Dense Connections · Adam