Investigating the interaction between gradient-only line searches and different activation functions
D. Kafka, Daniel N. Wilke

TL;DR
This paper examines how gradient-only line searches (GOLS) interact with different activation functions in neural network training. It finds that ReLU can degrade GOLS performance, and that architectural features such as batch normalization and skip connections mitigate the problem.
Contribution
It investigates the impact of activation function properties on GOLS performance and demonstrates how architectural features can mitigate associated training difficulties.
Findings
GOLS are generally robust across activation functions.
ReLU's zero-gradient region can make the gradient sparse, which hampers training with GOLS.
Batch normalization and skip connections improve GOLS effectiveness with ReLU.
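A minimal sketch of the ReLU finding, assuming a single scalar unit. It is an illustration only, not the paper's analysis: the function names and the values of `x` and `w` are hypothetical. It shows that a plain ReLU unit passes no gradient back when its pre-activation is negative, whereas the same unit wrapped in a skip connection still passes a gradient through the identity path.

```python
def relu_grad(z: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0.0 else 0.0


def plain_unit_grad(x: float, w: float) -> float:
    """d/dx of relu(w * x) for a plain feedforward unit."""
    return w * relu_grad(w * x)


def skip_unit_grad(x: float, w: float) -> float:
    """d/dx of x + relu(w * x), i.e. the same unit with a skip connection."""
    return 1.0 + w * relu_grad(w * x)


# Hypothetical values chosen so the pre-activation w * x is negative.
x, w = -1.5, 2.0
plain = plain_unit_grad(x, w)    # ReLU is inactive, so no gradient flows
skipped = skip_unit_grad(x, w)   # the identity path still carries gradient
```

With many inactive units, the plain variant yields many zero entries in the gradient vector, which leaves a search based only on directional derivatives with little information to work with.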
Abstract
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions resulting from dynamic mini-batch sub-sampling in neural network training. Step sizes in GOLS are determined by localizing Stochastic Non-Negative Associated Gradient Projection Points (SNN-GPPs) along descent directions. These are identified by a sign change in the directional derivative from negative to positive along a descent direction. Activation functions are a significant component of neural network architectures as they introduce non-linearities essential for complex function approximations. The smoothness and continuity characteristics of the activation functions directly affect the gradient characteristics of the loss function to be optimized. Therefore, it is of interest to investigate the relationship between activation functions and different neural…
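The sign-change idea in the abstract can be sketched as follows. This is a simplified, deterministic illustration and not the authors' GOLS algorithm: `gradient_only_step`, its parameters, and the grow-then-bisect strategy are assumptions made for this example. In a mini-batch setting, `grad_fn` would evaluate the gradient on a freshly sampled batch at every call, so no function values are ever compared.

```python
from typing import Callable, List, Sequence

GradFn = Callable[[Sequence[float]], List[float]]


def directional_derivative(grad_fn: GradFn, x: Sequence[float],
                           d: Sequence[float], alpha: float) -> float:
    """Project the gradient at x + alpha * d onto the direction d."""
    point = [xi + alpha * di for xi, di in zip(x, d)]
    return sum(gi * di for gi, di in zip(grad_fn(point), d))


def gradient_only_step(grad_fn: GradFn, x: Sequence[float], d: Sequence[float],
                       alpha0: float = 1e-3, growth: float = 2.0,
                       max_grow: int = 60, bisect_iters: int = 40) -> float:
    """Find a step size where the directional derivative turns non-negative."""
    if directional_derivative(grad_fn, x, d, 0.0) >= 0.0:
        raise ValueError("d is not a descent direction at x")
    lo, hi = 0.0, alpha0
    # Grow the step until the directional derivative is no longer negative.
    for _ in range(max_grow):
        if directional_derivative(grad_fn, x, d, hi) >= 0.0:
            break
        lo, hi = hi, hi * growth
    else:
        return lo  # no sign change found; return the largest step tested
    # Bisect on the sign only; loss values are never used.
    for _ in range(bisect_iters):
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad_fn, x, d, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quadratic_grad(p: Sequence[float]) -> List[float]:
    """Gradient of f(p) = 0.5 * ||p||^2, used here as a toy loss."""
    return list(p)


# Steepest descent direction from x = (2, -1); the sign change is at alpha = 1.
alpha = gradient_only_step(quadratic_grad, [2.0, -1.0], [-2.0, 1.0])
```

Relying on the sign of the directional derivative rather than on loss values is what lets this kind of search tolerate the discontinuities introduced by resampling mini-batches.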
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Infrared Target Detection Methodologies
Methods: Batch Normalization
