Traversing the noise of dynamic mini-batch sub-sampled loss functions: A visual guide
Dominic Kafka, Daniel Wilke

TL;DR
This paper investigates the discontinuities that mini-batch sub-sampling introduces into neural network loss functions, and proposes that optimizers target stochastic non-negative associated gradient projection points (SNN-GPPs) rather than critical points.
Contribution
It introduces the SNN-GPP criterion as a robust alternative to critical points for optimization in sub-sampled loss functions, supported by visual analysis.
Findings
SNN-GPPs are less affected by sub-sampling discontinuities than critical points.
Line searches targeting SNN-GPPs can enhance automation in neural network training.
SNN-GPPs better approximate true optima, especially with smooth, high-curvature activation functions.
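The first finding can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes a one-dimensional least-squares model, and it reads an SNN-GPP along a line as a negative-to-positive sign change of the sampled directional derivative, which is a simplification of the criterion rather than its formal definition. The names `sample_loss_and_slope`, `minima_ws`, and `flip_ws` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000)
Y = 2.0 * X + rng.normal(scale=0.5, size=1000)  # toy data, true slope 2.0

def sample_loss_and_slope(w, rng, batch_size=10):
    """Draw a fresh mini-batch and return (loss, d loss / d w) at w."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    r = w * X[idx] - Y[idx]                      # residuals on this batch
    loss = float(np.mean(r ** 2))
    slope = float(np.mean(2.0 * r * X[idx]))     # derivative of the batch loss
    return loss, slope

# Evaluate along a line of candidate weights, resampling at every point.
ws = np.linspace(1.0, 3.0, 201)
pairs = [sample_loss_and_slope(w, rng) for w in ws]
losses = np.array([p[0] for p in pairs])
slopes = np.array([p[1] for p in pairs])

# Candidate optima seen by a function-value minimizer: grid-local minima.
is_min = (losses[1:-1] < losses[:-2]) & (losses[1:-1] < losses[2:])
minima_ws = ws[1:-1][is_min]

# Candidate optima seen by a gradient-only search: negative-to-positive
# sign changes of the sampled derivative (assumed stand-in for SNN-GPPs).
is_flip = (slopes[:-1] < 0.0) & (slopes[1:] >= 0.0)
flip_ws = ws[1:][is_flip]

minima_spread = float(minima_ws.max() - minima_ws.min())
flip_spread = float(flip_ws.max() - flip_ws.min())
print(f"local minima: {len(minima_ws)}, spread {minima_spread:.2f}")
print(f"sign changes: {len(flip_ws)}, spread {flip_spread:.2f}")
```

On this toy problem the sign changes should cluster in a band around the true slope, while the spurious local minima scatter across the whole interval. Real networks are high-dimensional and non-convex, so this only conveys the intuition.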
Abstract
Mini-batch sub-sampling in neural network training is unavoidable, due to growing data demands, memory-limited computational resources such as graphics processing units (GPUs), and the dynamics of on-line learning. In this study we specifically distinguish between static mini-batch sub-sampled loss functions, where mini-batches are intermittently fixed during training, resulting in smooth but biased loss functions; and the dynamic sub-sampling equivalent, where new mini-batches are sampled at every loss evaluation, trading bias for variance in sampling-induced discontinuities. These render automated optimization strategies such as minimization line searches ineffective, since critical points may not exist and function minimizers find spurious, discontinuity-induced minima. This paper suggests recasting the optimization problem to find stochastic non-negative associated gradient…
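The distinction between static and dynamic sub-sampling can be sketched in a few lines. This is a minimal illustration under assumed conditions (a scalar least-squares model, batch size 10, NumPy for sampling), not the experimental setup of the paper. The helper names `static_loss_fn`, `dynamic_loss_fn`, and `count_local_minima` are hypothetical.

```python
import numpy as np

data_rng = np.random.default_rng(0)
X = data_rng.normal(size=1000)
Y = 2.0 * X + data_rng.normal(scale=0.5, size=1000)  # toy data, true slope 2.0

def batch_loss(w, idx):
    """Mean squared error of the model y = w * x on the rows in idx."""
    r = w * X[idx] - Y[idx]
    return float(np.mean(r ** 2))

def static_loss_fn(batch_size, rng):
    """Static sub-sampling: the mini-batch is drawn once and then held fixed."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return lambda w: batch_loss(w, idx)

def dynamic_loss_fn(batch_size, rng):
    """Dynamic sub-sampling: a new mini-batch is drawn at every evaluation."""
    def loss(w):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        return batch_loss(w, idx)
    return loss

def count_local_minima(values):
    """Count interior grid points that lie strictly below both neighbours."""
    v = np.asarray(values)
    return int(np.sum((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])))

ws = np.linspace(1.0, 3.0, 201)
static_f = static_loss_fn(10, np.random.default_rng(1))
dynamic_f = dynamic_loss_fn(10, np.random.default_rng(2))

n_static = count_local_minima([static_f(w) for w in ws])
n_dynamic = count_local_minima([dynamic_f(w) for w in ws])
print("static local minima: ", n_static)
print("dynamic local minima:", n_dynamic)
```

The static loss is a smooth quadratic whose minimizer depends on the particular batch that was fixed (the bias), while the dynamic loss returns a different value each time it is called at the same point, which is what produces the many spurious minima that mislead a minimization line search.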
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
