Adaptive Strategies in Non-convex Optimization
Zhenxun Zhuang

TL;DR
This paper develops adaptive algorithms for non-convex optimization that automatically adjust to unknown noise levels, gradient scales, and relaxed smoothness conditions, improving convergence and performance in deep learning tasks.
Contribution
It introduces noise-adaptive, scale-free, and relaxed-smoothness algorithms that do not require prior knowledge of problem parameters, with theoretical guarantees and empirical validation.
Findings
Noise-adaptive algorithms achieve near-optimal rates without prior noise scale knowledge.
Scale-free algorithms outperform traditional methods in neural network training.
Generalized SignSGD matches or exceeds the performance of Adam and other optimizers.
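As a rough illustration of why sign-based updates are insensitive to gradient scale, the sketch below runs plain sign descent on a toy quadratic whose curvatures span six orders of magnitude. This is not the dissertation's Generalized SignSGD; the function names `sign_descent` and `make_grad`, the toy objective, and the step size are all assumptions chosen for illustration.

```python
import numpy as np


def sign_descent(grad_fn, x0, lr=0.01, steps=200):
    """Plain sign descent (illustrative only, not the paper's algorithm).

    Each coordinate moves by a fixed amount `lr` against the sign of its
    gradient, so the gradient's magnitude never enters the update.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * np.sign(grad_fn(x))
    return x


def make_grad(scale):
    """Gradient of scale * 0.5 * sum(c_i * x_i**2), a hypothetical toy objective."""
    curvature = np.array([1e-3, 1.0, 1e3])  # badly scaled coordinates
    return lambda x: scale * curvature * x


x0 = [1.0, -1.0, 0.5]
x_small = sign_descent(make_grad(1.0), x0)   # original objective
x_large = sign_descent(make_grad(1e6), x0)   # same objective, gradients scaled by 1e6

# Rescaling the gradient by a positive constant leaves the iterates unchanged.
print(np.array_equal(x_small, x_large))
```

With a fixed step size the iterates only settle to within roughly `lr` of the minimizer, which is one reason practical sign-based methods are paired with step-size schedules or momentum.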
Abstract
An algorithm is said to be adaptive to a certain parameter of the problem if it does not need a priori knowledge of that parameter yet performs competitively with algorithms that know it. This dissertation presents our work on adaptive algorithms in the following scenarios: 1. In the stochastic optimization setting, we only receive stochastic gradients, and the level of noise in evaluating them greatly affects the convergence rate. Without prior knowledge of the noise scale, tuning is typically required to achieve the optimal rate. Considering this, we designed and analyzed noise-adaptive algorithms that automatically ensure (near-)optimal rates under different noise scales without knowing them. 2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Neural Networks and Applications
Methods: Gradient Clipping · Stochastic Gradient Descent · Adam
