Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning
Shuai Zheng, James T. Kwok

TL;DR
This paper introduces blockwise adaptive gradient descent, which applies one adaptive stepsize per block of parameters rather than per coordinate, yielding faster training and better generalization in deep learning than coordinate-wise methods.
Contribution
It proposes a novel blockwise adaptive stepsize method, providing theoretical convergence and stability analysis, and demonstrates improved empirical performance over Adam and Nesterov's method.
Findings
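Empirically faster convergence than Adam and Nesterov's accelerated gradient.
Lower generalization error, attributed to blockwise adaptivity being less aggressive than coordinate-wise adaptivity.
Convergence rate comparable in theory to the coordinate-wise counterpart, and faster up to a constant factor, with a supporting uniform stability analysis.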
Abstract
Stochastic methods with coordinate-wise adaptive stepsize (such as RMSprop and Adam) have been widely used in training deep neural networks. Despite their fast convergence, they can generalize worse than stochastic gradient descent. In this paper, by revisiting the design of Adagrad, we propose to split the network parameters into blocks, and use a blockwise adaptive stepsize. Intuitively, blockwise adaptivity is less aggressive than adaptivity to individual coordinates, and can have a better balance between adaptivity and generalization. We show theoretically that the proposed blockwise adaptive gradient descent has a convergence rate comparable to that of its counterpart with coordinate-wise adaptive stepsize, but is faster up to some constant. We also study its uniform stability and show that blockwise adaptivity can lead to lower generalization error than coordinate-wise adaptivity.…
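To make the idea of a blockwise stepsize concrete, here is a minimal Python sketch of an Adagrad-style update in which every parameter block shares one accumulated scalar. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `blockwise_adagrad_step`, the choice to average squared gradients over each block, the block names `layer1` and `layer2`, and the toy quadratic loss are all hypothetical.

```python
import math

def blockwise_adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one Adagrad-style update with a single stepsize per block.

    params, grads: dict mapping block name -> list of floats.
    accum: dict mapping block name -> running scalar (updated in place).
    """
    new_params = {}
    for name, weights in params.items():
        g = grads[name]
        # Assumption: accumulate the mean squared gradient of the block,
        # so the whole block shares one scalar instead of one per coordinate.
        accum[name] = accum.get(name, 0.0) + sum(gi * gi for gi in g) / len(g)
        step = lr / (math.sqrt(accum[name]) + eps)
        new_params[name] = [w - step * gi for w, gi in zip(weights, g)]
    return new_params

# Hypothetical toy problem: 0.5 * c * w^2 with a different curvature c per block.
CURVATURE = {"layer1": 1.0, "layer2": 10.0}

def loss(params):
    return sum(0.5 * CURVATURE[n] * w * w for n, ws in params.items() for w in ws)

def gradients(params):
    return {n: [CURVATURE[n] * w for w in ws] for n, ws in params.items()}

params = {"layer1": [1.0, -2.0], "layer2": [0.5]}
accum = {}
initial_loss = loss(params)
for _ in range(50):
    params = blockwise_adagrad_step(params, gradients(params), accum)
final_loss = loss(params)
print(initial_loss, final_loss)
```

In this sketch, coordinate-wise Adagrad is recovered by placing every parameter in its own block of size one, while a single block covering all parameters gives one global adaptive stepsize. How the network is split into blocks (for example, per layer) is left open here.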
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Adam · RMSProp
