Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes
Chengcheng Yan, Jiawei Xu, Zheng Peng, Qingsong Wang

TL;DR
This paper introduces SAMT, a neural network training method that treats each layer's weights as a block and updates the blocks in alternation, using trainable, adaptive step sizes to make training more stable and efficient.
Contribution
The paper proposes SAMT, a new alternating minimization approach with meta-learned adaptive step sizes for improved neural network training stability and efficiency.
Findings
Achieves better generalization with fewer updates
Provides theoretical convergence guarantees
Outperforms state-of-the-art methods in benchmarks
Abstract
The training of deep neural networks is inherently a nonconvex optimization problem, yet standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to all parameters, often leading to unstable convergence and high computational cost. To address these issues, we propose a novel method, Stochastic Alternating Minimization with Trainable Step Sizes (SAMT), which updates network parameters in an alternating manner by treating the weights of each layer as a block. By decomposing the overall optimization into sub-problems corresponding to different blocks, this block-wise alternating strategy reduces per-step computational overhead and enhances training stability in nonconvex settings. To fully leverage these benefits, and inspired by meta-learning, we propose a novel adaptive step size strategy that is incorporated into the sub-problem solving steps of alternating…
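To make the block-wise idea in the abstract concrete, here is a minimal sketch, not the authors' algorithm. It assumes a toy two-"layer" scalar model y = w2 * w1 * x, treats each weight as one block, and updates the blocks in alternation on a shared mini-batch. The abstract is truncated before it describes the trainable step size rule, so the sketch swaps in a simple hypergradient-style rule (raise a block's step size when its consecutive gradients agree in sign, lower it otherwise) purely as a stand-in. All names (`block_grad`, `train`, `beta`, the clamp bounds) are hypothetical.

```python
import random


def full_loss(w1, w2, data):
    # Mean squared error (with a 1/2 factor) of the toy model y = w2 * w1 * x.
    return sum(0.5 * (w2 * w1 * x - y) ** 2 for x, y in data) / len(data)


def block_grad(w1, w2, batch, block):
    # Mini-batch gradient with respect to ONE block, the other held fixed.
    # block 0 is w1 (first layer), block 1 is w2 (second layer).
    g = 0.0
    for x, y in batch:
        err = w2 * w1 * x - y
        g += err * x * (w2 if block == 0 else w1)
    return g / len(batch)


def train(steps=300, beta=0.01, seed=0):
    rng = random.Random(seed)
    data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]  # target: w1 * w2 = 2
    w = [0.5, 0.5]        # one scalar weight per layer/block
    eta = [0.1, 0.1]      # one adaptive step size per block
    prev_g = [0.0, 0.0]   # previous gradient of each block
    history = [full_loss(w[0], w[1], data)]
    for _ in range(steps):
        batch = rng.sample(data, 2)
        for block in (0, 1):  # alternate over blocks; block 1 sees the updated w1
            g = block_grad(w[0], w[1], batch, block)
            # ASSUMPTION: hypergradient-style step size update, a stand-in for
            # the paper's meta-learned rule, which the excerpt does not give.
            eta[block] += beta * g * prev_g[block]
            eta[block] = min(max(eta[block], 1e-3), 0.3)  # keep step size bounded
            w[block] -= eta[block] * g
            prev_g[block] = g
        history.append(full_loss(w[0], w[1], data))
    return w, eta, history


if __name__ == "__main__":
    w, eta, history = train()
    print("initial loss:", history[0])
    print("final loss:", history[-1])
    print("w1 * w2:", w[0] * w[1], "step sizes:", eta)
```

Because each block update holds the other block fixed, every sub-problem here is a one-dimensional least-squares problem, which is what makes a per-block step size natural in this toy setting. Whether SAMT's step sizes are per block, per iteration, or produced by a learned model cannot be determined from the excerpt.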
