Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes

Chengcheng Yan; Jiawei Xu; Zheng Peng; Qingsong Wang

arXiv:2508.04193·cs.LG·August 7, 2025

TL;DR

This paper introduces SAMT, a novel neural network training method that uses alternating block-wise updates with trainable, adaptive step sizes, leading to more stable and efficient training.

Contribution

The paper proposes SAMT, a new alternating minimization approach with meta-learned adaptive step sizes for improved neural network training stability and efficiency.

Findings

01. Achieves better generalization with fewer updates
02. Provides theoretical convergence guarantees
03. Outperforms state-of-the-art methods in benchmarks

Abstract

The training of deep neural networks is inherently a nonconvex optimization problem, yet standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to all parameters, often leading to unstable convergence and high computational cost. To address these issues, we propose a novel method, Stochastic Alternating Minimization with Trainable Step Sizes (SAMT), which updates network parameters in an alternating manner by treating the weights of each layer as a block. By decomposing the overall optimization into sub-problems corresponding to different blocks, this block-wise alternating strategy reduces per-step computational overhead and enhances training stability in nonconvex settings. To fully leverage these benefits, inspired by meta-learning, we propose a novel adaptive step-size strategy to incorporate into the sub-problem solving steps of alternating…
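The block-wise idea described in the abstract can be illustrated with a small sketch. This is not the authors' SAMT algorithm: the excerpt does not specify the meta-learned step-size rule, so the code below swaps in a simple hypergradient-style update (each block's step size grows when consecutive gradients agree and shrinks when they disagree) as a stand-in. The two-layer tanh network, the function names (`block_grad`, `train`, `make_data`), the clipping bounds, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def forward(W1, W2, X):
    H = np.tanh(X @ W1)              # hidden activations
    return H, H @ W2                 # activations and predictions

def full_loss(W1, W2, X, Y):
    _, P = forward(W1, W2, X)
    return 0.5 * np.mean(np.sum((P - Y) ** 2, axis=1))

def block_grad(k, W1, W2, X, Y):
    """Gradient of the mini-batch loss with respect to block k only."""
    H, P = forward(W1, W2, X)
    R = (P - Y) / X.shape[0]         # d loss / d predictions
    if k == 1:
        return H.T @ R               # output-layer block
    return X.T @ ((R @ W2.T) * (1.0 - H ** 2))  # first-layer block

def train(X, Y, hidden=8, steps=300, batch=32, eta0=0.05, meta_lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # One block per layer, as in the abstract's description.
    W = [rng.normal(0.0, 0.5, (X.shape[1], hidden)),
         rng.normal(0.0, 0.5, (hidden, Y.shape[1]))]
    eta = [eta0, eta0]               # one adaptive step size per block
    prev = [np.zeros_like(w) for w in W]
    start = full_loss(W[0], W[1], X, Y)
    for _ in range(steps):
        idx = rng.choice(X.shape[0], size=batch, replace=False)
        xb, yb = X[idx], Y[idx]
        for k in range(2):           # alternate: update one block, others fixed
            g = block_grad(k, W[0], W[1], xb, yb)
            # ASSUMPTION: hypergradient-style rule, not the paper's rule.
            eta[k] = float(np.clip(eta[k] + meta_lr * np.sum(g * prev[k]),
                                   1e-4, 0.1))
            W[k] = W[k] - eta[k] * g
            prev[k] = g
    end = full_loss(W[0], W[1], X, Y)
    return W, eta, (start, end)

def make_data(n=256, seed=1):
    """Synthetic regression data from a small random teacher network."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 4))
    Y = np.tanh(X @ rng.normal(size=(4, 3))) @ rng.normal(size=(3, 1))
    return X, Y

if __name__ == "__main__":
    X, Y = make_data()
    W, eta, (start, end) = train(X, Y)
    print("loss before:", start, "loss after:", end, "step sizes:", eta)
```

Note that the second block's gradient is computed after the first block has already been updated on the same mini-batch, which is what distinguishes the alternating scheme from a simultaneous SGD step over all parameters.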
