Preconditioned Inexact Stochastic ADMM for Deep Model
Shenglong Zhou, Ouya Wang, Ziyan Luo, Yongxu Zhu, Geoffrey Ye Li

TL;DR
This paper introduces PISA, a preconditioned inexact stochastic ADMM algorithm with theoretical guarantees, capable of effectively handling data heterogeneity in deep model training, and demonstrates superior performance over existing optimizers.
Contribution
The paper develops PISA, a novel stochastic ADMM algorithm with convergence guarantees under minimal assumptions, supporting scalable parallelism and various preconditioning strategies.
Findings
PISA converges under the sole assumption that the gradient is Lipschitz continuous on a bounded region, without the other conditions commonly imposed by stochastic methods.
SISA and NSISA variants incorporate second-order and momentum preconditions.
Experiments on diverse deep models show that SISA and NSISA outperform existing optimizers.
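To make the ingredients named above concrete, the following is a minimal sketch of a generic preconditioned, inexact, stochastic consensus ADMM on a toy least-squares problem split across nodes with differently scaled features. It is not the authors' PISA, SISA, or NSISA implementation. The function names, the toy data, the parameters sigma, rho, and beta, and the diagonal second-moment form of the preconditioner are all assumptions made for illustration. Each node i keeps a local copy w_i and a multiplier pi_i for the constraint w_i = w. The local subproblem is solved inexactly by linearizing the local loss at the current global w with a mini-batch gradient.

```python
import numpy as np


def make_heterogeneous_data(m=4, n=200, d=5, seed=1):
    """Toy data (assumed): node i sees features scaled by 0.5 + 0.25 * i."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    data = []
    for i in range(m):
        scale = 0.5 + 0.25 * i
        A = scale * rng.normal(size=(n, d))
        b = A @ w_true + 0.01 * rng.normal(size=n)
        data.append((A, b))
    return data


def global_loss(w, data):
    """Average over nodes of the local loss 0.5 * mean((A w - b)^2)."""
    return float(np.mean([0.5 * np.mean((A @ w - b) ** 2) for A, b in data]))


def stochastic_admm_sketch(data, sigma=5.0, rho=1.0, beta=0.9,
                           batch_size=64, iters=300, seed=0):
    """Illustrative consensus ADMM: min sum_i f_i(w_i) subject to w_i = w."""
    rng = np.random.default_rng(seed)
    m, d = len(data), data[0][0].shape[1]
    w = np.zeros(d)            # global (consensus) variable
    w_loc = np.zeros((m, d))   # local copies w_i
    pi = np.zeros((m, d))      # multipliers for the constraints w_i = w
    v = np.zeros((m, d))       # running second moment of local gradients
    for _ in range(iters):
        # Global step: exact minimizer of the augmented Lagrangian in w.
        w = np.mean(w_loc + pi / sigma, axis=0)
        for i, (A, b) in enumerate(data):
            # Stochastic step: mini-batch gradient of the local loss at w.
            idx = rng.choice(A.shape[0], size=batch_size, replace=False)
            g = A[idx].T @ (A[idx] @ w - b[idx]) / batch_size
            # Preconditioner (assumed form): diagonal, from the second moment.
            v[i] = beta * v[i] + (1.0 - beta) * g ** 2
            P = rho * np.sqrt(v[i])
            # Inexact local step: closed-form minimizer of the linearized
            # subproblem with the extra quadratic term weighted by P.
            w_loc[i] = w - (g + pi[i]) / (P + sigma)
            # Multiplier (dual ascent) step.
            pi[i] = pi[i] + sigma * (w_loc[i] - w)
    return w


data = make_heterogeneous_data()
w_hat = stochastic_admm_sketch(data)
print("initial loss:", global_loss(np.zeros(5), data))
print("final loss:  ", global_loss(w_hat, data))
```

In this sketch the local loop runs sequentially, but each node's update depends only on the shared w and its own data, so the updates could in principle run in parallel. Replacing P with another positive diagonal or matrix term changes the preconditioning without altering the rest of the loop.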
Abstract
The recent advancement of foundation models (FMs) has brought about a paradigm shift, revolutionizing various sectors worldwide. The popular optimizers used to train these models are stochastic gradient descent-based algorithms, which face inherent limitations, such as slow convergence and stringent assumptions for convergence. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. This paper develops an algorithm, PISA (Preconditioned Inexact Stochastic Alternating Direction Method of Multipliers). Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables the proposed algorithm to tackle the…
Taxonomy
Topics: Machine Learning and ELM · Advanced Neural Network Applications · Advanced Memory and Neural Computing
Methods: PrIme Sample Attention
