Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu

TL;DR
This paper introduces H-Fac, a memory-efficient adaptive optimizer based on Hamiltonian dynamics, reducing memory overhead in deep learning training while maintaining strong performance across various architectures.
Contribution
H-Fac applies a rank-1 factorization to both the momentum and the scaling parameter estimators of an adaptive optimizer, lowering memory costs to a sublinear level while retaining theoretical convergence guarantees.
Findings
H-Fac reduces the memory cost of its optimization states to a sublinear level, in contrast to adaptive optimizers such as Adam.
H-Fac maintains competitive performance across multiple neural network architectures.
Theoretical analysis confirms convergence properties of H-Fac.
Abstract
Modern deep learning heavily depends on adaptive optimizers such as Adam and its variants, which are renowned for their capacity to handle model scaling and streamline hyperparameter tuning. However, these algorithms typically experience high memory overhead caused by the accumulation of optimization states, leading to a critical challenge in training large-scale network models. In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a memory-efficient factorization approach to address this challenge. By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level while maintaining competitive performance across a wide range of architectures. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization…
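To make the memory argument concrete, here is a minimal sketch of how a rank-1 factorization shrinks an optimizer state. It is not the H-Fac update rule: it assumes the Adafactor-style scheme of keeping only row sums and column sums of a non-negative matrix, and the function names and the example matrix are hypothetical.

```python
# Illustrative sketch only: an Adafactor-style rank-1 factorization of a
# non-negative matrix. This is an assumption for illustration, NOT the
# H-Fac algorithm described in the paper.

def factorize_rank1(matrix):
    """Keep only the row sums and column sums (m + n numbers)."""
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    return row_sums, col_sums

def reconstruct_rank1(row_sums, col_sums):
    """Rebuild an m x n estimate as outer(row_sums, col_sums) / total."""
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]

def state_sizes(m, n):
    """Return (full, factored) counts of stored values for an m x n state."""
    return m * n, m + n

# Hypothetical 2 x 3 matrix of squared gradients that happens to be rank-1,
# so the factored form reconstructs it without error.
squared_grads = [[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0]]
rows, cols = factorize_rank1(squared_grads)
estimate = reconstruct_rank1(rows, cols)

# Storage for one state of a 4096 x 4096 weight matrix.
full, factored = state_sizes(4096, 4096)
```

For an m x n weight matrix the factored state grows as m + n rather than m * n, which is the sense in which the cost is sublinear in the number of parameters. When the underlying matrix is not rank-1, the reconstruction is only an approximation.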
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Advanced Neural Network Applications
Methods: Adam
