TL;DR
NysAct is a scalable gradient preconditioning method that uses a Nyström approximation of the activation covariance matrix to improve optimization efficiency and generalization in neural networks, balancing the benefits of first- and second-order methods.
Contribution
Introduces NysAct, a preconditioning technique that uses an eigenvalue-shifted Nyström approximation of the activation covariance matrix as the preconditioner, giving second-order-like optimization at lower cost.
Findings
Achieves higher test accuracy than the first- and second-order methods it is compared against.
Requires less computation and memory than existing second-order methods.
Keeps the accuracy cost of the Nyström approximation minimal while reducing time and memory complexity.
Abstract
Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NysAct, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NysAct leverages an eigenvalue-shifted Nyström method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NysAct not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably fewer computational resources than existing second-order…
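To make the idea in the abstract concrete, here is a minimal NumPy sketch of a generic column-sampling Nyström approximation of an activation covariance matrix, used to precondition a gradient. It is not the authors' implementation: the function names (`nystrom_factors`, `precondition`), the uniform column sampling, the damping value `rho`, and the use of a plain additive shift `rho * I` are all assumptions made for illustration, and the paper's exact eigenvalue-shifted formulation may differ. The sketch also assumes the sampled block `W` is invertible.

```python
import numpy as np

def nystrom_factors(A, idx):
    """Return (C, W) such that A is approximated by C @ inv(W) @ C.T."""
    C = A[:, idx]              # (d, k): sampled columns of A
    W = A[np.ix_(idx, idx)]    # (k, k): rows and columns of A at idx
    return C, W

def precondition(C, W, rho, G):
    """Compute inv(C @ inv(W) @ C.T + rho * I) @ G without forming a d x d matrix.

    Uses the Woodbury identity:
        inv(rho*I + C inv(W) C^T) = (I - C inv(rho*W + C^T C) C^T) / rho
    Only a k x k linear system is solved. Assumes W is invertible and rho > 0.
    """
    M = rho * W + C.T @ C                      # (k, k)
    return (G - C @ np.linalg.solve(M, C.T @ G)) / rho

# Hypothetical layer: 256 samples, 32 input features, 16 output units.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))             # layer input activations
A = X.T @ X / X.shape[0]                       # activation covariance, (32, 32)

idx = rng.choice(32, size=8, replace=False)    # assumption: uniform column sampling
C, W = nystrom_factors(A, idx)

grad = rng.standard_normal((32, 16))           # gradient laid out as (in, out)
pre = precondition(C, W, rho=0.1, G=grad)      # preconditioned gradient, (32, 16)
```

With `k` sampled columns out of `d`, this sketch stores `C` and `W` (`d*k + k*k` numbers) instead of the full `d x d` covariance and solves a `k x k` system instead of a `d x d` one, which is the kind of time and memory saving the abstract describes.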
