Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning
Yu Kobayashi, Hideaki Iiduka

TL;DR
This paper introduces a conjugate-gradient-based Adam algorithm that combines Adam with nonlinear conjugate gradient methods. It gives a convergence analysis and reports experiments in which the algorithm trains deep neural networks in fewer epochs than existing adaptive stochastic optimizers.
Contribution
It presents a novel optimization algorithm that blends Adam with nonlinear conjugate gradient methods, supported by a convergence analysis and by experiments showing training in fewer epochs.
Findings
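Deep neural network models train in fewer epochs than with existing adaptive stochastic optimization algorithms.
A convergence analysis of the proposed algorithm is provided.
The comparison is based on numerical experiments on text classification and image classification.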
Abstract
This paper proposes a conjugate-gradient-based Adam algorithm that blends Adam with nonlinear conjugate gradient methods, and presents its convergence analysis. Numerical experiments on text classification and image classification show that the proposed algorithm can train deep neural network models in fewer epochs than existing adaptive stochastic optimization algorithms.
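The abstract does not spell out the update rule, so the following is only a minimal sketch of how an Adam step might be blended with a nonlinear conjugate gradient direction. Everything specific here is an assumption made for illustration, not the authors' method: the function name `cg_adam_minimize`, the Fletcher-Reeves coefficient, the `cg_scale / t**(1 + decay)` damping of the conjugate term, and the hyperparameter defaults.

```python
import math


def cg_adam_minimize(grad, x0, steps=1000, lr=0.05, beta1=0.9, beta2=0.999,
                     eps=1e-8, cg_scale=1.0, decay=1.0):
    """Hypothetical sketch: Adam whose first moment averages a
    conjugate-gradient-style search direction instead of the raw gradient."""
    x = list(x0)
    n = len(x)
    m = [0.0] * n        # first-moment estimate (of the search direction)
    v = [0.0] * n        # second-moment estimate (of the gradient)
    d_prev = [0.0] * n   # previous search direction
    g_prev = None        # previous gradient

    for t in range(1, steps + 1):
        g = grad(x)

        # Fletcher-Reeves coefficient (assumed choice; other nonlinear CG
        # formulas could be substituted here).
        if g_prev is None:
            gamma = 0.0
        else:
            denom = sum(gi * gi for gi in g_prev)
            gamma = sum(gi * gi for gi in g) / denom if denom > 0.0 else 0.0

        # Search direction: negative gradient plus a damped share of the
        # previous direction (the damping schedule is an assumption).
        weight = cg_scale / t ** (1.0 + decay)
        d = [-gi + weight * gamma * di for gi, di in zip(g, d_prev)]

        # Adam-style moment updates with bias correction.
        m = [beta1 * mi + (1.0 - beta1) * di for mi, di in zip(m, d)]
        v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
        v_hat = [vi / (1.0 - beta2 ** t) for vi in v]

        # m_hat already points downhill, so the step is added.
        x = [xi + lr * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]

        d_prev, g_prev = d, g

    return x


# Toy usage: minimize f(x) = (x0 - 3)^2 + 2 * (x1 + 1)^2 from the origin.
result = cg_adam_minimize(lambda x: [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)],
                          [0.0, 0.0])
```

On this toy quadratic the iterate should approach the minimizer (3, -1). The gradient here is deterministic; in a stochastic setting `grad` would return a mini-batch gradient, and the behavior reported in the paper's experiments is not reproduced by this sketch.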
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Advanced Optimization Algorithms Research
Methods: Adam
