Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning

Yu Kobayashi; Hideaki Iiduka

arXiv:2003.00231 · math.OC · March 4, 2020 · 5 cites

TL;DR

This paper introduces a conjugate-gradient-based Adam algorithm that combines Adam with nonlinear conjugate gradient methods, provides a convergence analysis, and shows experimentally that it trains deep neural networks in fewer epochs than existing adaptive stochastic optimizers.
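The TL;DR does not spell out the conjugate gradient component. For orientation, here is the classical deterministic nonlinear conjugate gradient recursion together with four standard choices of the coefficient β_n: FR (Fletcher-Reeves), PRP (Polak-Ribière-Polyak), HS (Hestenes-Stiefel), and DY (Dai-Yuan). Here g_n is the gradient at iterate n and d_n is the search direction. This page does not say which of these the paper uses or how it adapts them to stochastic gradients.

```latex
d_0 = -g_0, \qquad d_n = -g_n + \beta_n d_{n-1} \quad (n \ge 1), \qquad y_{n-1} = g_n - g_{n-1}

\beta_n^{\mathrm{FR}} = \frac{\|g_n\|^2}{\|g_{n-1}\|^2}, \qquad
\beta_n^{\mathrm{PRP}} = \frac{g_n^\top y_{n-1}}{\|g_{n-1}\|^2}, \qquad
\beta_n^{\mathrm{HS}} = \frac{g_n^\top y_{n-1}}{d_{n-1}^\top y_{n-1}}, \qquad
\beta_n^{\mathrm{DY}} = \frac{\|g_n\|^2}{d_{n-1}^\top y_{n-1}}
```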

Contribution

It proposes a new optimizer that blends Adam with nonlinear conjugate gradient methods, analyzes its convergence, and reports that it needs fewer training epochs than existing adaptive methods.

Findings

01 Deep neural networks can be trained in fewer epochs.

02 A convergence analysis is given for the proposed algorithm.

03 The epoch savings are measured against existing adaptive stochastic optimizers on text classification and image classification tasks.

Abstract

This paper proposes a conjugate-gradient-based Adam algorithm blending Adam with nonlinear conjugate gradient methods and presents a convergence analysis for it. Numerical experiments on text classification and image classification show that the proposed algorithm can train deep neural network models in fewer epochs than the existing adaptive stochastic optimization algorithms can.
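The abstract does not state the update rule, so what follows is only a minimal NumPy sketch of one plausible way to blend Adam with a nonlinear conjugate gradient direction. It is not the authors' algorithm. The following choices are all assumptions:

- the Fletcher-Reeves coefficient, capped at `gamma_cap` and damped by `1 / n**decay`;
- a first moment built from the negated conjugate direction, while the second moment uses raw squared gradients;
- the hypothetical helper names `fr_coefficient`, `cg_adam_step`, `init_state`, and `run_demo`.

The toy objective is a 2-D quadratic with optionally noisy gradients.

```python
import numpy as np


def fr_coefficient(g, g_prev):
    """Fletcher-Reeves ratio ||g_n||^2 / ||g_{n-1}||^2 (0 if undefined)."""
    denom = float(g_prev @ g_prev)
    return float(g @ g) / denom if denom > 0.0 else 0.0


def cg_adam_step(x, g, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
                 decay=0.5, gamma_cap=1.0):
    """One hypothetical conjugate-gradient-flavored Adam step (not the paper's exact rule)."""
    n = state["n"] + 1
    if state["d"] is None:
        d = -g  # first iteration: steepest-descent direction
    else:
        # Assumed damping: cap the FR ratio, then shrink it as n grows.
        gamma = min(fr_coefficient(g, state["g"]), gamma_cap) / n ** decay
        d = -g + gamma * state["d"]
    # First moment tracks -d (a gradient-like quantity); second moment uses raw g.
    m = beta1 * state["m"] + (1.0 - beta1) * (-d)
    v = beta2 * state["v"] + (1.0 - beta2) * g * g
    # Adam-style bias correction.
    m_hat = m / (1.0 - beta1 ** n)
    v_hat = v / (1.0 - beta2 ** n)
    state.update(n=n, d=d, g=g, m=m, v=v)
    return x - lr * m_hat / (np.sqrt(v_hat) + eps)


def init_state(dim):
    """Empty optimizer state: step counter, previous direction/gradient, moments."""
    return {"n": 0, "d": None, "g": None,
            "m": np.zeros(dim), "v": np.zeros(dim)}


def run_demo(steps=3000, noise=0.0, seed=0):
    """Minimize f(x) = 0.5 * sum(c * (x - t)^2) with optionally noisy gradients."""
    rng = np.random.default_rng(seed)
    c = np.array([1.0, 5.0])
    t = np.array([1.0, -2.0])
    x = np.zeros(2)
    state = init_state(2)
    for _ in range(steps):
        g = c * (x - t) + noise * rng.standard_normal(2)  # (stochastic) gradient
        x = cg_adam_step(x, g, state)
    return x, float(0.5 * np.sum(c * (x - t) ** 2))


if __name__ == "__main__":
    x_final, loss = run_demo()
    print(x_final, loss)
```

With `noise=0.0` the iterate approaches the minimizer `t = [1, -2]`. Passing `noise > 0` gives a stochastic-gradient run. The damping is a design choice made here so that the conjugate term fades as training proceeds and the step reverts to a plain Adam step.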

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Advanced Optimization Algorithms Research

Methods: Adam