Scaled Conjugate Gradient Method for Nonconvex Optimization in Deep Neural Networks
Naoki Sato, Koshiro Izumi, Hideaki Iiduka

TL;DR
This paper proposes a scaled conjugate gradient method that accelerates existing adaptive methods for training deep neural networks. It minimizes training loss faster in image and text classification, and one version achieves the lowest Fréchet inception distance among the adaptive methods in GAN training.
Contribution
It proposes a novel scaled conjugate gradient algorithm with theoretical convergence guarantees and practical improvements over existing adaptive methods.
Findings
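Faster minimization of training loss than existing adaptive methods in image and text classification.
One version of the method achieved the lowest Fréchet inception distance among the adaptive methods in GAN training.
Proven convergence to stationary points with both constant and diminishing learning rates.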
Abstract
A scaled conjugate gradient method that accelerates existing adaptive methods utilizing stochastic gradients is proposed for solving nonconvex optimization problems with deep neural networks. It is shown theoretically that, whether with constant or diminishing learning rates, the proposed method can obtain a stationary point of the problem. Additionally, its rate of convergence with diminishing learning rates is verified to be superior to that of the conjugate gradient method. The proposed method is shown to minimize training loss functions faster than the existing adaptive methods in practical applications of image and text classification. Furthermore, in the training of generative adversarial networks, one version of the proposed method achieved the lowest Fréchet inception distance score among those of the adaptive methods.
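Illustrative sketch
The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of a scaled conjugate-gradient-type direction, not the authors' algorithm. Everything in it is an assumption made for illustration: the direction d_n = -(1 + gamma) g_n + beta_n d_(n-1), the capped Fletcher-Reeves choice of beta_n, the toy nonconvex loss, the parameter values, and the helper names (`loss`, `grad`, `scaled_cg`). It also uses exact gradients and a constant learning rate, with no stochastic mini-batches and no adaptive preconditioning.

```python
import math


def loss(p):
    """Toy nonconvex loss (assumed for illustration): double well in x, bowl in y."""
    x, y = p
    return (x * x - 1.0) ** 2 + y * y


def grad(p):
    """Exact gradient of the toy loss."""
    x, y = p
    return (4.0 * x * (x * x - 1.0), 2.0 * y)


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def scaled_cg(p0, lr=0.005, gamma=0.1, steps=3000):
    """Hypothetical scaled CG-type iteration with a constant learning rate.

    Direction: d_n = -(1 + gamma) * g_n + beta_n * d_{n-1}.
    beta_n is the Fletcher-Reeves ratio, capped so that the momentum term
    never exceeds half the gradient norm, which keeps d_n a descent direction.
    """
    p = tuple(p0)
    d_prev = (0.0, 0.0)
    g_prev_sq = 0.0
    for _ in range(steps):
        g = grad(p)
        g_norm = norm(g)
        d_prev_norm = norm(d_prev)
        if g_prev_sq > 0.0 and d_prev_norm > 0.0:
            beta_fr = g_norm ** 2 / g_prev_sq          # Fletcher-Reeves ratio
            beta = min(beta_fr, 0.5 * g_norm / d_prev_norm)
        else:
            beta = 0.0                                 # first step: scaled gradient only
        d = tuple(-(1.0 + gamma) * gi + beta * di for gi, di in zip(g, d_prev))
        p = tuple(pi + lr * di for pi, di in zip(p, d))
        d_prev = d
        g_prev_sq = g_norm ** 2
    return p


start = (1.5, 1.0)
end = scaled_cg(start)
print("loss:", loss(start), "->", loss(end))
print("gradient norm at the final iterate:", norm(grad(end)))
```

A gradient norm close to zero at the final iterate is what "obtaining a stationary point" means in this deterministic toy setting; the paper's guarantees concern the stochastic, deep-network case, which this sketch does not reproduce.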
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Face and Expression Recognition
