SAGRAD: A Program for Neural Network Training with Simulated Annealing and the Conjugate Gradient Method

Javier Bernal; Jose Torres-Jimenez

arXiv:2502.00112·cs.LG·February 4, 2025

TL;DR

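SAGRAD is a Fortran 77 program that trains neural networks for classification using batch learning. It combines simulated annealing with Møller's scaled conjugate gradient algorithm: simulated annealing (re)initializes the weights at the start and whenever the gradient method shows insufficient progress toward a possibly local minimum.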

Contribution

The paper describes SAGRAD, a neural network training program that integrates simulated annealing with Møller's scaled conjugate gradient algorithm, and discusses how the training process is implemented.

Findings

01 Effective in avoiding local minima during training

02 Demonstrated on two classification datasets

03 Combines gradient computation with stochastic reinitialization

Abstract

SAGRAD (Simulated Annealing GRADient), a Fortran 77 program for computing neural networks for classification using batch learning, is discussed. Neural network training in SAGRAD is based on a combination of simulated annealing and Møller's scaled conjugate gradient algorithm, the latter a variation of the traditional conjugate gradient method, better suited for the nonquadratic nature of neural networks. Different aspects of the implementation of the training process in SAGRAD are discussed, such as the efficient computation of gradients and multiplication of vectors by Hessian matrices that are required by Møller's algorithm; the (re)initialization of weights with simulated annealing required to (re)start Møller's algorithm the first time and each time thereafter that it shows insufficient progress in reaching a possibly local minimum; and the use of simulated annealing when…
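
The restart scheme described in the abstract can be illustrated with a minimal Python sketch. It is not SAGRAD's Fortran 77 code and makes several assumptions: a toy nonconvex function stands in for a network's training error, plain gradient descent with step halving is swapped in for Møller's scaled conjugate gradient algorithm, and every name and parameter (`loss`, `anneal`, `descend`, `train`, the cooling schedule, the tolerances) is hypothetical.

```python
import math
import random


def loss(w):
    # Toy nonconvex stand-in for a training error (assumption, not from the paper).
    return sum((x * x - 1.0) ** 2 + 0.3 * x for x in w)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [4.0 * x * (x * x - 1.0) + 0.3 for x in w]


def anneal(w, rng, temp=1.0, cooling=0.95, steps=200, sigma=0.5):
    """Metropolis-style simulated annealing; returns the best point seen."""
    cur, cur_f = list(w), loss(w)
    best, best_f = list(cur), cur_f
    for _ in range(steps):
        cand = [x + rng.gauss(0.0, sigma) for x in cur]
        cand_f = loss(cand)
        # Accept improvements always; accept uphill moves with prob. exp(-delta/T).
        if cand_f <= cur_f or rng.random() < math.exp((cur_f - cand_f) / temp):
            cur, cur_f = cand, cand_f
            if cur_f < best_f:
                best, best_f = list(cur), cur_f
        temp *= cooling
    return best


def descend(w, tol=1e-10, max_iter=500):
    """Gradient descent with step halving (a stand-in for Moller's algorithm).

    Stops when an iteration improves the loss by less than tol, i.e. when
    the local method shows insufficient progress.
    """
    w, f = list(w), loss(w)
    for _ in range(max_iter):
        g = grad(w)
        step = 1.0
        while step > 1e-12:
            trial = [x - step * gx for x, gx in zip(w, g)]
            trial_f = loss(trial)
            if trial_f < f:
                break
            step *= 0.5
        else:
            return w, f  # no improving step found
        improved = f - trial_f
        w, f = trial, trial_f
        if improved < tol:
            break  # hand control back to simulated annealing
    return w, f


def train(w0, rounds=5, seed=0):
    """Alternate annealing (re)initialization with local descent; keep the best."""
    rng = random.Random(seed)
    best_w, best_f = list(w0), loss(w0)
    for _ in range(rounds):
        start = anneal(best_w, rng)  # (re)initialize the weights
        w, f = descend(start)        # refine locally until progress stalls
        if f < best_f:
            best_w, best_f = w, f
    return best_w, best_f
```

Calling `train([2.0, 2.0])` returns a weight vector and its loss. Because both stages only ever keep improvements, the returned loss never exceeds the loss at the starting point. A fixed number of rounds is used here for simplicity; a real implementation would decide when to stop or restart from its own progress criteria.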
