General Cyclical Training of Neural Networks

Leslie N. Smith

arXiv:2202.08835·cs.LG·January 17, 2025

TL;DR

This paper introduces the principle of general cyclical training for neural networks: training starts and ends with "easy training", and the "hard training" happens during the middle epochs. It proposes several practical cyclical techniques and validates three of them experimentally.

Contribution

It defines the general cyclical training concept, proposes novel techniques such as cyclical weight decay, cyclical batch size, and cyclical focal loss, and demonstrates test-accuracy benefits for three of the proposed techniques.

Findings

01 Cyclical weight decay improves test accuracy.

02 Cyclical softmax temperature improves test accuracy.

03 Cyclical gradient clipping improves test accuracy.
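
As background for the second finding, the sketch below shows what a softmax temperature does: the logits are divided by a temperature before normalization, so a higher temperature flattens the output distribution and a lower one sharpens it. The function name `softmax_with_temperature` is hypothetical, and this page does not say how the paper varies the temperature across epochs, so the sketch covers only the temperature-scaled softmax itself, not the paper's schedule.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

In a cyclical setting, the `temperature` argument would be supplied by a schedule that changes over the course of training rather than held fixed.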

Abstract

This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and the "hard training" happens during the middle epochs. We propose several manifestations for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) are beneficial in the test accuracy performance of a trained model. Furthermore, we discuss model-based examples (such…
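
The "easy, then hard, then easy" pattern in the abstract can be illustrated with a schedule that moves a hyper-parameter from a starting value to a mid-training value and back. The sketch below is a minimal example under stated assumptions: the name `cyclical_value`, the linear (triangular) shape, and the example weight-decay numbers are all hypothetical, and the abstract does not say which end of a hyper-parameter's range counts as "easy".

```python
def cyclical_value(step, total_steps, start_value, mid_value):
    """Piecewise-linear cycle: start_value -> mid_value -> start_value.

    Hypothetical helper; the triangular shape is an assumption, not
    something the abstract specifies.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # Map the step to a position in [0, 1] across the whole run.
    position = step / (total_steps - 1)
    # Fold the position into a triangle: 0 at both ends, 1 at the midpoint.
    triangle = 1.0 - abs(2.0 * position - 1.0)
    return start_value + (mid_value - start_value) * triangle


if __name__ == "__main__":
    total_epochs = 11
    for epoch in range(total_epochs):
        # Example values only; which value is "easy" is an assumption.
        weight_decay = cyclical_value(epoch, total_epochs, 1e-4, 5e-4)
        print(epoch, f"{weight_decay:.2e}")
```

The same function could drive any scalar that is set per epoch or per step, for example a gradient-clipping threshold or a softmax temperature, by swapping in the two endpoint values.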

Code & Models

Repositories

lnsmith54/cfl (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Softmax · Gradient Clipping