General Cyclical Training of Neural Networks
Leslie N. Smith

TL;DR
This paper introduces the principle of general cyclical training for neural networks, in which training starts and ends with "easy training" and the "hard training" happens during the middle epochs. It proposes practical cyclical techniques and validates three of them experimentally.
Contribution
It defines the general cyclical training concept and proposes novel techniques such as cyclical weight decay, cyclical batch size, and cyclical focal loss, demonstrating the benefits of several of them.
Findings
Cyclical weight decay improves test accuracy.
Cyclical softmax temperature improves test accuracy.
Cyclical gradient clipping improves test accuracy.
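The findings above all rely on a hyper-parameter that changes over the course of training and returns to its starting value at the end. The following is a minimal sketch of one such schedule, not the paper's implementation. The triangular shape, the function name `cyclical_value`, and the numeric values are assumptions made for illustration, as is treating a larger weight decay as the "harder" setting.

```python
def cyclical_value(step, total_steps, edge_value, peak_value):
    """Triangular schedule (an assumed shape, chosen for simplicity).

    Returns edge_value at the first and last step and peak_value at the
    midpoint, interpolating linearly in between.
    """
    if total_steps < 2:
        return edge_value
    progress = step / (total_steps - 1)            # runs from 0.0 to 1.0
    closeness = 1.0 - abs(2.0 * progress - 1.0)    # 0 at both ends, 1 at the midpoint
    return edge_value + (peak_value - edge_value) * closeness


# Hypothetical weight decay values for a 5-epoch run.
weight_decay_schedule = [cyclical_value(epoch, 5, 1e-4, 5e-4) for epoch in range(5)]
```

In a training loop, the scheduled value would be written into the optimizer before each epoch (in PyTorch, for example, by setting `param_group["weight_decay"]` on every entry of `optimizer.param_groups`). The same helper could drive a softmax temperature or a gradient clipping threshold, with the edge and peak values chosen so that the middle of training is the hard phase.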
Abstract
This paper describes the principle of "General Cyclical Training" in machine learning, where training starts and ends with "easy training" and the "hard training" happens during the middle epochs. We propose several manifestations for training neural networks, including algorithmic examples (via hyper-parameters and loss functions), data-based examples, and model-based examples. Specifically, we introduce several novel techniques: cyclical weight decay, cyclical batch size, cyclical focal loss, cyclical softmax temperature, cyclical data augmentation, cyclical gradient clipping, and cyclical semi-supervised learning. In addition, we demonstrate that cyclical weight decay, cyclical softmax temperature, and cyclical gradient clipping (as three examples of this principle) are beneficial in the test accuracy performance of a trained model. Furthermore, we discuss model-based examples (such…
Taxonomy
Topics: Neural Networks and Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Softmax · Gradient Clipping
