Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang

TL;DR
This paper introduces TempBalance, a layer-wise learning rate method inspired by temperature concepts and HT-SR theory, which improves neural network training performance across multiple datasets and architectures.
Contribution
It proposes TempBalance, a novel layer-wise learning rate scheduling method based on HT-SR theory, enhancing training effectiveness and outperforming existing optimizers and schedulers.
Findings
TempBalance improves test performance on CIFAR10, CIFAR100, SVHN, and TinyImageNet.
It outperforms SGD, spectral norm regularization, and state-of-the-art optimizers.
Layer-wise temperature balancing leads to better model generalization.
Abstract
Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of…
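The abstract contrasts a single global learning rate with one that varies across the model. As a rough illustration of what a layer-wise scheme driven by a heavy-tail metric might look like, the sketch below estimates a power-law exponent for each layer's eigenvalue spectrum and rescales a base learning rate accordingly. Everything specific here is an assumption made for illustration and is not stated in the text above: the Hill estimator as the tail metric, the linear mapping, the scaling range (0.5, 1.5), the default choice of k, and the hypothetical function names `hill_alpha` and `balanced_learning_rates`. The eigenvalues are taken as plain lists; in practice they would be computed from each layer's weight matrix with a linear algebra library.

```python
import math


def hill_alpha(eigenvalues, k=None):
    """Hill estimate of the power-law exponent of the k largest eigenvalues.

    Illustrative helper (hypothetical name). Expects positive eigenvalues.
    """
    lam = sorted(eigenvalues, reverse=True)
    if k is None:
        k = len(lam) // 2  # assumed default: fit the upper half of the spectrum
    if not 1 <= k < len(lam):
        raise ValueError("k must satisfy 1 <= k < number of eigenvalues")
    threshold = lam[k]  # the (k+1)-th largest eigenvalue
    log_sum = sum(math.log(x / threshold) for x in lam[:k])
    if log_sum == 0:
        raise ValueError("top-k eigenvalues are all equal; exponent undefined")
    return 1.0 + k / log_sum


def balanced_learning_rates(alphas, base_lr, s_min=0.5, s_max=1.5):
    """Map per-layer exponents linearly onto [s_min, s_max] * base_lr.

    Assumption: layers with a larger exponent (a less heavy-tailed spectrum)
    receive a larger learning rate. The range (0.5, 1.5) is a placeholder.
    """
    lo, hi = min(alphas), max(alphas)
    if hi == lo:
        return [base_lr for _ in alphas]  # nothing to balance
    span = s_max - s_min
    return [base_lr * (s_min + (a - lo) / (hi - lo) * span) for a in alphas]


if __name__ == "__main__":
    # Toy spectra for three hypothetical layers.
    spectra = [
        [9.0, 4.0, 2.0, 1.0, 0.5],
        [3.0, 2.5, 2.0, 1.5, 1.0],
        [20.0, 3.0, 1.0, 0.8, 0.6],
    ]
    alphas = [hill_alpha(s, k=2) for s in spectra]
    print(balanced_learning_rates(alphas, base_lr=0.1))
```

With these defaults, the layer with the smallest estimated exponent trains at half the base rate and the layer with the largest at one and a half times the base rate; applying a global decay schedule to `base_lr` then lowers all layers together.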
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Domain Adaptation and Few-Shot Learning
Methods: Stochastic Gradient Descent
