Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Yefan Zhou; Tianyu Pang; Keqin Liu; Charles H. Martin; Michael W. Mahoney; Yaoqing Yang

arXiv:2312.00359 · cs.LG · December 4, 2023 · 2 cites

TL;DR

This paper introduces TempBalance, a layer-wise learning rate method inspired by temperature concepts and HT-SR theory, which improves neural network training performance across multiple datasets and architectures.

Contribution

It proposes TempBalance, a novel layer-wise learning rate scheduling method based on HT-SR theory, enhancing training effectiveness and outperforming existing optimizers and schedulers.

Findings

1. TempBalance improves test performance on CIFAR10, CIFAR100, SVHN, and TinyImageNet.

2. It outperforms SGD, spectral norm regularization, and state-of-the-art optimizers.

3. Layer-wise temperature balancing leads to better model generalization.

Abstract

Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of…
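
To make the distinction between a global learning rate and a layer-wise one concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract shown here is cut off before the method is described, so the per-layer metric (`layer_metrics`), the linear mapping, and the scaling range (`s_min`, `s_max`) are all assumptions chosen for illustration. The cosine schedule stands in for any global decay of the "temperature".

```python
import math


def cosine_decay(base_lr, step, total_steps):
    """Global schedule: the learning rate ('temperature') decays from base_lr toward 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def layerwise_lrs(global_lr, layer_metrics, s_min=0.5, s_max=1.5):
    """Scale the global rate per layer.

    layer_metrics maps layer name -> a scalar quality metric (hypothetical here;
    for example, a heavy-tail exponent estimated from the layer's weight spectrum).
    The metric is mapped linearly onto [s_min, s_max], so the layer with the
    smallest metric gets s_min * global_lr and the largest gets s_max * global_lr.
    """
    lo = min(layer_metrics.values())
    hi = max(layer_metrics.values())
    if hi == lo:
        # All layers look alike: fall back to the global rate.
        return {name: global_lr for name in layer_metrics}
    return {
        name: global_lr * (s_min + (s_max - s_min) * (m - lo) / (hi - lo))
        for name, m in layer_metrics.items()
    }


# Illustrative values only; these metrics are made up, not measured.
metrics = {"conv1": 2.0, "conv2": 4.0, "fc": 6.0}
lr_t = cosine_decay(0.1, step=50, total_steps=100)
lrs = layerwise_lrs(lr_t, metrics)
```

In a framework such as PyTorch, a dictionary like `lrs` would typically be applied by giving each layer its own optimizer parameter group and updating that group's learning rate at each scheduling step.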

Code & Models

Repositories

yefanzhou/tempbalance (PyTorch, official)

Videos

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent