On the Ideal Number of Groups for Isometric Gradient Propagation
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim

TL;DR
This paper proposes a theoretically grounded, architecture-aware method to determine the optimal number of groups in group normalization, improving training stability and performance across various neural network architectures and tasks.
Contribution
It introduces a novel method to set the number of groups based on gradient behavior, reducing the need for trial-and-error hyperparameter tuning.
Findings
Improved training stability and performance across multiple architectures.
Theoretical derivation of the ideal number of groups based on gradient calibration.
Effective layer-wise setting of group numbers for diverse networks.
Abstract
Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization generalizes layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, determining the optimal number of groups requires trial-and-error-based hyperparameter tuning, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups. First, we find that the number of groups influences the gradient behavior of the group normalization layer. Based on this observation, we derive the ideal number of groups, which calibrates the gradient scale to facilitate gradient descent optimization. Our proposed number of groups is theoretically grounded, architecture-aware, and can provide a proper value in a layer-wise manner for all…
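The abstract's statement that group normalization sits between layer normalization and instance normalization can be checked with a small sketch. The code below is an illustration only, not the paper's method: it assumes a single sample stored as a list of channels (each a list of spatial values), omits the learnable affine parameters, and uses hypothetical helper names (`normalize`, `group_norm`, `layer_norm`, `instance_norm`). It does not compute the ideal number of groups derived in the paper, since the truncated abstract does not give that formula.

```python
import math


def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def group_norm(x, num_groups, eps=1e-5):
    """Group normalization for one sample.

    x is a list of C channels, each a list of spatial values.
    Statistics are shared by all channels inside a group.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    size = len(x[0])
    out = []
    for g in range(num_groups):
        channels = x[g * per_group:(g + 1) * per_group]
        flat = [v for channel in channels for v in channel]
        normed = normalize(flat, eps)
        # Split the normalized group back into its channels.
        for i in range(per_group):
            out.append(normed[i * size:(i + 1) * size])
    return out


def layer_norm(x, eps=1e-5):
    """Layer normalization: one set of statistics over all channels."""
    size = len(x[0])
    normed = normalize([v for channel in x for v in channel], eps)
    return [normed[i * size:(i + 1) * size] for i in range(len(x))]


def instance_norm(x, eps=1e-5):
    """Instance normalization: separate statistics for every channel."""
    return [normalize(channel, eps) for channel in x]


def all_close(a, b, tol=1e-12):
    """Element-wise comparison of two channel lists."""
    return all(
        math.isclose(u, v, abs_tol=tol)
        for row_a, row_b in zip(a, b)
        for u, v in zip(row_a, row_b)
    )


# Toy input: 4 channels with 3 spatial positions each (made-up values).
x = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0], [0.0, -1.0, 5.0], [2.0, 2.5, 9.0]]

one_group_is_layer_norm = all_close(group_norm(x, 1), layer_norm(x))
c_groups_is_instance_norm = all_close(group_norm(x, len(x)), instance_norm(x))
```

With one group the result matches layer normalization, and with as many groups as channels it matches instance normalization; intermediate values such as `group_norm(x, 2)` give the in-between behavior whose group count the paper aims to set without trial and error.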
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Instance Normalization · Group Normalization · Layer Normalization
