Supervised Batch Normalization
Bilal Faye, Mustapha Lebbah, Hanane Azzag

TL;DR
Supervised Batch Normalization (SBN) improves neural network training by explicitly identifying data modes for normalization, leading to significant accuracy gains across various tasks and datasets.
Contribution
The paper introduces Supervised Batch Normalization, a novel method that categorizes data into modes for more effective normalization, outperforming traditional BN in diverse scenarios.
Findings
15.13% accuracy boost on CIFAR-100 with Vision Transformer
22.25% accuracy improvement on MNIST and SVHN in domain adaptation
Supervised BN outperforms standard BN and other normalization techniques
Abstract
Batch Normalization (BN), a widely used technique in neural networks, enhances generalization and speeds up training by normalizing each mini-batch to the same mean and variance. However, its effectiveness diminishes when the data come from diverse distributions. To address this challenge, we propose Supervised Batch Normalization (SBN). We extend normalization beyond a single mean and variance by identifying data modes before training, so that samples sharing common features are normalized together. We call these modes contexts, each grouping data with similar characteristics. Contexts can be defined explicitly, such as domains in domain adaptation or modalities in multimodal systems, or implicitly, through clustering algorithms based on data similarity. We illustrate the superiority of our approach over…
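The central mechanism in the abstract, normalizing each sample with the statistics of its own context instead of those of the whole mini-batch, can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: it assumes contexts arrive as one integer id per sample, and the function name `context_batch_norm` and the per-context scale and shift (`gamma`, `beta`) are our assumptions, since this summary does not say whether SBN shares them across contexts. Running statistics for inference are omitted.

```python
import numpy as np

def context_batch_norm(x, contexts, num_contexts, gamma, beta, eps=1e-5):
    """Hypothetical sketch: normalize each sample with its own context's statistics.

    x:           (N, D) mini-batch of features
    contexts:    (N,) integer context id per sample, each in [0, num_contexts)
    gamma, beta: (num_contexts, D) per-context scale and shift (an assumption)
    Returns the normalized batch plus the per-context means and variances,
    which a full layer would fold into running statistics (omitted here).
    """
    out = np.empty_like(x, dtype=float)
    means = np.zeros((num_contexts, x.shape[1]))
    variances = np.ones((num_contexts, x.shape[1]))
    for k in range(num_contexts):
        mask = contexts == k
        if not mask.any():
            continue  # context absent from this mini-batch: nothing to normalize
        xk = x[mask]
        mu = xk.mean(axis=0)
        var = xk.var(axis=0)  # biased variance, as standard BN uses in training
        out[mask] = gamma[k] * (xk - mu) / np.sqrt(var + eps) + beta[k]
        means[k], variances[k] = mu, var
    return out, means, variances

# Two contexts with very different scales in one mini-batch
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, (4, 3)), rng.normal(10, 5, (4, 3))])
ctx = np.array([0] * 4 + [1] * 4)
y, mu, var = context_batch_norm(x, ctx, 2, np.ones((2, 3)), np.zeros((2, 3)))
print(y[ctx == 1].mean(axis=0))  # close to zero, as for context 0
```

With a single set of batch statistics, the context centered at 10 would dominate the mean and variance; splitting by context keeps each group centered and near unit variance.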
Peer Reviews
Decision: Submitted to ICLR 2025
* SBN effectively addresses limitations of traditional BN in handling heterogeneous data, offering a solution applicable across domains with varied data distributions.
* Experiments illustrate substantial performance gains, with SBN consistently outperforming other normalization methods on classification and domain adaptation tasks.
* The method integrates seamlessly with standard deep learning frameworks, making it accessible for practical implementation in various tasks.
* The process for constructing contexts is unclear and lacks detailed methodologies.
* The underlying principles behind SBN are insufficiently explained, making the approach less convincing.
* Using k-means clustering to define contexts in large-scale datasets demands substantial computation, potentially prolonging training time, and lacks experimental support in this paper.
* The novelty of SBN may be limited, as it could be perceived as a specific case of Mixture Normalization (MN) or Mode Nor
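The cost concern raised above can be made concrete. When contexts are defined implicitly, every training sample needs a cluster id before training starts. The sketch below derives ids with plain Lloyd's k-means in NumPy; the helper names `assign` and `kmeans_contexts`, the deterministic farthest-point initialization, and the fixed iteration count are our choices for illustration, not details taken from the paper.

```python
import numpy as np

def assign(x, centers):
    # squared distance from every sample to every center, shape (N, k)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_contexts(x, k, iters=20):
    """Hypothetical helper: derive k context ids with Lloyd's k-means.

    Each iteration costs O(N * k * D) time, and the broadcast distance
    array is an N x k x D temporary, so cost grows with dataset size.
    """
    # farthest-point initialization: start at x[0], then repeatedly add
    # the sample farthest from all centers chosen so far
    centers = [x[0]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return assign(x, centers), centers

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 0.5, (50, 2)), rng.normal(8, 0.5, (50, 2))])
labels, centers = kmeans_contexts(x, k=2)
print(np.bincount(labels))  # two contexts of 50 samples each
```

On well-separated toy data the clusters match the true groups; on real image features the resulting contexts can be much noisier, which is the quality question the next review raises.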
- Batch normalization under diverse data distributions is an important problem.
- The idea of context groups for batch normalization is interesting, although the way the context groups are obtained seems trivial.
- During inference, when the contexts are not known, the mean of the statistics from all context groups is used, which is inconsistent with the training process. This inconsistency between training and testing should also be considered and might matter more than the gap among different context groups.
- In the experiments on CIFAR, k-means clustering is used to obtain the context groups. Since k-means cannot produce highly accurate groupings, it is unclear if the quality of the conte
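The train/test mismatch described in this review can be seen on a single feature. The running statistics below are made-up illustrative numbers, and the sketch assumes that "the mean value of statistics" means a uniform average of the per-context running means and variances; the helper `normalize` is hypothetical.

```python
import numpy as np

eps = 1e-5
# Hypothetical running statistics of one feature for two contexts
running_mean = np.array([0.0, 10.0])
running_var = np.array([1.0, 25.0])

def normalize(x, mean, var):
    return (x - mean) / np.sqrt(var + eps)

x = 10.0  # a test sample that actually belongs to context 1

# Training-style normalization: the sample's own context statistics
with_context = normalize(x, running_mean[1], running_var[1])

# Context unknown at test time: uniform average over all contexts
without_context = normalize(x, running_mean.mean(), running_var.mean())

print(with_context, without_context)  # 0.0 vs. roughly 1.39
```

Averaging variances this way also ignores the spread between context means: by the law of total variance, the pooled variance of this equal-weight mixture would be 13 + 25 = 38, not 13.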
1. Clear presentation
2. Significant improvement over baselines
1. Technical novelty is limited. The core algorithm of SBN has already been discussed in mixture BN; the only innovation of this article is using prior information in place of the original cluster centers.
2. The experiments are unfair. For the supervised classification task (e.g., CIFAR-100), SBN uses the superclass labels, but the compared methods do not.
3. Some conclusions have not been fully substantiated. For example, in Section 4 the authors claim "increasing the number of contexts
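The novelty concern can be illustrated by writing normalization with an explicit assignment matrix. In the simplified sketch below (our own formulation, which does not reproduce the exact published mixture normalization formula), each component gets responsibility-weighted statistics, and each sample's output is the responsibility-weighted sum of its per-component normalizations. With soft posteriors, such as those a Gaussian mixture model produces, samples blend several components; with one-hot rows built from prior context labels, it reduces to per-context batch normalization. The function name `responsibility_norm` is hypothetical.

```python
import numpy as np

def responsibility_norm(x, resp, eps=1e-5):
    """Hypothetical, simplified mixture-style normalization.

    x:    (N, D) features
    resp: (N, K) assignment weights per sample, rows summing to 1
    """
    out = np.zeros_like(x, dtype=float)
    for k in range(resp.shape[1]):
        w = resp[:, k:k + 1]              # (N, 1) weights for component k
        total = w.sum()
        if total == 0:
            continue                      # component unused in this batch
        mu = (w * x).sum(axis=0) / total
        var = (w * (x - mu) ** 2).sum(axis=0) / total
        out += w * (x - mu) / np.sqrt(var + eps)
    return out

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, (5, 3)), rng.normal(4, 2, (5, 3))])
ctx = np.array([0] * 5 + [1] * 5)
hard = responsibility_norm(x, np.eye(2)[ctx])  # one-hot rows from prior labels

# Reference: normalize each context with its own batch statistics
ref = np.empty_like(x)
for k in range(2):
    xk = x[ctx == k]
    ref[ctx == k] = (xk - xk.mean(axis=0)) / np.sqrt(xk.var(axis=0) + 1e-5)

print(np.allclose(hard, ref))  # True
```

Under this formulation, the difference the reviewer points to is where the assignment matrix comes from (prior labels versus a fitted mixture), not how the statistics are applied.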
Taxonomy
Topics: Fault Detection and Control Systems · Neural Networks and Applications · Advanced Control Systems Optimization
Methods: Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Absolute Position Encodings · Softmax · Layer Normalization
