Supervised Batch Normalization

Bilal Faye; Mustapha Lebbah; Hanane Azzag

arXiv:2405.17027·cs.LG·May 28, 2024

Open Access · 3 Reviews

TL;DR

Supervised Batch Normalization (SBN) improves neural network training by explicitly identifying data modes for normalization, leading to significant accuracy gains across various tasks and datasets.

Contribution

The paper introduces Supervised Batch Normalization, a novel method that categorizes data into modes for more effective normalization, outperforming traditional BN in diverse scenarios.

Findings

01 · 15.13% accuracy boost on CIFAR-100 with Vision Transformer

02 · 22.25% accuracy improvement on MNIST and SVHN in domain adaptation

03 · Supervised BN outperforms standard BN and other normalization techniques

Abstract

Batch Normalization (BN), a widely-used technique in neural networks, enhances generalization and expedites training by normalizing each mini-batch to the same mean and variance. However, its effectiveness diminishes when confronted with diverse data distributions. To address this challenge, we propose Supervised Batch Normalization (SBN), a pioneering approach. We expand normalization beyond traditional single mean and variance parameters, enabling the identification of data modes prior to training. This ensures effective normalization for samples sharing common features. We define contexts as modes, categorizing data with similar characteristics. These contexts are explicitly defined, such as domains in domain adaptation or modalities in multimodal systems, or implicitly defined through clustering algorithms based on data similarity. We illustrate the superiority of our approach over…
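The idea of normalizing samples with the statistics of their own context, rather than with a single batch-wide mean and variance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `context_batch_norm`, the integer context ids, and the toy data are assumptions made for illustration, and the learnable scale and shift parameters of a full normalization layer are omitted.

```python
from collections import defaultdict
from math import sqrt


def context_batch_norm(batch, contexts, eps=1e-5):
    """Normalize each sample with the mean/variance of its own context.

    batch:    list of feature vectors (lists of floats), one per sample
    contexts: list of context ids, one per sample (hypothetical labels,
              e.g. a domain index or a cluster index computed beforehand)
    Returns the normalized batch and the per-context (mean, variance) stats.
    """
    # Group sample indices by context id.
    groups = defaultdict(list)
    for i, c in enumerate(contexts):
        groups[c].append(i)

    n_features = len(batch[0])
    out = [None] * len(batch)
    stats = {}
    for c, idx in groups.items():
        # Per-feature mean and biased variance, computed within this context only.
        mean = [sum(batch[i][j] for i in idx) / len(idx) for j in range(n_features)]
        var = [
            sum((batch[i][j] - mean[j]) ** 2 for i in idx) / len(idx)
            for j in range(n_features)
        ]
        stats[c] = (mean, var)
        # Normalize every sample of this context with its context statistics.
        for i in idx:
            out[i] = [
                (batch[i][j] - mean[j]) / sqrt(var[j] + eps)
                for j in range(n_features)
            ]
    return out, stats


# Toy batch: two contexts whose features live on very different scales.
batch = [[1.0, 10.0], [3.0, 14.0], [100.0, -5.0], [104.0, -1.0]]
contexts = [0, 0, 1, 1]
normalized, stats = context_batch_norm(batch, contexts)
```

With a single batch-wide mean and variance, the two groups in the toy batch would stay far apart after normalization; with per-context statistics, each group is centered around zero on its own. How contexts are obtained (given domains, modalities, or a clustering step) is outside the scope of this sketch.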

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 5

Strengths

* SBN effectively addresses limitations of traditional BN in handling heterogeneous data, offering a solution applicable across domains with varied data distributions.
* Experiments illustrate substantial performance gains, with SBN consistently outperforming other normalization methods on classification and domain adaptation tasks.
* The method integrates seamlessly with standard deep learning frameworks, making it accessible for practical implementation in various tasks.

Weaknesses

* The process for constructing contexts is unclear and lacks detailed methodologies.
* The underlying principles behind SBN are insufficiently explained, making the approach less convincing.
* Using k-means clustering to define contexts in large-scale datasets demands substantial computation, potentially prolonging training time, and lacks experimental support in this paper.
* The novelty of SBN may be limited, as it could be perceived as a specific case of Mixture Normalization (MN) or Mode Nor

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- The issue of batch normalization under diverse data distributions is an important problem.
- The idea of a context group for batch normalization is interesting, although the way the context groups are obtained seems trivial.

Weaknesses

- During inference, when the contexts are not known, the mean value of statistics from all context groups is used, which is inconsistent with the training process. The inconsistency between training and testing should also be considered and might be more important than the gap among different context groups.
- In the experiments on CIFAR, k-means clustering is used to obtain the context group. While k-means cannot perform highly accurate classification, it is unclear if the quality of the conte

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. Clear presentation
2. Significant improvement on baselines

Weaknesses

1. Technical novelty is limited. The core algorithm of SBN has been discussed in mixture BN. The innovation of this article lies only in using prior information to replace the original clustering center.
2. The experiments are unfair. For the supervised classification task, e.g., CIFAR-100, SBN is provided with the superclass classification but the compared methods are not.
3. Some conclusions have not been fully substantiated. Like in Section 4, the authors claim "increasing the number of contexts

Taxonomy

Topics: Fault Detection and Control Systems · Neural Networks and Applications · Advanced Control Systems Optimization

Methods: Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Absolute Position Encodings · Softmax · Layer Normalization