Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness
Saeid Asgari Taghanaki, Ali Gholami, Fereshte Khani, Kristy Choi, Linh Tran, Ran Zhang, Aliasghar Khani

TL;DR
This paper identifies a drawback of batch normalization in encouraging reliance on in-domain features, and introduces Counterbalancing Teacher, a regularization method that improves model robustness to out-of-domain data by enforcing consistent representations.
Contribution
The paper reveals the negative impact of batch normalization on out-of-domain generalization and proposes Counterbalancing Teacher, a novel regularization approach using a teacher-student framework to enhance robustness.
Findings
Removing BN reduces out-of-domain errors but increases in-domain errors.
Counterbalancing Teacher outperforms baselines on robustness benchmarks.
Theoretical analysis explains normalization's influence on feature reliance.
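The link between normalization and low-variance features can be illustrated with a small sketch. It is not the paper's theoretical analysis; it only shows the well-known rescaling step of batch normalization (without the learnable scale and shift), under which a feature with a tiny spread ends up with roughly the same spread as a feature with a large one. The helper names `batch_norm` and `spread` and the toy feature values are assumptions made for illustration.

```python
import statistics


def batch_norm(column, eps=1e-5):
    """Standardize one feature over a batch (no learnable scale/shift)."""
    mean = statistics.fmean(column)
    var = statistics.pvariance(column)  # population variance over the batch
    return [(x - mean) / (var + eps) ** 0.5 for x in column]


def spread(column):
    """Population standard deviation of a feature over the batch."""
    return statistics.pstdev(column)


# Hypothetical toy batch: one high-variance and one low-variance feature.
high_var = [-10.0, -5.0, 0.0, 5.0, 10.0]
low_var = [-0.10, -0.05, 0.0, 0.05, 0.10]

# Before normalization the high-variance feature dominates by a factor of 100.
ratio_before = spread(high_var) / spread(low_var)

# After normalization both features have close to unit spread, so the
# low-variance feature has been scaled up relative to the other one.
ratio_after = spread(batch_norm(high_var)) / spread(batch_norm(low_var))

print(f"spread ratio before BN: {ratio_before:.2f}")
print(f"spread ratio after BN:  {ratio_after:.2f}")
```

In this toy setting the low-variance feature reaches later layers at the same scale as the high-variance one, which is one intuition for why a normalized model may lean on it more heavily.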
Abstract
Batch normalization (BN) is a ubiquitous technique for training deep neural networks that accelerates their convergence to reach higher accuracy. However, we demonstrate that BN comes with a fundamental drawback: it incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data, hurting generalization performance on out-of-domain examples. In this work, we investigate this phenomenon by first showing that removing BN layers across a wide range of architectures leads to lower out-of-domain and corruption errors at the cost of higher in-domain errors. We then propose Counterbalancing Teacher (CT), a method which leverages a frozen copy of the same model without BN as a teacher to enforce the student network's learning of robust representations by substantially adapting its weights through a consistency loss function. This regularization…
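The teacher-student idea described in the abstract can be sketched as follows. The abstract is truncated and does not give the exact form of the consistency loss, so the mean squared error used here, the weight `lam`, the learning rate `lr`, and all function names are assumptions for illustration rather than the authors' implementation. The sketch only shows the general pattern: the teacher's representation is treated as a constant, and only the student's representation is moved toward it.

```python
def mse(a, b):
    """Mean squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(task_loss, student_repr, teacher_repr, lam=1.0):
    """Task loss plus a weighted consistency term (assumed to be MSE)."""
    return task_loss + lam * mse(student_repr, teacher_repr)


def consistency_step(student_repr, teacher_repr, lr=0.1):
    """One gradient step on the MSE with respect to the student only.

    The teacher is frozen, so no gradient is computed for it and its
    values are never modified.
    """
    n = len(student_repr)
    grads = [2.0 * (s - t) / n for s, t in zip(student_repr, teacher_repr)]
    return [s - lr * g for s, g in zip(student_repr, grads)]


# Hypothetical representations of the same input:
# teacher = frozen copy of the model without BN, student = model with BN.
teacher = [1.0, 0.0, -1.0]
student = [2.0, 0.0, -3.0]

before = mse(student, teacher)
student_next = consistency_step(student, teacher)
after = mse(student_next, teacher)

print(f"consistency loss before step: {before:.4f}")
print(f"consistency loss after step:  {after:.4f}")
```

In a real training loop the representations would come from forward passes of the two networks, and the gradient would flow into the student's weights rather than into the representation directly.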
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Seismic Imaging and Inversion Techniques · Advanced Neural Network Applications
Methods: Linear Regression
