Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness

Saeid Asgari Taghanaki, Ali Gholami, Fereshte Khani, Kristy Choi, Linh Tran, Ran Zhang, Aliasghar Khani

arXiv:2207.01548 · cs.LG · July 5, 2022

TL;DR

This paper identifies a drawback of batch normalization (BN): it encourages reliance on in-domain features. It introduces Counterbalancing Teacher, a regularization method that improves robustness to out-of-domain data by enforcing consistency between the model's representations and those of a frozen copy of the model without BN.

Contribution

The paper reveals the negative impact of batch normalization on out-of-domain generalization and proposes Counterbalancing Teacher, a novel regularization approach using a teacher-student framework to enhance robustness.

Findings

01

Removing BN reduces out-of-domain errors but increases in-domain errors.

02

Counterbalancing Teacher outperforms baselines on robustness benchmarks.

03

Theoretical analysis explains normalization's influence on feature reliance.
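The paper argues that BN incentivizes reliance on low-variance features. A toy calculation gives some intuition for why this is plausible; it is only an illustration, not the paper's theoretical analysis. In training mode, BN divides each feature by its batch standard deviation, so a feature that barely varies is rescaled to roughly the same magnitude as one that varies a lot, making it just as easy for later layers to exploit. The feature values below are hypothetical, and BN's learnable scale and shift (gamma, beta) are omitted.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one feature across a batch to zero mean and unit variance.

    Sketch of training-mode BN for a single feature; the learnable
    affine parameters (gamma, beta) are deliberately omitted.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def std(values):
    """Population standard deviation of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


# Two hypothetical features measured over a batch of 4 examples:
# one with large spread, one with tiny spread (e.g. a subtle in-domain cue).
high_var = [-10.0, -5.0, 5.0, 10.0]
low_var = [-0.02, -0.01, 0.01, 0.02]

for name, feat in [("high_var", high_var), ("low_var", low_var)]:
    before, after = std(feat), std(batch_norm(feat))
    print(f"{name}: std {before:.4f} -> {after:.4f} (gain x{after / before:.2f})")
```

After normalization both features have a spread close to 1: the low-variance feature is amplified by a factor of about 60, while the high-variance one is scaled down by a factor of about 8. Without BN, the low-variance feature would stay tiny relative to the other, so a downstream layer would need much larger weights to rely on it.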

Abstract

Batch normalization (BN) is a ubiquitous technique for training deep neural networks that accelerates their convergence to reach higher accuracy. However, we demonstrate that BN comes with a fundamental drawback: it incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data, hurting generalization performance on out-of-domain examples. In this work, we investigate this phenomenon by first showing that removing BN layers across a wide range of architectures leads to lower out-of-domain and corruption errors at the cost of higher in-domain errors. We then propose Counterbalancing Teacher (CT), a method which leverages a frozen copy of the same model without BN as a teacher to enforce the student network's learning of robust representations by substantially adapting its weights through a consistency loss function. This regularization…
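The abstract describes the core mechanism: a frozen, BN-free copy of the model acts as a teacher, and a consistency loss pulls the BN student's representations toward the teacher's. The NumPy sketch below shows how such an objective could be assembled. It is an assumption-laden illustration, not the authors' implementation. Assumptions: a two-layer MLP stands in for the network; the consistency term is mean-squared error on the final feature vector only; `lam` is a hypothetical weighting hyperparameter; `features`, `ct_loss`, and `head` are names invented here; and the teacher has its own (separately trained, here random) weights. The sketch only computes the loss. In practice an autodiff framework would backpropagate it into the student's weights alone, keeping the teacher frozen.

```python
import numpy as np

rng = np.random.default_rng(0)


def batch_norm(h, eps=1e-5):
    """Training-mode BN over the batch axis (affine gamma/beta omitted)."""
    mean = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)


def features(x, w1, w2, use_bn):
    """Hypothetical two-layer MLP feature extractor; BN after each linear
    layer when use_bn is True (student), none when False (teacher)."""
    h = x @ w1
    if use_bn:
        h = batch_norm(h)
    h = np.maximum(h, 0.0)  # ReLU
    z = h @ w2
    return batch_norm(z) if use_bn else z


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def ct_loss(x, labels, student, teacher, head, lam=1.0):
    """Task loss on the BN student plus an assumed MSE consistency term
    toward the frozen BN-free teacher's features."""
    f_student = features(x, *student, use_bn=True)
    f_teacher = features(x, *teacher, use_bn=False)  # frozen: never updated
    task = cross_entropy(f_student @ head, labels)
    consistency = np.mean((f_student - f_teacher) ** 2)
    return task + lam * consistency, task, consistency


# Hypothetical shapes: batch of 8, 5 inputs, 16 hidden units, 4-dim features, 3 classes.
x = rng.normal(size=(8, 5))
labels = rng.integers(0, 3, size=8)
student = (rng.normal(size=(5, 16)), rng.normal(size=(16, 4)))
teacher = (rng.normal(size=(5, 16)), rng.normal(size=(16, 4)))
head = rng.normal(size=(4, 3))

total, task, consistency = ct_loss(x, labels, student, teacher, head, lam=0.5)
print(f"total={total:.3f} task={task:.3f} consistency={consistency:.3f}")
```

Because the teacher contains no BN layers, its features are not rescaled per batch, so the consistency term discourages the student from leaning on directions that only BN's normalization makes prominent. Setting `lam=0` recovers ordinary training of the BN model.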


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Seismic Imaging and Inversion Techniques · Advanced Neural Network Applications

Methods: Linear Regression