Rethinking "Batch" in BatchNorm

Yuxin Wu; Justin Johnson

arXiv:2105.07576 · cs.CV · May 18, 2021 · 25 cites

TL;DR

This paper reviews the hidden caveats that arise because BatchNorm operates on batches rather than on individual samples, shows how they can subtly degrade model performance in visual recognition tasks, and argues that rethinking what counts as the "batch" is key to addressing them.

Contribution

It provides a thorough review of BatchNorm's hidden caveats and suggests new perspectives on defining 'batch' to mitigate these issues.

Findings

01 Identifies subtle performance issues caused by BatchNorm's batch dependence

02 Proposes alternative batch definitions to address caveats

03 Offers practical guidelines for more effective BatchNorm usage

Abstract

BatchNorm is a critical building block in modern convolutional neural networks. Its unique property of operating on "batches" instead of individual samples introduces significantly different behaviors from most other operations in deep learning. As a result, it leads to many hidden caveats that can negatively impact a model's performance in subtle ways. This paper thoroughly reviews such problems in visual recognition tasks, and shows that a key to addressing them is to rethink different choices in the concept of "batch" in BatchNorm. By presenting these caveats and their mitigations, we hope this review can help researchers use BatchNorm more effectively.
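The batch dependence described in the abstract can be illustrated with a minimal sketch. It is not code from the paper or its repository. It assumes scalar activations, omits the learnable scale and shift, and uses hypothetical helper names (`batch_norm`, `normalize_with_fixed_stats`). It shows only that normalizing with mini-batch statistics makes a sample's output depend on the other samples in the batch, while normalizing with fixed statistics does not.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize scalar activations using the mean/variance of this batch.

    Simplified illustration: no learnable scale/shift, no running averages.
    """
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased variance
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

def normalize_with_fixed_stats(x, mean, var, eps=1e-5):
    """Normalize one activation with fixed (batch-independent) statistics."""
    return (x - mean) / math.sqrt(var + eps)

sample = 1.0

# The same sample placed in two different batches.
out_a = batch_norm([sample, 2.0, 3.0])[0]   # sample is the smallest value
out_b = batch_norm([sample, 0.0, -1.0])[0]  # sample is the largest value

# With fixed statistics (values chosen arbitrarily for illustration),
# the output depends only on the sample itself.
fixed = normalize_with_fixed_stats(sample, mean=0.5, var=4.0)

print(out_a, out_b, fixed)
```

Here the same input is mapped to a negative value in one batch and a positive value in the other, which is the kind of behavior that sets batch-based normalization apart from per-sample operations.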

Code & Models

Repositories

facebookresearch/detectron2/tree/master/projects/Rethinking-BatchNorm
PyTorch · Official

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning