TL;DR
Group Normalization (GN) is a normalization method whose computation does not depend on batch size. Its accuracy stays stable across a wide range of batch sizes, it outperforms Batch Normalization (BN) in small-batch settings, and it transfers well to downstream tasks.
Contribution
This paper introduces Group Normalization as a simple, effective alternative to Batch Normalization that is stable across different batch sizes and easily applicable to various computer vision tasks.
Findings
On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart at a batch size of 2.
GN performs comparably to BN at typical batch sizes.
GN outperforms BN in object detection, segmentation, and video classification tasks.
Abstract
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems --- BN's error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN's usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN's computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart…
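The grouping step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes an input laid out as (N, C, H, W), and the function name `group_norm`, the `eps` value, and the optional per-channel `gamma`/`beta` parameters are choices made here for the example.

```python
import numpy as np

def group_norm(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Normalize x of shape (N, C, H, W) within groups of channels.

    Assumed for illustration: NCHW layout, eps=1e-5, and optional
    per-channel scale (gamma) and shift (beta) of shape (C,).
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")

    # Split the channel axis into (groups, channels per group).
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)

    # Statistics are computed per sample and per group, never across
    # the batch axis, so the result does not depend on batch size.
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)

    normed = (grouped - mean) / np.sqrt(var + eps)
    normed = normed.reshape(n, c, h, w)

    # Optional learned per-channel affine transform.
    if gamma is not None:
        normed = normed * gamma.reshape(1, c, 1, 1)
    if beta is not None:
        normed = normed + beta.reshape(1, c, 1, 1)
    return normed
```

Because the mean and variance are taken over each sample's own channels and spatial positions, normalizing a single sample on its own gives the same result as normalizing it as part of a larger batch, which is the batch-size independence the abstract refers to.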
Videos
Group Normalization (Paper Explained) · YouTube
Taxonomy
Methods: Region Proposal Network · Focal Loss · Feature Pyramid Network · Average Pooling · RetinaNet · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block
