Batch Normalization and the impact of batch structure on the behavior of deep convolution networks
Mohamed Hajaj, Duncan Gillies

TL;DR
This paper investigates how the structure of training and test batches, specifically balanced batches containing one image per class, influences the effectiveness of batch normalization in deep convolutional networks. When both training and test batches are balanced, error rates approach zero on small datasets, although balancing the test batches requires the test labels.
Contribution
It demonstrates that controlling batch structure, in particular using balanced batches for both training and inference, considerably improves the accuracy of networks that use batch normalization, revealing a new aspect of batch normalization behavior.
Findings
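Balanced batches (one image per class) reduce error rates considerably.
Normalizing each test batch with its own means and variances, rather than with statistics stored from training, improves results.
Near-zero error is achieved on CIFAR10 when both training and test batches are balanced.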
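The finding about batch-specific means and variances can be illustrated with a small sketch. Batch normalization maps each activation x_i of one channel in a batch of size m to y_i = gamma * (x_i - mu_B) / sqrt(sigma_B^2 + eps) + beta, where mu_B and sigma_B^2 are the mean and variance of that channel over the batch. The pure-Python functions below are hypothetical helpers for a single scalar channel, not the authors' code. They contrast normalizing with the current batch's statistics against the conventional inference mode, which uses fixed running statistics. With batch statistics, the same input value can map to different outputs depending on the other samples in its batch. According to the abstract, this shared dependence is how information about easy images in a balanced batch reaches the harder ones.

```python
import math
from typing import List, Sequence


def batch_norm_batch_stats(x: Sequence[float], gamma: float = 1.0,
                           beta: float = 0.0, eps: float = 1e-5) -> List[float]:
    """Normalize one channel using the mean and variance of this batch only."""
    m = len(x)
    mu = sum(x) / m
    # Biased variance (divide by m), as used in the batch-norm forward pass.
    var = sum((v - mu) ** 2 for v in x) / m
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (v - mu) * inv_std + beta for v in x]


def batch_norm_running_stats(x: Sequence[float], running_mean: float,
                             running_var: float, gamma: float = 1.0,
                             beta: float = 0.0, eps: float = 1e-5) -> List[float]:
    """Conventional inference: fixed statistics gathered during training."""
    inv_std = 1.0 / math.sqrt(running_var + eps)
    return [gamma * (v - running_mean) * inv_std + beta for v in x]


if __name__ == "__main__":
    batch_a = [2.0, 0.0, 1.0]
    batch_b = [2.0, 4.0, 6.0]
    # Same input (2.0), different batch companions: the sign of the output flips.
    print(batch_norm_batch_stats(batch_a)[0])  # positive
    print(batch_norm_batch_stats(batch_b)[0])  # negative
    # With fixed running statistics the output for 2.0 does not depend on the batch.
    print(batch_norm_running_stats(batch_a, 1.0, 1.0)[0])
    print(batch_norm_running_stats(batch_b, 1.0, 1.0)[0])
```

The running-statistics values (1.0, 1.0) are arbitrary placeholders chosen for illustration. In a trained network they would be accumulated during training.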
Abstract
Batch normalization was introduced in 2015 to speed up the training of deep convolutional networks by normalizing the activations across the current batch to have zero mean and unit variance. The results presented here show an interesting aspect of batch normalization: controlling the shape of the training batches can influence what the network will learn. If training batches are structured as balanced batches (one image per class), and inference is also carried out on balanced test batches using each batch's own means and variances, then the results under these conditions improve considerably. The network uses the strong information about easy images in a balanced batch and propagates it through the shared means and variances to help decide the identity of harder images in the same batch. Balancing the test batches requires the labels of the test images, which are not available in…
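The balanced batch structure described in the abstract (one image per class) can be produced with a simple sampler. The sketch below uses a hypothetical helper, `make_balanced_batches`, and is not the authors' implementation. It groups sample indices by label and shuffles each class. It then emits batches that take one index from every class until the smallest class is exhausted, dropping the remainder. For CIFAR10, with its 10 classes, this yields batches of 10 images. As the abstract notes, applying it to the test set requires the test labels.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def make_balanced_batches(labels: Sequence[int],
                          seed: Optional[int] = None) -> List[List[int]]:
    """Group sample indices into batches containing exactly one sample per class.

    Batches are formed until the smallest class runs out; leftover samples
    from larger classes are dropped in this sketch.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    if not by_class:
        return []

    classes = sorted(by_class)
    # Shuffle within each class so batches pair images differently per seed.
    for c in classes:
        rng.shuffle(by_class[c])

    n_batches = min(len(by_class[c]) for c in classes)
    # Batch b takes the b-th (shuffled) sample of every class.
    return [[by_class[c][b] for c in classes] for b in range(n_batches)]


if __name__ == "__main__":
    labels = [0, 1, 2, 0, 1, 2, 0, 1]
    for batch in make_balanced_batches(labels, seed=0):
        print(batch, [labels[i] for i in batch])
```

Because batch statistics do not depend on the order of samples within a batch, the sketch does not shuffle inside each batch. Changing the seed generally changes which images share a batch.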
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
