Training Faster by Separating Modes of Variation in Batch-normalized Models

Mahdi M. Kalayeh; Mubarak Shah

arXiv:1806.02892 · cs.LG · November 16, 2018

TL;DR

This paper introduces Mixture Normalization, a novel approach that improves Batch Normalization by modeling the distribution of layer inputs as a Gaussian mixture, leading to faster training and higher-quality CNN models.

Contribution

The paper proposes Mixture Normalization, which enhances Batch Normalization by accounting for data heterogeneity through Gaussian mixture models, improving training speed and model quality.
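The idea of normalizing against a Gaussian mixture instead of a single Gaussian can be illustrated with a small NumPy sketch for one channel of activations: fit a K-component mixture to the mini-batch, normalize every sample against each component, and blend the per-component results with the posterior (soft-assignment) probabilities. This is a hypothetical illustration, not the authors' implementation: the function names, the plain EM fit with quantile initialization, and the simple posterior-weighted combination are assumptions, the paper's exact estimation procedure and per-component weighting may differ, and BN's learned scale and shift are omitted.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    # Elementwise density of a univariate Gaussian (broadcasts over components).
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_gmm_1d(x, k, n_iter=20, eps=1e-5):
    """Hypothetical helper: fit a K-component 1-D Gaussian mixture with plain EM."""
    # Assumed initialization: spread the component means over data quantiles.
    if k > 1:
        means = np.quantile(x, np.linspace(0.1, 0.9, k))
    else:
        means = np.array([x.mean()])
    variances = np.full(k, x.var() + eps)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, shape (n, k).
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + eps
    return weights, means, variances


def mixture_normalize(x, k=2, eps=1e-5):
    """Hypothetical sketch: normalize 1-D activations w.r.t. a K-component mixture."""
    weights, means, variances = fit_gmm_1d(x, k, eps=eps)
    # Posterior (soft-assignment) probabilities under the fitted mixture.
    dens = weights * gaussian_pdf(x[:, None], means, variances)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # Responsibility-weighted mean and variance of each component.
    nk = resp.sum(axis=0)
    mu_k = (resp * x[:, None]).sum(axis=0) / nk
    var_k = (resp * (x[:, None] - mu_k) ** 2).sum(axis=0) / nk
    # Normalize against every component, then blend with the soft assignments.
    per_component = (x[:, None] - mu_k) / np.sqrt(var_k + eps)
    return (resp * per_component).sum(axis=1)
```

With k=1 every responsibility equals 1, so the sketch reduces to ordinary batch normalization, (x − mean) / sqrt(var + eps). On bimodal activations with k=2, each mode is centered on its own component mean rather than on the global mean, which is the "separating modes of variation" intuition in the title.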

Findings

1. Mixture Normalization (MN) accelerates the training of CNNs.
2. MN achieves higher accuracy than Batch Normalization on CIFAR-10 and CIFAR-100.
3. MN improves the training of GANs.

Abstract

Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNNs). During training, it normalizes the inputs to each layer using the statistics of each mini-batch. In this work, we study BN from the viewpoint of Fisher kernels. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. That means BN can be explained in terms of kernels that naturally emerge from the probability density function of the underlying data distribution. However, given the rectifying non-linearities employed in CNN architectures, the distribution of inputs to the layers shows heavy-tailed and asymmetric characteristics. Therefore, we propose approximating the underlying data distribution not with one Gaussian density but with a mixture of Gaussian densities. Deriving the Fisher vector for a…
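The claim that BN matches the Fisher vector of a Gaussian can be checked with a short derivation. This sketch assumes a single univariate Gaussian p(x | μ, σ²) per channel and looks only at the gradient of the log-likelihood with respect to the mean, normalized by the corresponding Fisher information; the paper's own derivation may be organized differently (for example, it may also treat the variance term).

```latex
\begin{aligned}
\log p(x \mid \mu, \sigma^2) &= -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2} \\
\frac{\partial}{\partial \mu}\log p(x \mid \mu, \sigma^2) &= \frac{x-\mu}{\sigma^2} \\
F_\mu = \mathbb{E}\!\left[\left(\frac{x-\mu}{\sigma^2}\right)^{2}\right] &= \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2} \\
F_\mu^{-1/2}\,\frac{\partial}{\partial \mu}\log p(x \mid \mu, \sigma^2) &= \sigma \cdot \frac{x-\mu}{\sigma^2} = \frac{x-\mu}{\sigma}
\end{aligned}
```

Replacing μ and σ² with the mini-batch mean μ_B and variance σ_B² (plus a small ε for numerical stability) gives exactly the BN normalization (x − μ_B) / sqrt(σ_B² + ε).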

Code & Models

Repositories

b-faye/unsupervised-context-normalization (TensorFlow)

Taxonomy

Methods: Mixture Normalization