TL;DR
This paper introduces Mixture Normalization, which extends Batch Normalization by modeling the inputs to each layer as a mixture of Gaussians rather than a single Gaussian, leading to faster training and higher accuracy in CNNs.
Contribution
The paper interprets Batch Normalization through Fisher kernels and proposes Mixture Normalization, which accounts for heterogeneity in layer inputs by modeling them with a Gaussian mixture model instead of a single Gaussian.
Findings
Mixture Normalization (MN) accelerates training in CNNs.
MN achieves higher accuracy on CIFAR-10 and CIFAR-100.
MN improves training of GANs.
Abstract
Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNNs). It normalizes the inputs to the layers during training using the statistics of each mini-batch. In this work, we study BN from the viewpoint of Fisher kernels. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. This means BN can be explained in terms of kernels that naturally emerge from the probability density function of the underlying data distribution. However, given the rectifying non-linearities employed in CNN architectures, the distribution of inputs to the layers is heavy-tailed and asymmetric. Therefore, we propose approximating the underlying data distribution not with one Gaussian density, but with a mixture of Gaussian densities. Deriving Fisher vector for a…
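To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the two ideas. `batch_norm` standardizes each feature with the mini-batch mean and variance. `mixture_norm` is a simplified illustration of normalizing against several Gaussian components, and it is not the paper's exact formulation: the function names, the `resp` argument (soft assignments of samples to components, assumed to be supplied by some separately fitted mixture model) and the omission of learnable scale and shift parameters are all assumptions made for this example.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Standardize each feature of x (shape N x D) with mini-batch statistics."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def mixture_norm(x, resp, eps=1e-5):
    """Illustrative mixture-based normalization (hypothetical helper).

    x:    (N, D) mini-batch of features.
    resp: (N, K) soft assignments of samples to K components; each row
          sums to 1. How they are obtained is outside this sketch.
    """
    n_k = resp.sum(axis=0)                      # (K,) effective count per component
    mu = (resp.T @ x) / n_k[:, None]            # (K, D) responsibility-weighted means
    diff = x[:, None, :] - mu[None, :, :]       # (N, K, D) offsets from each mean
    var = (resp[:, :, None] * diff**2).sum(axis=0) / n_k[:, None]  # (K, D)
    normed = diff / np.sqrt(var + eps)          # standardize against every component
    # Combine the per-component results, weighted by each sample's assignment.
    return (resp[:, :, None] * normed).sum(axis=1)

# Usage: a bimodal mini-batch with hard (one-hot) assignments.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, size=(4, 3)),
                    rng.normal(5.0, 1.0, size=(4, 3))])
resp = np.zeros((8, 2))
resp[:4, 0] = 1.0
resp[4:, 1] = 1.0
y_bn = batch_norm(x)
y_mn = mixture_norm(x, resp)
```

With a single component (`resp` a column of ones), `mixture_norm` reduces to `batch_norm`. On the bimodal batch, `batch_norm` leaves the two modes separated, while the mixture variant centers each mode on its own mean.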
Taxonomy
Methods: Mixture Normalization
