TL;DR
This paper introduces a novel normalization layer called normality normalization that encourages neural network activations to follow a Gaussian distribution, leveraging information-theoretic properties to improve training, generalization, and robustness.
Contribution
It proposes a new normalization method using the power transform and Gaussian noise to promote Gaussian-like activations in neural networks.
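To make the idea concrete, here is a minimal NumPy sketch of a layer that standardizes activations, applies a power transform to make them more Gaussian, and adds Gaussian noise during training. The summary only names "the power transform" and "Gaussian noise". The specifics below are assumptions made for illustration and should not be read as the authors' implementation: the Yeo-Johnson form of the transform, the grid search over the parameter `lam`, the `noise_scale` value, and all function names.

```python
import numpy as np


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform (assumed here; the summary only says 'power transform')."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative inputs: ((x + 1)^lam - 1) / lam, with the log limit at lam = 0.
    if abs(lam) > 1e-8:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    # Negative inputs: -((1 - x)^(2 - lam) - 1) / (2 - lam), with the log limit at lam = 2.
    if abs(lam - 2.0) > 1e-8:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


def yj_log_likelihood(x, lam):
    """Gaussian log-likelihood of the transformed data, up to an additive constant."""
    y = yeo_johnson(x, lam)
    jacobian = (lam - 1.0) * np.sum(np.sign(x) * np.log1p(np.abs(x)))
    return -0.5 * x.size * np.log(y.var() + 1e-12) + jacobian


def normality_normalize_sketch(x, noise_scale=0.1, training=True, rng=None, eps=1e-5):
    """Hypothetical layer for inputs of shape (batch, features).

    Steps: standardize, power-transform each feature, re-standardize,
    then add Gaussian noise in training mode only.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    z = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

    lambdas = np.linspace(0.0, 2.0, 41)  # illustrative search grid
    out = np.empty_like(z)
    for j in range(z.shape[1]):
        col = z[:, j]
        # Pick the lam that makes this feature look most Gaussian.
        best = max(lambdas, key=lambda lam: yj_log_likelihood(col, lam))
        y = yeo_johnson(col, best)
        out[:, j] = (y - y.mean()) / np.sqrt(y.var() + eps)

    if training:
        out = out + noise_scale * rng.standard_normal(out.shape)
    return out
```

Calling `normality_normalize_sketch(x, training=False)` on a batch with skewed features, such as samples from an exponential distribution, should return features with zero mean and noticeably lower skewness. A grid search keeps the sketch short; a real layer would need a cheaper, differentiable way to choose `lam`.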
Findings
Enhances generalization across multiple models and datasets.
Improves robustness to random perturbations.
Compatible with existing normalization layers.
Abstract
The normal distribution plays a central role in information theory: it is at the same time the best-case signal and worst-case noise distribution, has the greatest representational capacity of any distribution, and offers an equivalence between uncorrelatedness and independence for joint distributions. Accounting for the mean and variance of activations throughout the layers of deep neural networks has had a significant effect on facilitating their effective training, but seldom has a prescription for precisely what distribution these activations should take, and how this might be achieved, been offered. Motivated by the information-theoretic properties of the normal distribution, we address this question and concurrently present normality normalization: a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform…
