# Regularity Normalization: Neuroscience-Inspired Unsupervised Attention across Neural Network Layers

**Authors:** Baihan Lin

arXiv: 1902.10658 · 2021-12-30

## TL;DR

Regularity Normalization (RN) is a neuroscience-inspired unsupervised attention mechanism that improves neural network performance across diverse tasks by normalizing statistical regularities in the implicit network space.

## Contribution

The paper introduces RN, a novel unsupervised attention method based on the Minimum Description Length (MDL) principle, which enhances neural network robustness and interpretability across multiple domains.

## Key findings

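- Outperforms existing normalization methods in image classification, classic control, procedurally generated reinforcement learning, generative modeling, handwriting generation, and question answering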
- Effective in handling limited, imbalanced, and non-stationary data
- Serves as a probing tool that tracks dependencies and critical learning stages across layers and recurrent time steps

## Abstract

Inspired by the adaptation phenomenon of neuronal firing, we propose the regularity normalization (RN) as an unsupervised attention mechanism (UAM) which computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle. Treating the neural network optimization process as a partially observable model selection problem, the regularity normalization constrains the implicit space by a normalization factor, the universal code length. We compute this universal code incrementally across neural network layers and demonstrate the flexibility to include data priors such as top-down attention and other oracle information. Empirically, our approach outperforms existing normalization methods in tackling limited, imbalanced and non-stationary input distributions in image classification, classic control, procedurally-generated reinforcement learning, generative modeling, handwriting generation and question answering tasks with various neural network architectures. Lastly, the unsupervised attention mechanism is a useful probing tool for neural networks by tracking the dependency and critical learning stages across layers and recurrent time steps of deep networks.
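
To make the "universal code length" in the abstract concrete, here is a minimal sketch of one standard universal code from the MDL literature, the normalized maximum likelihood (NML) code, on a toy Bernoulli model class. It assumes NML is the code in question and does not reproduce the paper's incremental, layer-wise procedure. The function names are hypothetical and chosen only for illustration.

```python
import math


def bernoulli_max_likelihood(k: int, n: int) -> float:
    """Likelihood of one binary sequence with k ones out of n,
    evaluated at its own maximum-likelihood estimate p = k / n."""
    if k == 0 or k == n:
        return 1.0  # a constant sequence is fit perfectly by p = 0 or p = 1
    p = k / n
    return p ** k * (1 - p) ** (n - k)


def nml_normalizer(n: int) -> float:
    """Sum of maximized likelihoods over all 2**n binary sequences,
    grouped by their number of ones. This is the normalization factor."""
    return sum(
        math.comb(n, k) * bernoulli_max_likelihood(k, n) for k in range(n + 1)
    )


def nml_code_length(k: int, n: int) -> float:
    """Code length in bits of one sequence with k ones under the NML code."""
    return -math.log2(bernoulli_max_likelihood(k, n) / nml_normalizer(n))


if __name__ == "__main__":
    n = 10
    for k in (0, 5, 10):
        print(f"k={k:2d}  code length = {nml_code_length(k, n):.3f} bits")
```

Sequences that the model class fits well (all zeros or all ones) receive shorter codes than balanced ones, which is the sense in which the code length measures statistical regularity. Dividing by the normalizer makes the code lengths correspond to a proper probability distribution over all sequences.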

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.10658/full.md

## Figures

178 figures with captions in the complete paper: https://tomesphere.com/paper/1902.10658/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/1902.10658/full.md

---
Source: https://tomesphere.com/paper/1902.10658