Breaking Time Invariance: Assorted-Time Normalization for RNNs
Cole Pospisil, Vasily Zadorozhnyy, Qiang Ye

TL;DR
This paper introduces Assorted-Time Normalization (ATN), a novel RNN normalization technique that incorporates multiple time steps to better capture temporal dependencies without adding trainable parameters.
Contribution
The paper proposes ATN, a normalization method that draws on several consecutive time steps of an RNN rather than only the current one, improving temporal modeling without adding trainable parameters.
Findings
ATN improves performance on synthetic tasks like Adding, Copying, and Denoising.
ATN enhances language modeling results across various benchmarks.
Theoretical analysis confirms gradient stability and weight invariance.
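The summary does not define "weight invariance". One natural reading, which is our assumption here, is LN's property that rescaling the weight matrices leaves the normalized output unchanged. An RNN reuses the same weights at every step, so multiplying them by c > 0 multiplies every pre-activation in ATN's window by c. The pooled mean and standard deviation then scale by c as well and cancel. The pure-Python sketch below checks this numerically on a toy tanh RNN with no bias before normalization. The names `atn_step`, `run_rnn`, `matvec`, `scale`, the window size k = 3, and the pooled-statistics rule are hypothetical illustrations, not the authors' code.

```python
import math
import random


def atn_step(window, eps=1e-12):
    """Hypothetical ATN step: normalize the newest pre-activation in `window`
    with a mean and variance pooled over every vector in the window."""
    pooled = [x for a in window for x in a]
    mu = sum(pooled) / len(pooled)
    var = sum((x - mu) ** 2 for x in pooled) / len(pooled)
    return [(x - mu) / math.sqrt(var + eps) for x in window[-1]]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def scale(W, c):
    return [[c * w for w in row] for row in W]


def run_rnn(W_h, W_x, xs, k=3):
    """Toy tanh RNN with no bias before normalization; returns all hidden states."""
    h, window, hs = [0.0] * len(W_h), [], []
    for x in xs:
        a = [p + q for p, q in zip(matvec(W_h, h), matvec(W_x, x))]
        window = (window + [a])[-k:]  # keep only the last k pre-activations
        h = [math.tanh(z) for z in atn_step(window)]
        hs.append(h)
    return hs


random.seed(0)
n, d, steps = 3, 2, 5  # hidden size, input size, sequence length
W_h = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
W_x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(steps)]

base = run_rnn(W_h, W_x, xs)
rescaled = run_rnn(scale(W_h, 7.5), scale(W_x, 7.5), xs)  # same factor everywhere
max_diff = max(abs(p - q) for h1, h2 in zip(base, rescaled) for p, q in zip(h1, h2))
print(max_diff)  # close to 0: only eps breaks exact invariance
```

Because the hidden state at each step depends only on normalized values, the invariance carries forward through time by induction. The small `eps` is the only term that does not scale with c.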
Abstract
Methods such as Layer Normalization (LN) and Batch Normalization (BN) have proven to be effective in improving the training of Recurrent Neural Networks (RNNs). However, existing methods normalize using only the instantaneous information at one particular time step, and the result of the normalization is a preactivation state with a time-independent distribution. This implementation fails to account for certain temporal differences inherent in the inputs and the architecture of RNNs. Since these networks share weights across time steps, it may also be desirable to account for the connections between time steps in the normalization scheme. In this paper, we propose a normalization method called Assorted-Time Normalization (ATN), which preserves information from multiple consecutive time steps and normalizes using them. This setup allows us to introduce longer time dependencies into the…
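The contrast the abstract draws can be made concrete with a minimal pure-Python sketch. It rests on our reading of the abstract, not the authors' code: ATN keeps the pre-activation vectors of the last k time steps, pools their entries into a single mean and variance, and uses those statistics to normalize the current step's pre-activation. With k = 1 this reduces to plain LN. The names `layer_norm`, `AssortedTimeNorm`, `k`, and `eps` are hypothetical, and LN's usual gain and bias are omitted for brevity.

```python
import math
from collections import deque


def layer_norm(a, eps=1e-5):
    """Plain LN (no gain/bias): statistics from the current pre-activation only."""
    mu = sum(a) / len(a)
    var = sum((x - mu) ** 2 for x in a) / len(a)
    return [(x - mu) / math.sqrt(var + eps) for x in a]


class AssortedTimeNorm:
    """Hypothetical ATN sketch: normalize the current pre-activation with a mean
    and variance pooled over the last k time steps (current step included).
    Adds no trainable parameters, only a buffer of past pre-activations."""

    def __init__(self, k, eps=1e-5):
        self.eps = eps
        self.history = deque(maxlen=k)  # the oldest step is dropped automatically

    def reset(self):
        """Clear the buffer at the start of each new sequence."""
        self.history.clear()

    def __call__(self, a):
        self.history.append(list(a))
        pooled = [x for step in self.history for x in step]
        mu = sum(pooled) / len(pooled)
        var = sum((x - mu) ** 2 for x in pooled) / len(pooled)
        # Only the current step is normalized; older steps just shape the statistics.
        return [(x - mu) / math.sqrt(var + self.eps) for x in a]


# Pre-activations for three time steps of a toy 2-unit RNN layer.
seq = [[1.0, 3.0], [2.0, 6.0], [0.0, 10.0]]

atn1, atn3 = AssortedTimeNorm(k=1), AssortedTimeNorm(k=3)
for a in seq:
    print(layer_norm(a), atn1(a), atn3(a))
```

With k = 1 the two normalizers agree exactly. With k = 3, the second step is normalized with statistics that also include the first step's values, so its output differs from LN's. Resetting the buffer between sequences is a choice made in this sketch so that statistics never mix unrelated sequences.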
Taxonomy
Topics: Neural Networks and Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Batch Normalization · Layer Normalization
