On Separate Normalization in Self-supervised Transformers
Xiaohui Chen, Yinkai Wang, Yuanqi Du, Soha Hassoun, Li-Ping Liu

TL;DR
This paper introduces a simple yet effective modification to self-supervised transformer models by using separate normalization layers for tokens and the [CLS] symbol, leading to improved performance across multiple domains.
Contribution
The paper proposes using separate normalization layers for the tokens and the [CLS] symbol, which improves how the [CLS] embedding encodes global information and raises downstream task results.
Findings
2.7% average performance improvement across domains
Better encoding of global context in [CLS] embeddings
More uniform distribution of [CLS] embeddings
Abstract
Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] symbol and the tokens. We propose in this paper a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be optimally aligned with their individual roles. We empirically show that by utilizing a separate normalization layer, the [CLS] embeddings can better encode the global contextual information and are distributed more uniformly in its anisotropic space. When replacing…
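The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not say which normalization is used, so the sketch assumes a batch-statistic normalization (per-feature mean and variance) with no learnable scale or shift, and assumes the [CLS] embedding sits at position 0 of each sequence. The function names `shared_norm` and `separate_norm` are hypothetical.

```python
import math


def normalize(vectors, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over `vectors`."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    variances = [sum((v[j] - means[j]) ** 2 for v in vectors) / n
                 for j in range(dim)]
    return [[(v[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dim)] for v in vectors]


def shared_norm(batch):
    """Baseline: one set of statistics for [CLS] and tokens together."""
    flat = normalize([vec for seq in batch for vec in seq])
    out, i = [], 0
    for seq in batch:
        out.append(flat[i:i + len(seq)])  # regroup into sequences
        i += len(seq)
    return out


def separate_norm(batch):
    """Assumed variant: [CLS] (position 0) and tokens get their own statistics."""
    cls_normed = normalize([seq[0] for seq in batch])
    tok_normed = normalize([vec for seq in batch for vec in seq[1:]])
    out, i = [], 0
    for b, seq in enumerate(batch):
        n_tok = len(seq) - 1
        out.append([cls_normed[b]] + tok_normed[i:i + n_tok])
        i += n_tok
    return out


# Toy batch: 2 sequences, each [CLS] + 2 tokens, embedding dimension 2.
# The [CLS] vectors are deliberately on a different scale from the tokens.
batch = [
    [[10.0, 12.0], [0.0, 1.0], [1.0, 0.0]],
    [[14.0, 8.0], [2.0, 1.0], [1.0, 2.0]],
]

shared = shared_norm(batch)
separate = separate_norm(batch)
print("shared   [CLS]:", [seq[0] for seq in shared])
print("separate [CLS]:", [seq[0] for seq in separate])
```

In this toy batch the shared statistics are dominated by the tokens, so the normalized [CLS] vectors stay offset from zero, while separate statistics center them. Whether this is the mechanism behind the reported gains is a question for the paper itself, not this sketch.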
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Neural Networks and Applications
