On Isotropy Calibration of Transformers
Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger Wattenhofer

TL;DR
This paper empirically evaluates methods for isotropy calibration in transformer embeddings, finding that such methods do not consistently improve performance, supporting the idea that transformers are already locally isotropic.
Contribution
The study provides a comprehensive empirical assessment of existing isotropy calibration techniques for transformers and questions their usefulness, given that transformer embedding spaces are already locally isotropic.
Findings
Calibration methods do not consistently improve transformer performance.
Transformers exhibit local isotropy in their embedding space.
Additional isotropy calibration may be unnecessary for transformers.
Abstract
Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic - the embeddings are distributed in a narrow cone. Meanwhile, static word representations (e.g., Word2Vec or GloVe) have been shown to benefit from isotropic spaces. Therefore, previous work has developed methods to calibrate the embedding space of transformers in order to ensure isotropy. However, a recent study (Cai et al. 2021) shows that the embedding space of transformers is locally isotropic, which suggests that these models are already capable of exploiting the expressive capacity of their embedding space. In this work, we conduct an empirical evaluation of state-of-the-art methods for isotropy calibration on transformers and find that they do not provide consistent improvements across models and tasks. These results support the thesis…
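To make the "narrow cone" picture concrete, the sketch below computes one common proxy for anisotropy: the average cosine similarity between pairs of embedding vectors. It is an illustration only. The data is synthetic (Gaussian noise plus a shared offset standing in for contextual embeddings), the helper name `mean_pairwise_cosine` is hypothetical, and mean-centering is used here as the simplest possible calibration step; none of this is claimed to be the measure or the calibration methods evaluated in the paper.

```python
import numpy as np


def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows in x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = x.shape[0]
    # Exclude the diagonal, since every vector has similarity 1 with itself.
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for contextual embeddings. The shared
# offset pushes every vector in roughly the same direction (a narrow cone).
embeddings = rng.normal(size=(500, 64)) + 5.0

# High average similarity indicates a globally anisotropic space.
anisotropic_score = mean_pairwise_cosine(embeddings)

# Subtracting the mean vector removes the shared direction.
centered_score = mean_pairwise_cosine(embeddings - embeddings.mean(axis=0))

print(f"before centering: {anisotropic_score:.3f}")
print(f"after centering:  {centered_score:.3f}")
```

On this toy data the score is close to 1 before centering and close to 0 afterwards. A score like this is a global one, whereas the local isotropy discussed in the abstract concerns the structure within regions of the embedding space, which a single global average does not capture.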
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
