Representation Collapse in Machine Translation Through the Lens of Angular Dispersion
Evgeniia Tokarchuk, Maya K. Nachesa, Sergey Troshin, Vlad Niculae

TL;DR
This paper investigates the phenomenon of representation collapse in neural machine translation models, analyzing its dynamics and proposing angular dispersion regularization to mitigate it and enhance translation quality.
Contribution
The paper analyzes representation collapse in NMT models and demonstrates that angular dispersion regularization mitigates collapse and improves translation performance.
Findings
Regularization based on angular dispersion reduces representation collapse.
Quantized models exhibit collapse behavior similar to that of their unquantized counterparts.
Regularization benefits persist even after model quantization.
Abstract
Modern neural translation models based on the Transformer architecture are known for their high performance, particularly when trained on high-resource datasets. The standard next-token prediction training strategy, while widely adopted in practice, may lead to overlooked artifacts such as representation collapse. Previous works have shown that this problem is especially pronounced in the representations of the deeper Transformer layers, where the model often fails to use the geometric space efficiently. Representation collapse is even more evident in end-to-end training of continuous-output neural machine translation, where the trivial solution would be to set all vectors to the same value. In this work, we analyze the dynamics of representation collapse at different levels of discrete and continuous NMT transformers throughout training. We incorporate an existing regularization method based…
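To make the notion of collapse concrete, the sketch below computes two simple dispersion diagnostics over a set of vectors projected onto the unit sphere: the mean pairwise cosine similarity and the minimum pairwise angle. These are illustrative measurements chosen for this example, not the regularizer used in the paper, which the truncated abstract does not specify. The function names `normalize`, `mean_pairwise_cosine`, and `min_pairwise_angle` are hypothetical.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _pairwise_cosines(vectors):
    """Return the cosine similarity of every unordered pair of vectors."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    units = [normalize(v) for v in vectors]
    cosines = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            dot = sum(a * b for a, b in zip(units[i], units[j]))
            # Clamp to [-1, 1] to guard against floating-point drift.
            cosines.append(max(-1.0, min(1.0, dot)))
    return cosines


def mean_pairwise_cosine(vectors):
    """Illustrative diagnostic: values near 1 suggest collapsed directions."""
    cosines = _pairwise_cosines(vectors)
    return sum(cosines) / len(cosines)


def min_pairwise_angle(vectors):
    """Illustrative diagnostic: smallest angle (radians) between any two vectors."""
    return min(math.acos(c) for c in _pairwise_cosines(vectors))


# The trivial "all vectors identical" solution mentioned in the abstract.
collapsed = [[1.0, 2.0, 3.0] for _ in range(4)]
# Mutually orthogonal vectors, which are well spread over the sphere.
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(mean_pairwise_cosine(collapsed), min_pairwise_angle(collapsed))
print(mean_pairwise_cosine(spread), min_pairwise_angle(spread))
```

For the collapsed set the mean cosine is approximately 1 and the minimum angle is approximately 0, while the orthogonal set yields a mean cosine of approximately 0 and a minimum angle of approximately pi/2. A dispersion-based regularizer would, under this reading, penalize the first configuration and favor the second.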
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Generative Adversarial Networks and Image Synthesis
