Language models struggle with compartmentalization
Thomas Vincent Howe, David Wingate

TL;DR
Large language models often fail to unify different representations of the same concept, leading to compartmentalization that reduces efficiency and capacity utilization, especially in multilingual settings.
Contribution
This paper introduces the concept of compartmentalization in LLMs. Through experiments with synthetic parallel data and multilingual training, it demonstrates the effect's impact and the limits of synthetic parallel data as a remedy.
Findings
LLMs can develop redundant internal representations of the same concept.
Synthetic parallel data may not effectively reduce compartmentalization.
Early multilingual training is nearly fully compartmentalized in small models.
Abstract
In the training data used by large language models (LLMs), the same latent concept is often presented in multiple distinct ways: the same facts appear in English and Swahili; many functions can be expressed in both Python and Haskell; we can express propositions in both formal and natural language. We show that LLMs can exhibit compartmentalization, where they fail to identify and share statistical strength between distinct presentations of unified concepts. In the worst case, LLMs simply learn parallel internal representations of each presentation of the concept, saturating model capacity with redundancies and decreasing sample efficiency with the number of such presentations. We also demonstrate that synthetic parallel data can fail to improve this despite being easily learned itself. Under this framework, we find that, for small models, early multilingual learning is nearly entirely…
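The abstract's worst case can be illustrated with a toy counting model. This is an assumption made only for illustration and is not the authors' method or experimental setup. Suppose a concept appears in a fixed number of training examples, split evenly across several presentations (for example, languages). A fully compartmentalized learner trains one parallel representation per presentation, so each representation sees only its share of the data and capacity holds one copy per presentation. A unified learner pools every example into a single representation. The function names below are hypothetical.

```python
# Toy counting model (an illustrative assumption, not the paper's method):
# a concept occurs in `total_examples` training examples, split evenly
# across `num_presentations` surface forms (e.g. languages).


def examples_per_representation(total_examples: int,
                                num_presentations: int,
                                compartmentalized: bool) -> float:
    """Hypothetical helper: examples informing each internal representation."""
    if num_presentations < 1:
        raise ValueError("num_presentations must be >= 1")
    if compartmentalized:
        # Worst case: each presentation trains its own parallel representation,
        # so the data is divided among them.
        return total_examples / num_presentations
    # Unified case: one representation pools statistical strength across
    # all presentations.
    return float(total_examples)


def representations_stored(num_presentations: int, compartmentalized: bool) -> int:
    """Hypothetical helper: internal copies of the concept occupying capacity."""
    return num_presentations if compartmentalized else 1


if __name__ == "__main__":
    # Columns: presentations, per-copy examples (compartmentalized),
    # examples (unified), stored copies (compartmentalized).
    for k in (1, 2, 4, 8):
        print(k,
              examples_per_representation(1000, k, compartmentalized=True),
              examples_per_representation(1000, k, compartmentalized=False),
              representations_stored(k, compartmentalized=True))
```

Under this toy assumption, the data available to each compartmentalized copy shrinks as 1/k while redundant copies grow as k. That mirrors, qualitatively, the abstract's claims about falling sample efficiency and capacity saturated with redundancies.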
