Spontaneous Emerging Preference in Two-tower Language Model
Zhengqi He, Taro Toyoizumi

TL;DR
This paper asks whether language processing can be naturally divided. It trains two identically configured language models side by side and observes a spontaneously emerging preference phenomenon that could inform future NLP development.
Contribution
It introduces a two-tower language model framework and reports a stable phenomenon in which some tokens are consistently better predicted by one tower and others by the other tower, which is taken as evidence of an intrinsic property of natural language.
Findings
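Some tokens are consistently better predicted by one tower, while others are consistently better predicted by the other.
The preference phenomenon is qualitatively stable across different model configurations.
This stability suggests that the preference reflects an intrinsic property of natural language.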
Abstract
The ever-growing size of foundation language models has brought significant performance gains in various types of downstream tasks. Because large model size also brings side effects such as deployment cost, availability issues, and environmental cost, there is some interest in exploring other possible directions, such as a divide-and-conquer scheme. In this paper, we ask a basic question: are language processes naturally divisible? We study this problem with a simple two-tower language model setting, where two language models with identical configurations are trained side by side cooperatively. With this setting, we discover the spontaneous emerging preference phenomenon, where some tokens are consistently better predicted by one tower while others are better predicted by the other tower. This phenomenon is qualitatively stable, regardless of…
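To make "better predicted by one tower" concrete, the following is a minimal sketch of one way a per-token preference could be measured. It is not the authors' metric, which this summary does not specify. It assumes per-position losses (for example, negative log-likelihoods) are already available from each tower on the same text. The function name token_preference and the score definition (wins for tower A minus wins for tower B, divided by the token's occurrence count) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def token_preference(tokens, loss_a, loss_b):
    """Hypothetical preference score per token type, in [-1, 1].

    +1 means tower A had the lower loss on every occurrence of the token,
    -1 means tower B did, and 0 means no net preference (ties count as 0).
    """
    if not (len(tokens) == len(loss_a) == len(loss_b)):
        raise ValueError("tokens, loss_a and loss_b must have equal length")

    net_wins = defaultdict(int)  # (wins for A) - (wins for B) per token
    counts = defaultdict(int)    # number of occurrences per token

    for tok, la, lb in zip(tokens, loss_a, loss_b):
        counts[tok] += 1
        if la < lb:
            net_wins[tok] += 1
        elif lb < la:
            net_wins[tok] -= 1

    return {tok: net_wins[tok] / n for tok, n in counts.items()}


# Toy data, invented for illustration only (not results from the paper).
tokens = ["the", "cat", "the", "sat", "cat"]
loss_a = [0.2, 1.5, 0.3, 0.9, 1.1]
loss_b = [0.5, 0.7, 0.4, 0.9, 0.6]
scores = token_preference(tokens, loss_a, loss_b)
```

In this toy example, "the" scores 1.0 (tower A always wins), "cat" scores -1.0 (tower B always wins), and "sat" scores 0.0 (a tie). A score that stays near +1 or -1 for a token across training runs would correspond to the kind of consistent preference the abstract describes.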
Taxonomy
Topics: Topic Modeling · Multi-Criteria Decision Making · Speech and dialogue systems
