How do Transformer Embeddings Represent Compositions? A Functional Analysis
Aishik Nagar, Ishaan Singh Rawal, Mansi Dhanania, Cheston Tan

TL;DR
This paper investigates how transformer-based embedding models represent compound words. Comparing several embedding models against a range of composition functions, it finds that most models are highly compositional, while BERT is markedly less so.
Contribution
It provides a comparative analysis of compositionality across transformer embedding models. It shows that simple composition functions, such as vector addition and ridge regression, capture compound-word representations well.
Findings
Ridge regression best accounts for compositionality among the tested composition functions.
Most embedding models are highly compositional; BERT is the exception.
Vector addition performs nearly as well as the more complex composition functions.
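The findings contrast a learned linear map (ridge regression) with plain vector addition. Below is a minimal sketch of that comparison. It assumes the regression maps the concatenated constituent embeddings [u; v] to the compound embedding c with an L2 penalty and no intercept, and it uses cosine similarity as the fit measure. The paper's exact setup is not described in this summary. All function names and the synthetic data are hypothetical; real use would substitute embeddings taken from one of the tested models.

```python
import math
import random


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    A is an n x n list of lists, B is n x m; returns X as n x m."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_ridge_composition(pairs, compounds, lam=1.0):
    """Hypothetical helper: learn W (2d x d) minimising ||XW - Y||^2 + lam*||W||^2,
    where each row of X is [u; v] and each row of Y is the compound c.
    Closed form: W = (X^T X + lam I)^(-1) X^T Y. No intercept (an assumption)."""
    X = [list(u) + list(v) for u, v in pairs]
    Y = [list(c) for c in compounds]
    k, d = len(X[0]), len(Y[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    B = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(d)]
         for i in range(k)]
    return solve(A, B)


def predict(W, u, v):
    """Predicted compound embedding: the row vector [u; v] times W."""
    x = list(u) + list(v)
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def mix(u, v):
    """Synthetic 'compound': an unequal weighted sum of its parts."""
    return [0.7 * a + 0.3 * b for a, b in zip(u, v)]


# Synthetic check: unequal weights are something plain addition cannot
# reproduce exactly, but a learned linear map can.
random.seed(0)
d = 4
pairs = [([random.gauss(0, 1) for _ in range(d)],
          [random.gauss(0, 1) for _ in range(d)]) for _ in range(50)]
compounds = [mix(u, v) for u, v in pairs]

W = fit_ridge_composition(pairs, compounds, lam=1e-3)
u_new, v_new = [1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 0.0]  # held-out pair
target = mix(u_new, v_new)
ridge_cos = cosine(predict(W, u_new, v_new), target)
add_cos = cosine([a + b for a, b in zip(u_new, v_new)], target)
print(f"ridge: {ridge_cos:.4f}  addition: {add_cos:.4f}")
```

On this toy data both scores are high, which mirrors (but does not reproduce) the reported pattern of addition trailing regression only slightly. The closed-form solve is used instead of a library to keep the sketch dependency-free; in practice a library ridge implementation would be the usual choice.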
Abstract
Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent compound words, and whether these representations are compositional. In this study, we test compositionality in Mistral, OpenAI Large, and Google embedding models, and compare them with BERT. First, we evaluate compositionality in the representations by examining six diverse models of compositionality (addition, multiplication, dilation, regression, etc.). We find that ridge regression, albeit linear, best accounts for compositionality. Surprisingly, we find that the classic vector addition model performs almost as well as any other model. Next, we verify that most embedding models are highly compositional, while BERT shows much poorer…
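The abstract names several classic composition functions. Below is a minimal sketch of three of them: addition, element-wise multiplication, and dilation. Each is scored by the cosine similarity between the composed vector and the compound's own embedding. The dilation formula follows the common Mitchell and Lapata formulation, p = (u·u)v + (λ − 1)(u·v)u, and the cosine scoring is also an assumption. The paper's exact parameterisation is not given in this summary. Function names and vectors are hypothetical toy values; in practice u, v and c would be the embeddings of two constituent words and their compound, all taken from the same model.

```python
import math


def add(u, v):
    """Additive composition: p = u + v."""
    return [a + b for a, b in zip(u, v)]


def multiply(u, v):
    """Multiplicative composition: element-wise product p_i = u_i * v_i."""
    return [a * b for a, b in zip(u, v)]


def dilate(u, v, lam=2.0):
    """Dilation (assumed Mitchell & Lapata form): stretch the component of v
    parallel to u by a factor lam.  p = (u.u) v + (lam - 1) (u.v) u."""
    uu = sum(a * a for a in u)
    uv = sum(a * b for a, b in zip(u, v))
    return [uu * b + (lam - 1.0) * uv * a for a, b in zip(u, v)]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def score(compose, u, v, c):
    """Cosine between the composed vector and the observed compound embedding c."""
    return cosine(compose(u, v), c)


# Toy usage: higher scores mean the function better predicts the compound.
u, v = [0.2, 0.9, -0.1], [0.8, 0.1, 0.3]  # toy constituent embeddings
c = [0.9, 1.0, 0.2]                        # toy compound embedding
for name, f in [("add", add), ("multiply", multiply), ("dilate", dilate)]:
    print(name, round(score(f, u, v, c), 3))
```

Dilation keeps the part of v orthogonal to u and scales its parallel part by λ (both multiplied by u·u), so λ = 1 reduces it to a rescaled copy of v.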
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
