How do Transformer Embeddings Represent Compositions? A Functional Analysis

Aishik Nagar; Ishaan Singh Rawal; Mansi Dhanania; Cheston Tan

arXiv:2506.00914·cs.CL·June 3, 2025

Open Access

TL;DR

This paper investigates how transformer-based embedding models represent compound words. Comparing several models of compositionality across Mistral, OpenAI Large, Google, and BERT embeddings, it finds that most embedding models are highly compositional, while BERT is markedly less so.

Contribution

It provides a comparative analysis of compositionality across transformer embedding models and shows that simple composition functions, such as vector addition and ridge regression, capture compound representations well.

Findings

01. Ridge regression best models compositionality among tested methods.

02. Most embedding models are highly compositional, except BERT.

03. Vector addition performs nearly as well as more complex models.

Abstract

Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent compound words, and whether these representations are compositional. In this study, we test compositionality in Mistral, OpenAI Large, and Google embedding models, and compare them with BERT. First, we evaluate compositionality in the representations by examining six diverse models of compositionality (addition, multiplication, dilation, regression, etc.). We find that ridge regression, albeit linear, best accounts for compositionality. Surprisingly, we find that the classic vector addition model performs almost as well as any other model. Next, we verify that most embedding models are highly compositional, while BERT shows much poorer…
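
To make the comparison between vector addition and ridge regression concrete, here is a minimal sketch of both composition models, scored by cosine similarity against a target compound embedding. It is an illustration only, not the authors' code: the data are synthetic random vectors rather than embeddings from any of the models named above, and the function names (`compose_add`, `fit_ridge`, `compose_ridge`), the regularization strength `alpha`, and the train/test split are all assumptions made for this example.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compose_add(u, v):
    """Additive model: the compound is predicted as the sum of its parts."""
    return u + v

def fit_ridge(U, V, C, alpha=1.0):
    """Fit a linear map from concatenated constituents [u; v] to compounds.

    Closed-form ridge solution: W = (X^T X + alpha * I)^{-1} X^T C.
    """
    X = np.hstack([U, V])                      # shape (n, 2d)
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ C)      # shape (2d, d)

def compose_ridge(W, u, v):
    """Ridge model: apply the learned linear map to [u; v]."""
    return np.concatenate([u, v]) @ W

# Synthetic stand-in data (assumption): compounds are an unequal weighted
# sum of their constituents plus a little noise.
rng = np.random.default_rng(0)
n, d = 200, 16
U = rng.normal(size=(n, d))                    # first-constituent vectors
V = rng.normal(size=(n, d))                    # second-constituent vectors
C = 0.7 * U + 0.3 * V + 0.05 * rng.normal(size=(n, d))

train, test = slice(0, 150), slice(150, n)
W = fit_ridge(U[train], V[train], C[train], alpha=1.0)

add_scores = [cosine(compose_add(u, v), c)
              for u, v, c in zip(U[test], V[test], C[test])]
ridge_scores = [cosine(compose_ridge(W, u, v), c)
                for u, v, c in zip(U[test], V[test], C[test])]

mean_add = float(np.mean(add_scores))
mean_ridge = float(np.mean(ridge_scores))
print(f"addition: {mean_add:.3f}  ridge: {mean_ridge:.3f}")
```

On this toy data the ridge model scores higher because it can learn the unequal weighting of the two constituents, while plain addition still lands close to the target. That only mirrors the qualitative pattern described in the abstract; it says nothing about the magnitudes reported in the paper.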

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution