Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng

TL;DR
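This paper investigates how language models encode linguistic compositionality by analyzing the intrinsic dimension (ID) of their representations. It finds that a dataset's degree of compositionality is reflected in the ID of its representations, and that semantic and superficial aspects of the input are encoded differently: the former in nonlinear dimensionality, the latter in linear dimensionality.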
Contribution
The paper introduces a geometric framework that links compositionality to the intrinsic dimension of language model representations and distinguishes semantic from superficial encoding.
Findings
A dataset's degree of compositionality correlates with the intrinsic dimension of its representations.
The relationship between compositionality and geometric complexity arises from linguistic features learned over training.
Semantic aspects are encoded in nonlinear dimensionality, whereas superficial aspects are encoded in linear dimensionality.
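To make the nonlinear versus linear distinction concrete, the following is a minimal sketch on toy data, not the authors' pipeline. It assumes a TwoNN-style maximum-likelihood estimator for nonlinear intrinsic dimension and a PCA explained-variance count for linear dimensionality; this summary does not say which estimators the paper uses. The function names, the 0.99 variance threshold, and the synthetic torus data are all hypothetical choices made for illustration.

```python
import numpy as np

def twonn_id(X):
    """Nonlinear ID via the ratio of 2nd to 1st nearest-neighbor distances
    (maximum-likelihood form of a TwoNN-style estimator; an assumption here)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    np.maximum(d2, 0.0, out=d2)                      # clip tiny negative round-off
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)                   # exclude self-distances
    nearest = np.partition(dist, 1, axis=1)[:, :2]   # two smallest per row, in order
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                                    # drop exact duplicate points
    mu = r2[keep] / r1[keep]
    return keep.sum() / np.sum(np.log(mu))           # MLE: d = N / sum(log mu)

def pca_dim(X, var_threshold=0.99):
    """Linear dimensionality: number of principal components needed to
    reach var_threshold of total variance (threshold is a hypothetical choice)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratio, var_threshold) + 1)

# Toy stand-in for LM representations: a 2-D flat torus, which needs 4 linear
# dimensions to embed, rotated into a 20-D ambient space.
rng = np.random.default_rng(0)
n_points, ambient_dim = 1000, 20
u = rng.uniform(0.0, 2.0 * np.pi, n_points)
v = rng.uniform(0.0, 2.0 * np.pi, n_points)
torus = np.stack([np.sin(u), np.cos(u), np.sin(v), np.cos(v)], axis=1)
basis, _ = np.linalg.qr(rng.normal(size=(ambient_dim, 4)))  # orthonormal columns
X = torus @ basis.T

id_nonlinear = twonn_id(X)
d_linear = pca_dim(X)
print(f"nonlinear ID estimate: {id_nonlinear:.2f}, linear (PCA) dimension: {d_linear}")
```

On this toy manifold the nonlinear estimate should land near 2, the number of underlying degrees of freedom, while PCA reports the 4 linear directions the curved surface occupies. In practice X would be replaced by a matrix of per-sentence hidden states from one layer of a model.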
Abstract
By virtue of linguistic compositionality, few syntactic rules and a finite lexicon can generate an unbounded number of sentences. That is, language, though seemingly high-dimensional, can be explained using relatively few degrees of freedom. An open question is whether contemporary language models (LMs) reflect the intrinsic simplicity of language that is enabled by compositionality. We take a geometric view of this problem by relating the degree of compositionality in a dataset to the intrinsic dimension (ID) of its representations under an LM, a measure of feature complexity. We find not only that the degree of dataset compositionality is reflected in representations' ID, but that the relationship between compositionality and geometric complexity arises due to learned linguistic features over training. Finally, our analyses reveal a striking contrast between nonlinear and linear…
Taxonomy
Topics: Syntax, Semantics, Linguistic Variation
