Measuring Compositionality in Representation Learning
Jacob Andreas

TL;DR
This paper introduces a method to quantify how well learned representations in machine learning models reflect the compositional structure of inputs, bridging linguistic concepts with vector-based representations.
Contribution
It proposes a general procedure for measuring compositionality in vector representations by comparing true models to compositional approximations, with formal and empirical analysis.
Findings
The method effectively quantifies compositionality in various models.
Higher compositionality correlates with better generalization.
Inferring representational primitives helps explain how learned representations are structured.
Abstract
Many machine learning algorithms represent input data with vector embeddings or discrete codes. When inputs exhibit compositional structure (e.g. objects built from parts or procedures from subroutines), it is natural to ask whether this compositional structure is reflected in the inputs' learned representations. While the assessment of compositionality in languages has received significant attention in linguistics and adjacent fields, the machine learning literature lacks general-purpose tools for producing graded measurements of compositional structure in more general (e.g. vector-valued) representation spaces. We describe a procedure for evaluating compositionality by measuring how well the true representation-producing model can be approximated by a model that explicitly composes a collection of inferred representational primitives. We use the procedure to provide formal and…
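The approximation procedure described in the abstract can be illustrated with a small sketch. It assumes that each input comes with a known derivation (a list of primitive names), that composition is plain vector addition, and that mismatch is measured as mean squared Euclidean distance; the primitive vectors are fit in closed form by least squares with a tiny ridge term. These choices, and the helper names `fit_primitives`, `compose`, and `compositional_error`, are hypothetical and chosen for illustration, not taken from the paper's implementation. Under these assumptions, a score near zero means the representations are well approximated by composing inferred primitives.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]  # augmented [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_primitives(derivations: Sequence[Sequence[str]], reps: Sequence[Vector],
                   ridge: float = 1e-6) -> Dict[str, Vector]:
    """Infer one vector per primitive so that summed primitives approximate reps.

    Minimizes sum_i ||reps[i] - sum_{p in derivations[i]} eta[p]||^2, plus a
    tiny ridge term so the normal equations are always solvable (assumption).
    """
    prims = sorted({p for d in derivations for p in d})
    index = {p: j for j, p in enumerate(prims)}
    n, k, dim = len(derivations), len(prims), len(reps[0])
    # Incidence matrix: A[i][j] counts occurrences of primitive j in input i.
    A = [[0.0] * k for _ in range(n)]
    for i, d in enumerate(derivations):
        for p in d:
            A[i][index[p]] += 1.0
    # Normal equations: (A^T A + ridge * I) eta = A^T F.
    ata = [[sum(A[i][r] * A[i][c] for i in range(n)) + (ridge if r == c else 0.0)
            for c in range(k)] for r in range(k)]
    atf = [[sum(A[i][r] * reps[i][t] for i in range(n)) for t in range(dim)]
           for r in range(k)]
    eta = solve(ata, atf)
    return {p: eta[index[p]] for p in prims}


def compose(derivation: Sequence[str], prims: Dict[str, Vector]) -> Vector:
    """Additive composition (assumed): the approximation is a sum of primitives."""
    dim = len(next(iter(prims.values())))
    out = [0.0] * dim
    for p in derivation:
        out = [o + v for o, v in zip(out, prims[p])]
    return out


def compositional_error(derivations: Sequence[Sequence[str]],
                        reps: Sequence[Vector], ridge: float = 1e-6) -> float:
    """Mean squared distance between true reps and their best additive approximation."""
    prims = fit_primitives(derivations, reps, ridge)
    total = 0.0
    for d, r in zip(derivations, reps):
        approx = compose(d, prims)
        total += sum((x - y) ** 2 for x, y in zip(approx, r))
    return total / len(reps)


# Usage: four inputs built from a color and a shape.
inputs = [("red", "circle"), ("red", "square"), ("blue", "circle"), ("blue", "square")]
additive = [[1, 0, 1], [2, 1, 1], [0, 1, 1], [1, 2, 1]]   # exact sums of attribute vectors
arbitrary = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]  # no additive structure
print(compositional_error(inputs, additive))   # close to 0
print(compositional_error(inputs, arbitrary))  # about 0.1875
```

In this toy example the `additive` representations are reconstructed almost perfectly, while the `arbitrary` ones cannot be: any additive model forces red circle + blue square to equal red square + blue circle, and that identity fails for `arbitrary`. Swapping in a different composition function or distance would change the scores, so these should be read as one possible instantiation of the procedure.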
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
