Representational Homomorphism Predicts and Improves Compositional Generalization In Transformer Language Model
Zhiyu An, Wan Du

TL;DR
This paper introduces Homomorphism Error (HE), a structural metric that predicts and enhances compositional generalization in Transformer language models by aligning their learned representations with linguistic rules.
Contribution
The paper proposes HE as a novel metric for understanding and improving compositional generalization, demonstrating its predictive power and effectiveness as a regularizer in training.
Findings
HE predicts out-of-distribution compositional generalization with high correlation (R^2=0.73).
HE-regularized training significantly reduces HE and improves OOD accuracy.
Model depth has minimal impact on HE and OOD performance.
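The R^2 figure above is a coefficient of determination. As a minimal sketch of how such a value is obtained, the snippet below fits a least-squares line and reports R^2. It assumes a simple one-variable linear fit of OOD accuracy against HE, which this summary does not confirm as the paper's procedure. The function name r_squared and all numbers are made-up placeholders, not results from the paper.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only (NOT data from the paper):
# one HE score and one OOD accuracy per hypothetical trained model.
he_values = [0.1, 0.2, 0.3, 0.4, 0.5]
ood_accuracy = [0.90, 0.82, 0.75, 0.61, 0.55]
print(round(r_squared(he_values, ood_accuracy), 3))
```

A value near 1 means the line explains most of the variance in OOD accuracy. A value near 0 means HE carries little linear information about it.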
Abstract
Compositional generalization, the ability to interpret novel combinations of familiar components, remains a persistent challenge for neural networks. Behavioral evaluations reveal when models fail but offer limited insight into why failures arise at the representational level. We introduce Homomorphism Error (HE), a structural metric that measures the inconsistency between a set of established rules for which words combine to form new meaning (linguistic syntax) and a model's learned rules for which hidden states combine to form new states (semantic syntax). We formulate this inconsistency as deviations from approximate homomorphisms between the linguistic expression algebra and a model's hidden-state space. We designed experiments to test whether i) HE predicts compositional generalization performance, and ii) regularizing for low HE during training improves such…
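The idea of a deviation from a homomorphism can be illustrated with a small sketch. The paper's exact definition of HE is not reproduced in this summary, so the code below rests on assumptions. It takes HE to be the mean relative squared residual between the hidden state of a composed expression, h(a∘b), and a candidate operator applied to the parts, combine(h(a), h(b)). The names homomorphism_error, add_vectors, and regularized_loss, the use of vector addition as the operator, and the additive penalty with a weight are all hypothetical.

```python
def homomorphism_error(samples, combine):
    """Mean relative squared residual between h(a∘b) and combine(h(a), h(b)).

    samples: list of (h_a, h_b, h_ab) triples, each a list of floats.
    combine: candidate hidden-state operator (an assumption of this sketch).
    """
    total = 0.0
    for h_a, h_b, h_ab in samples:
        predicted = combine(h_a, h_b)
        residual = sum((p - t) ** 2 for p, t in zip(predicted, h_ab))
        norm = sum(t ** 2 for t in h_ab)
        # Small constant guards against division by zero for all-zero states.
        total += residual / (norm + 1e-12)
    return total / len(samples)


def add_vectors(u, v):
    """Illustrative operator: element-wise addition of two hidden states."""
    return [a + b for a, b in zip(u, v)]


def regularized_loss(task_loss, he, weight=0.1):
    """Hypothetical training objective: task loss plus a weighted HE penalty."""
    return task_loss + weight * he


# Toy hidden states (made up): the first triple composes exactly, the second drifts.
exact = [([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
drifted = [([1.0, 0.0], [0.0, 1.0], [1.0, 2.0])]
print(homomorphism_error(exact, add_vectors))
print(homomorphism_error(drifted, add_vectors))
```

Under these assumptions, an error of zero means the hidden states compose exactly as the operator prescribes. Larger values indicate that the representation of a composed expression departs from what its parts predict.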
Taxonomy
Topics: Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices · Topic Modeling
