Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Lei Lin, Shuangtao Li, Yafang Zheng, Biao Fu, Shan Liu, Yidong Chen, Xiaodong Shi

TL;DR
This paper introduces CompoSition, a method that dynamically combines different encoder layer representations to improve compositional generalization in seq2seq models, addressing entanglement issues.
Contribution
It proposes a novel approach to compose encoder layer representations for better generalization, extending seq2seq models to handle syntactic and semantic entanglement.
Findings
Achieves competitive results on two benchmarks.
Effectively reduces representation entanglement.
Improves compositional generalization performance.
Abstract
Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the factors hindering CG is that the representation of the encoder's uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled. However, we consider that the previously identified representation entanglement problem is not comprehensive enough. Additionally, we hypothesize that the source key and value representations passed into different decoder layers are also entangled. Starting from this intuition, we propose CompoSition (Compose Syntactic and Semantic Representations), an extension to seq2seq models which learns to compose representations of different…
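The idea of giving each decoder layer its own composition of encoder layer outputs can be illustrated with a minimal sketch. It assumes the composition is a softmax-weighted sum over encoder layers with one set of learnable scalar logits per decoder layer; the abstract above is truncated before it describes the mechanism, so this mixing scheme, the function names (softmax, compose_layers), and the toy values are illustrative assumptions rather than the authors' implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of scalars.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compose_layers(encoder_layers, mixing_logits):
    """Mix encoder layer outputs separately for each decoder layer.

    encoder_layers: list of L layers, each a list of T token vectors.
    mixing_logits:  one list of L scalars per decoder layer
                    (hypothetical learnable parameters).
    Returns one composed sequence (T vectors) per decoder layer.
    """
    seq_len = len(encoder_layers[0])
    dim = len(encoder_layers[0][0])
    composed = []
    for logits in mixing_logits:
        weights = softmax(logits)
        mixed = [
            [
                sum(w * layer[t][d] for w, layer in zip(weights, encoder_layers))
                for d in range(dim)
            ]
            for t in range(seq_len)
        ]
        composed.append(mixed)
    return composed

# Toy input: 2 encoder layers, 2 tokens, hidden size 2.
encoder_layers = [
    [[1.0, 0.0], [0.0, 1.0]],  # lower encoder layer
    [[3.0, 2.0], [2.0, 3.0]],  # uppermost encoder layer
]
mixing_logits = [
    [0.0, 0.0],     # decoder layer 0: equal mix of both encoder layers
    [-50.0, 50.0],  # decoder layer 1: almost entirely the uppermost layer
]
composed = compose_layers(encoder_layers, mixing_logits)
```

In a standard Transformer every decoder layer attends to the same uppermost encoder output. In this sketch each decoder layer would instead derive its source keys and values from its own entry of composed, so different decoder layers can draw on different encoder depths.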
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Residual Connection · Absolute Position Encodings · Layer Normalization · Softmax · Adam · Byte Pair Encoding
