Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization

Lei Lin; Shuangtao Li; Yafang Zheng; Biao Fu; Shan Liu; Yidong Chen; Xiaodong Shi

arXiv:2305.12169 · cs.CL · October 19, 2023 · 1 citation

Open Access

TL;DR

This paper introduces CompoSition, an extension to seq2seq models that dynamically combines representations from different encoder layers to reduce the entanglement of syntactic and semantic representations and improve compositional generalization.

Contribution

It proposes a novel approach to compose encoder layer representations for better generalization, extending seq2seq models to handle syntactic and semantic entanglement.

Findings

01 Achieves competitive results on two benchmarks.

02 Effectively reduces representation entanglement.

03 Improves compositional generalization performance.

Abstract

Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is that the representation of the encoder's uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled. However, we consider that the previously identified representation entanglement problem is not comprehensive enough. Additionally, we hypothesize that the source key and value representations passed into different decoder layers are also entangled. Starting from this intuition, we propose CompoSition (Compose Syntactic and Semantic Representations), an extension to seq2seq models which learns to compose representations of different…
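The abstract is truncated here, so the exact composition mechanism is not stated on this page. As a rough illustration only, the sketch below assumes one plausible reading: each decoder layer holds its own learnable scores over the encoder layers, with separate scores for source keys and source values, and uses a softmax-weighted sum of the encoder layer outputs instead of the top layer alone. The names `compose_layers`, `key_logits`, and `value_logits` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()

def compose_layers(layer_outputs, logits):
    """Softmax-weighted sum of encoder layer outputs (an assumed mechanism).

    layer_outputs: array of shape (num_enc_layers, seq_len, d_model)
    logits:        array of shape (num_enc_layers,), one score per encoder layer
    returns:       array of shape (seq_len, d_model)
    """
    weights = softmax(logits)
    # Contract the encoder-layer axis: sum_i weights[i] * layer_outputs[i]
    return np.tensordot(weights, layer_outputs, axes=1)

rng = np.random.default_rng(0)
num_enc_layers, num_dec_layers, seq_len, d_model = 3, 2, 4, 8

# Stand-in for the outputs of every encoder layer (random for illustration).
layer_outputs = rng.normal(size=(num_enc_layers, seq_len, d_model))

# Hypothetical learnable scores: one row per decoder layer,
# kept separate for keys and values. Zeros give a uniform average.
key_logits = np.zeros((num_dec_layers, num_enc_layers))
value_logits = np.zeros((num_dec_layers, num_enc_layers))
# A large score on the last encoder layer approximates using the top layer only.
value_logits[1] = [0.0, 0.0, 50.0]

# Each decoder layer receives its own composed source keys and values.
keys = [compose_layers(layer_outputs, key_logits[j]) for j in range(num_dec_layers)]
values = [compose_layers(layer_outputs, value_logits[j]) for j in range(num_dec_layers)]
```

In a trained model the scores would be parameters updated by gradient descent; here they are fixed by hand to show the two extremes, a uniform mix of all encoder layers and a near one-hot selection of the top layer.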

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Residual Connection · Absolute Position Encodings · Layer Normalization · Softmax · Adam · Byte Pair Encoding