Layer-wise Representation Fusion for Compositional Generalization

Yafang Zheng; Lei Lin; Shuangtao Li; Yuxuan Yuan; Zhaohong Lai; Shan Liu; Biao Fu; Yidong Chen; Xiaodong Shi

arXiv:2307.10799·cs.CL·December 22, 2023

TL;DR

This paper introduces LRF, a layer-wise fusion framework with fuse-attention modules to improve compositional generalization in neural models by effectively integrating multi-layer information.

Contribution

The paper identifies the representation entanglement problem in Transformers and proposes a fusion method to address it, improving compositional generalization (CG) performance.

Findings

1. LRF outperforms baseline models on benchmark datasets.
2. Fuse-attention modules effectively integrate multi-layer information.
3. Analysis shows improved syntactic and semantic disentanglement.

Abstract

Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences are entangled in the uppermost layers of both the encoder and the decoder. However, previous work concentrates on separating the learning of syntax and semantics rather than investigating why the representation entanglement (RE) problem arises in order to solve it. We explain why it exists by analyzing how representations evolve from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a…
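The abstract is cut off before it describes LRF itself, so the following is only an illustrative sketch of the general idea it motivates: letting a layer attend over the outputs of all earlier layers instead of relying solely on the residual path. It is not the paper's fuse-attention module. The function names (`fuse_position`, `fuse_layers`), the use of the current layer's state as the query, and the absence of learned projections are all assumptions made to keep the example small.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_position(query, layer_states):
    """Fuse one token's states from several layers (hypothetical helper).

    query        -- the token's vector at the current layer (assumed query choice)
    layer_states -- the same token's vectors from each earlier layer
    Returns the attention weights and the fused vector.
    """
    scale = math.sqrt(len(query))
    # Scaled dot product between the query and each earlier layer's state.
    scores = [sum(q * k for q, k in zip(query, state)) / scale
              for state in layer_states]
    weights = softmax(scores)
    # Weighted sum of the earlier states; no learned projections (simplification).
    fused = [sum(w * state[d] for w, state in zip(weights, layer_states))
             for d in range(len(query))]
    return weights, fused


def fuse_layers(current, previous_layers):
    """Apply fuse_position to every token position in a sequence.

    current         -- list of token vectors from the current layer
    previous_layers -- list of layers, each a list of token vectors
    """
    fused_sequence = []
    for t, query in enumerate(current):
        states = [layer[t] for layer in previous_layers]
        _, fused = fuse_position(query, states)
        fused_sequence.append(fused)
    return fused_sequence


# Toy input: 3 earlier layers, 2 tokens, hidden size 2.
previous = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
current = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse_layers(current, previous)
```

In this toy setup each token's fused vector is a convex combination of its earlier-layer states, weighted by similarity to the current state. A trained model would normally add learned query, key, and value projections; those are omitted here.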

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection