Bird-Eye Transformers for Text Generation Models

Lei Sha; Yuhang Song; Yordan Yordanov; Tommaso Salvatori; Thomas Lukasiewicz

arXiv:2210.03985 · cs.CL · October 11, 2022

TL;DR

This paper introduces the bird-eye transformer (BET), a novel architecture that enhances text generation by reweighting self-attention to better incorporate historical information, outperforming baseline transformers across multiple datasets.

Contribution

The paper proposes BET, a new transformer variant that improves historical information utilization in text generation tasks, addressing limitations of standard self-attention.

Findings

01. BET outperforms baseline transformers on all tested datasets.
02. Reweighting self-attention improves historical context integration.
03. Experimental results demonstrate superior performance in machine translation and language modeling.

Abstract

Transformers have become an indispensable module for text generation models since their great success in machine translation. Previous works attribute the success of transformers to query-key-value dot-product attention, which provides a robust inductive bias through fully connected token graphs. However, we found that self-attention has a severe limitation. When predicting the (i+1)-th token, self-attention takes only the i-th token as an information collector, and it tends to give high attention weights to tokens similar to itself. Therefore, most of the historical information that occurred before the i-th token is not taken into consideration. Based on this observation, we propose a new architecture, called the bird-eye transformer (BET), which goes one step further to improve the performance of transformers by reweighting self-attention to encourage it to focus…
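
The limitation described in the abstract can be illustrated with a minimal NumPy sketch of single-head causal dot-product attention. It assumes identity query/key projections (no learned W_Q or W_K), so a token's score with itself is its squared norm divided by sqrt(d), which for random vectors is typically much larger than its score with other tokens; most of the row's weight then lands on the token itself rather than on its history. The function names `causal_attention_weights` and `downweight_self` are hypothetical, and `downweight_self` is only a generic illustration of "reweighting self-attention", not the BET mechanism, whose details are truncated above.

```python
import numpy as np


def causal_attention_weights(x: np.ndarray) -> np.ndarray:
    """Single-head causal dot-product attention weights.

    Assumption: queries and keys are the raw token vectors (identity
    projections instead of learned W_Q and W_K), so the score between
    positions i and j is x_i . x_j / sqrt(d).
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    # Causal mask: token i may attend only to tokens 0..i (itself and history).
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; subtracting the row max keeps exp() stable.
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)


def downweight_self(w: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical reweighting, NOT the BET method: multiply each token's
    weight on itself by alpha and renormalize so every row sums to 1,
    shifting probability mass toward earlier (historical) tokens."""
    w = w.copy()
    idx = np.arange(w.shape[0])
    w[idx, idx] *= alpha
    return w / w.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 16))  # 6 tokens, 16-dimensional vectors
    w = causal_attention_weights(x)
    i = x.shape[0] - 1  # the i-th (last) token, which predicts token i+1
    print("self weight:   ", round(float(w[i, i]), 3))
    print("history weight:", round(float(w[i, :i].sum()), 3))
    w2 = downweight_self(w, alpha=0.2)
    print("self weight after reweighting:", round(float(w2[i, i]), 3))
```

With learned projections the diagonal need not dominate this strongly; identity projections are used here only to make the self-similarity effect easy to see.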

Code & Models

Repositories

ml-jku/hopfield-layers (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques