Bidirectional Awareness Induction in Autoregressive Seq2Seq Models

Jia Cheng Hu; Roberto Cavicchioli; Alessandro Capotondi

arXiv:2408.13959·cs.CL·August 27, 2024

Open Access

TL;DR

This paper introduces Bidirectional Awareness Induction (BAI), a training method that enhances autoregressive seq2seq models by enabling bidirectional learning through Pivots, improving performance across multiple architectures and tasks.

Contribution

The paper presents BAI, a novel training approach that allows bidirectional learning in autoregressive models without architectural changes, applicable to various architectures and pre-trained models.

Findings

01 Up to 2.4 CIDEr improvement in Image-Captioning

02 Up to 4.96 BLEU increase in Neural Machine Translation

03 Up to 1.16 ROUGE boost in Text Summarization

Abstract

Autoregressive Sequence-To-Sequence models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. Even so, they still have significant limitations. For instance, when errors occur in the early steps of prediction, the whole output is severely affected. This reliance on previously predicted tokens, together with the inherent computational unfriendliness of sequential algorithms, has motivated researchers to explore different architectures and methods in the search for bidirectional approaches. In this work, we introduce Bidirectional Awareness Induction (BAI), a training method that leverages a subset of elements in the network, the Pivots, to perform bidirectional learning without breaking the autoregressive constraints. To showcase its flexibility, we apply the method to three architectures, the Transformer,…

Taxonomy

Topics: Cell Image Analysis Techniques · Neural Networks and Applications · Cellular Automata and Applications

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Layer Normalization · Weight Decay · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection