Modeling Context With Linear Attention for Scalable Document-Level Translation

Zhaofeng Wu; Hao Peng; Nikolaos Pappas; Noah A. Smith

arXiv:2210.08431 · cs.CL · October 18, 2022 · 1 cites

TL;DR

This paper applies the linear attention model of Peng et al. (2021), augmented with a sentential gate, to document-level translation. Compared with the transformer, it decodes substantially faster on long sequences while reaching similar or better BLEU scores.

Contribution

The paper shows that linear attention avoids the quadratic attention cost that makes transformers hard to scale to long documents, and that a sentential gate, which promotes a recency inductive bias, further improves translation quality on IWSLT.

Findings

01 Substantially faster decoding than the transformer on long sequences.

02 Similar or better BLEU scores on IWSLT 2015 and OpenSubtitles 2018.

03 Sentential gating further improves translation quality on IWSLT.

Abstract

Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations. However, these models, predominantly based on transformers, are difficult to scale to long documents as their attention layers have quadratic complexity in the sequence length. Recent efforts on efficient attention improve scalability, but their effect on document translation remains unexplored. In this work, we investigate the efficacy of a recent linear attention model by Peng et al. (2021) on document translation and augment it with a sentential gate to promote a recency inductive bias. We evaluate the model on IWSLT 2015 and OpenSubtitles 2018 against the transformer, demonstrating substantially increased decoding speed on long sequences with similar or better BLEU scores. We show that sentential gating further improves translation quality on IWSLT.
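The core idea in the abstract, attention computed from a fixed-size running state that is discounted at sentence boundaries, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a generic positive feature map (`elu(x) + 1`) in place of the feature map used by Peng et al. (2021), and a single constant `gate` applied at sentence starts in place of whatever gating function the paper uses. The names `feature_map`, `gated_linear_attention`, `sentence_starts`, and `gate` are hypothetical.

```python
import math


def feature_map(x):
    # Assumption: elu(x) + 1, a simple positive feature map used here
    # only for illustration; it is not the paper's feature map.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def gated_linear_attention(queries, keys, values, sentence_starts, gate=0.5):
    """Causal linear attention with a recency gate at sentence starts.

    The state (S, z) has a fixed size, so each decoding step costs the
    same regardless of how many tokens came before. `gate` in [0, 1] is
    a hypothetical constant discount applied to the state whenever a new
    sentence begins (1.0 keeps all history, 0.0 forgets it).
    """
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for t, (q, k, v) in enumerate(zip(queries, keys, values)):
        if t > 0 and t in sentence_starts:
            # Discount everything accumulated from earlier sentences.
            S = [[gate * s for s in row] for row in S]
            z = [gate * s for s in z]
        phi_k = feature_map(k)
        for i in range(d_k):
            z[i] += phi_k[i]
            for j in range(d_v):
                S[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(a * b for a, b in zip(phi_q, z))
        outputs.append(
            [sum(phi_q[i] * S[i][j] for i in range(d_k)) / denom
             for j in range(d_v)]
        )
    return outputs
```

With `gate=1.0` the sketch reduces to ordinary causal kernelized attention over the whole prefix; smaller values weight the current sentence more heavily than earlier ones, which is one simple way to express a recency bias.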

Code & Models

Repositories

zhaofengwu/rfa-doc-mt (PyTorch, official)

Models

ZhaofengWu/rfa-doc-mt-models (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
