Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation

Zihan Liu; Zewei Sun; Shanbo Cheng; Shujian Huang; Mingxuan Wang

arXiv:2309.14174 · cs.CL · September 26, 2023

TL;DR

This paper introduces a lightweight attention mechanism that reduces the number of tokens attended to in Transformer-based document-level translation, achieving a 20% speedup and up to 95% attention sparsity without sacrificing translation quality.

Contribution

It proposes a novel sparse attention method that maintains translation performance while reducing computational cost by attending to only 5% of tokens.

Findings

01  Achieves up to 95% sparsity in attention

02  Saves 93% of attention computation cost

03  Maintains translation quality with speed improvements

Abstract

Document-level Neural Machine Translation (DocNMT) has proven crucial for handling discourse phenomena by introducing document-level context information. One of the most important directions is to input the whole document directly into the standard Transformer model. In this case, efficiency becomes a critical concern due to the quadratic complexity of the attention module. Existing studies either focus on the encoder part, which cannot be deployed on sequence-to-sequence generation tasks, e.g., Machine Translation (MT), or suffer from a significant performance drop. In this work, we maintain translation performance while gaining a 20% speedup by introducing an extra selection layer, based on lightweight attention, that selects a small portion of tokens to be attended to. It takes advantage of the original attention to ensure performance and of dimension reduction to accelerate inference.…
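
The select-then-attend idea described in the abstract can be illustrated with a small single-query sketch. This is not the authors' implementation: the abstract does not specify how the selection layer scores tokens, so the top-k rule, the function name `select_then_attend`, the reduced-dimension projections `w_q_small` and `w_k_small`, and the `keep_ratio` parameter are all assumptions made for illustration. The sketch scores every key cheaply in a reduced dimension, keeps roughly 5% of the tokens, and runs ordinary scaled dot-product attention over the kept tokens only.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, matrix):
    """Map vec to len(matrix) dimensions; each row of matrix has len(vec) entries."""
    return [dot(row, vec) for row in matrix]


def select_then_attend(query, keys, values, w_q_small, w_k_small, keep_ratio=0.05):
    """Hypothetical sketch: cheap low-dimensional scoring, then full attention on kept tokens."""
    n_keep = max(1, round(keep_ratio * len(keys)))

    # Lightweight scoring in the reduced dimension (assumed selection rule: top-k).
    q_small = project(query, w_q_small)
    cheap_scores = [dot(q_small, project(key, w_k_small)) for key in keys]
    kept = sorted(range(len(keys)), key=lambda i: cheap_scores[i], reverse=True)[:n_keep]
    kept.sort()  # restore document order of the selected tokens

    # Standard scaled dot-product attention, restricted to the selected tokens.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in kept])
    dim_v = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim_v)]
    return kept, output


# Toy data: 200 tokens, model dimension 16, reduced dimension 4 (all values are made up).
random.seed(0)
n_tokens, d_model, d_small = 200, 16, 4
keys = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]
values = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]
query = [random.gauss(0, 1) for _ in range(d_model)]
w_q_small = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_small)]
w_k_small = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_small)]

kept, output = select_then_attend(query, keys, values, w_q_small, w_k_small)
print(len(kept), len(output))  # prints: 10 16
```

In this toy setup the full-dimension attention touches 10 of 200 tokens, while the scoring pass still visits every token but in 4 dimensions instead of 16. The projections here are random, so the selected tokens are arbitrary; in a trained model they would be learned.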

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Focus · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Linear Layer