Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model

Shaolei Zhang; Yang Feng

arXiv:2109.05244·cs.CL·September 15, 2021

Open Access

TL;DR

This paper uses a Gaussian Mixture Model (GMM) to model concentrated cross-attention in neural machine translation. Compared with plain dot-product attention, the approach improves alignment quality, N-gram accuracy, and the translation of long sentences.

Contribution

It proposes a novel GMM-based concentrated attention mechanism for NMT, addressing limitations of dot-product attention in capturing local relationships.

Findings

1. Improved alignment quality in translation tasks
2. Enhanced N-gram accuracy
3. Better translation of long sentences

Abstract

Cross-attention is an important component of neural machine translation (NMT), and previous methods have always realized it with dot-product attention. However, dot-product attention considers only the pairwise correlation between words, so attention becomes dispersed on long sentences and relationships between neighboring source words are neglected. Drawing on linguistics, we attribute these issues to ignoring a type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply a Gaussian Mixture Model (GMM) to model concentrated attention in cross-attention. Experiments and analyses on three datasets show that the proposed method outperforms the baseline and significantly improves alignment quality, N-gram accuracy, and long-sentence translation.
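
To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the paper's formulation. It compares dot-product attention weights with a distribution built from a mixture of Gaussians over source positions. The function names and the parameters `weights`, `centers`, and `widths` are hypothetical, chosen only for illustration; the abstract does not say how the mixture parameters are predicted or how the two kinds of attention are combined, so neither is shown here.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def dot_product_attention(query, keys):
    """Scaled dot-product weights of one target query over all source keys.

    query: shape (d,), keys: shape (src_len, d).
    """
    d = keys.shape[-1]
    return softmax(keys @ query / np.sqrt(d))


def gmm_attention(src_len, weights, centers, widths):
    """Attention weights from a Gaussian mixture over source positions.

    ASSUMPTION: each of the K components has a mixing weight, a center
    (a source position) and a width (standard deviation); all three are
    given as arrays of shape (K,). This is an illustrative choice, not
    the parameterization used in the paper.
    """
    positions = np.arange(src_len)[:, None]           # (src_len, 1)
    z = (positions - centers) / widths                # (src_len, K)
    density = np.exp(-0.5 * z ** 2) / (widths * np.sqrt(2.0 * np.pi))
    scores = density @ weights                        # (src_len,)
    return scores / scores.sum()                      # normalize over positions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_len, d = 20, 8
    keys = rng.normal(size=(src_len, d))
    query = rng.normal(size=d)

    pairwise = dot_product_attention(query, keys)
    concentrated = gmm_attention(
        src_len,
        weights=np.array([0.7, 0.3]),
        centers=np.array([5.0, 14.0]),
        widths=np.array([1.0, 1.5]),
    )
    print("dot-product :", np.round(pairwise, 3))
    print("GMM mixture :", np.round(concentrated, 3))
```

In this sketch the mixture places most of its mass around the chosen centers and decays with distance from them, which is one way to read "focuses on several central words and then spreads around them". The dot-product weights, by contrast, depend only on each query-key pair and carry no notion of source neighborhood.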

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications