Document-level Neural Machine Translation with Document Embeddings

Shu Jiang; Hai Zhao; Zuchao Li; Bao-Liang Lu

arXiv:2009.08775 · cs.CL · October 13, 2021

TL;DR

This paper introduces a document-aware neural machine translation method that leverages multiple forms of document embeddings to incorporate detailed context, significantly improving translation quality over existing models.

Contribution

It proposes a novel approach to utilize both global and local document embeddings in Transformer-based NMT, enhancing context modeling beyond previous methods.

Findings

1. Significant performance improvements over strong baselines
2. Effective modeling of deeper document-level context
3. Enhanced translation quality with document embeddings

Abstract

Standard neural machine translation (NMT) assumes that translation is independent of document-level context. Most existing document-level NMT methods settle for a small amount of brief document-level information, whereas this work exploits detailed document-level context through multiple forms of document embeddings, which can model deeper and richer document-level context. The proposed document-aware NMT enhances the Transformer baseline by introducing both global and local document-level clues on the source end. Experiments show that the proposed method significantly improves translation performance over strong baselines and other related studies.

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Dropout · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Attention Is All You Need