On the use of BERT for Neural Machine Translation

Stéphane Clinchant; Kweon Woo Jung; Vassilina Nikoulina

arXiv:1909.12744 · cs.CL · September 30, 2019

TL;DR

This paper investigates how pretrained BERT models can be integrated into neural machine translation systems, analyzing different methods and the impact of monolingual data on translation quality and robustness across various datasets.

Contribution

The study compares multiple BERT integration techniques for NMT and evaluates the influence of monolingual data on translation performance and robustness.

Findings

01

BERT integration improves translation quality on standard and out-of-domain datasets.

02

Monolingual data used for BERT training significantly affects translation robustness.

03

Different integration methods have varying impacts on translation performance.

Abstract

Exploiting large pretrained models for various NMT tasks has gained a lot of visibility recently. In this work we study how pretrained BERT models could be exploited for supervised Neural Machine Translation. We compare various ways to integrate a pretrained BERT model with an NMT model and study the impact of the monolingual data used for BERT training on the final translation quality. We use the WMT-14 English-German, IWSLT15 English-German and IWSLT14 English-Russian datasets for these experiments. In addition to the standard task test set evaluation, we evaluate on out-of-domain test sets and noise-injected test sets, in order to assess how BERT pretrained representations affect model robustness.

Taxonomy

Methods

Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax