Variational Neural Machine Translation with Normalizing Flows

Hendra Setiawan; Matthias Sperber; Udhay Nallasamy; Matthias Paulik

arXiv:2005.13978 · cs.CL · May 29, 2020

TL;DR

This paper introduces a novel Variational Neural Machine Translation framework using normalizing flows to model latent variables more effectively within Transformer models, leading to improved translation accuracy.

Contribution

It extends VNMT to Transformer architectures with a flexible normalizing flow-based posterior, enhancing latent variable modeling in neural machine translation.

Findings

01 Significant performance improvements over baselines
02 Effective in both in-domain and out-of-domain translation tasks
03 Enhanced latent variable utilization in Transformer models

Abstract

Variational Neural Machine Translation (VNMT) is an attractive framework for modeling the generation of target translations, conditioned not only on the source sentence but also on some latent random variables. The latent variable modeling may introduce useful statistical dependencies that can improve translation accuracy. Unfortunately, learning informative latent variables is non-trivial, as the latent space can be prohibitively large, and the latent codes are prone to be ignored by many translation models at training time. Previous works impose strong assumptions on the distribution of the latent code and limit the choice of the NMT architecture. In this paper, we propose to apply the VNMT framework to the state-of-the-art Transformer and introduce a more flexible approximate posterior based on normalizing flows. We demonstrate the efficacy of our proposal under both in-domain and…
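As a rough illustration of what a normalizing-flow-based approximate posterior can look like, the sketch below draws a sample from a diagonal Gaussian and passes it through a chain of planar flows, tracking the log-density with the change-of-variables formula. The choice of planar flows is an assumption made here for illustration; this listing does not state which flow family the paper uses. The names `planar_flow` and `sample_posterior` and all parameter values are hypothetical, and in a real VNMT model the Gaussian and flow parameters would be predicted by the network rather than fixed by hand.

```python
import math
import random


def planar_flow(z, u, w, b):
    """Apply one planar flow step f(z) = z + u_hat * tanh(w.z + b).

    Returns the transformed vector and log|det Jacobian| of the step.
    (Illustrative assumption: the source does not name the flow family.)
    """
    wu = sum(wi * ui for wi, ui in zip(w, u))
    w_norm_sq = sum(wi * wi for wi in w)
    # Reparameterize u so that w.u_hat > -1, which keeps the map invertible.
    m = -1.0 + math.log1p(math.exp(wu))
    u_hat = [ui + (m - wu) * wi / w_norm_sq for ui, wi in zip(u, w)]
    h = math.tanh(sum(wi * zi for wi, zi in zip(w, z)) + b)
    z_new = [zi + ui * h for zi, ui in zip(z, u_hat)]
    # det J = 1 + (1 - tanh^2(w.z + b)) * (w . u_hat)
    w_u_hat = sum(wi * ui for wi, ui in zip(w, u_hat))
    log_det = math.log(abs(1.0 + (1.0 - h * h) * w_u_hat))
    return z_new, log_det


def sample_posterior(mu, log_sigma, flows, rng):
    """Sample z_K from a flow-transformed Gaussian and return (z_K, log q(z_K))."""
    # Reparameterized draw from the base diagonal Gaussian q_0.
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(ls) * e for m, ls, e in zip(mu, log_sigma, eps)]
    log_q = sum(-0.5 * (math.log(2.0 * math.pi) + e * e) - ls
                for e, ls in zip(eps, log_sigma))
    # Change of variables: log q_K(z_K) = log q_0(z_0) - sum_k log|det J_k|.
    for u, w, b in flows:
        z, log_det = planar_flow(z, u, w, b)
        log_q -= log_det
    return z, log_q


# Hypothetical usage with two hand-picked flow steps in a 2-D latent space.
rng = random.Random(0)
flows = [([0.5, -1.0], [1.0, 2.0], 0.1), ([-0.3, 0.8], [0.4, -0.6], 0.0)]
z_k, log_q = sample_posterior([0.0, 0.0], [0.0, 0.0], flows, rng)
print(z_k, log_q)
```

With an empty `flows` list the sampler reduces to a plain diagonal Gaussian posterior. Each added flow step lets the density bend away from that Gaussian shape, at the cost of one extra log-determinant term.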

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding