Mixture Models for Diverse Machine Translation: Tricks of the Trade
Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

TL;DR
This paper explores the application of mixture models in machine translation, identifying key training tricks and design choices that improve their robustness and ability to generate diverse, high-quality translations.
Contribution
It provides an extensive empirical study of mixture model variants, revealing effective training techniques and design considerations for diverse machine translation.
Findings
Disabling dropout noise when computing responsibilities is critical to successful training.
Certain mixture model configurations offer a better trade-off between translation quality and diversity than variational models and diverse decoding methods.
The proposed evaluation protocol assesses both the quality and the diversity of generated translations against multiple references.
Abstract
Mixture models trained via EM are among the simplest, most widely used and well understood latent variable models in the machine learning literature. Surprisingly, these models have been hardly explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixture models are prone to degeneracies---often only one component gets trained or the latent variable is simply ignored. We find that disabling dropout noise in responsibility computation is critical to successful training. In addition, the design choices of parameterization, prior distribution, hard versus soft EM and online versus offline assignment can dramatically affect model performance. We develop an evaluation protocol to assess both quality and diversity of generations against multiple…
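To make the terms in the abstract concrete, the following is a minimal sketch of how responsibilities and the soft and hard EM losses might be computed for a single training pair. It is not the authors' implementation: the function names (`responsibilities`, `em_loss`), the uniform prior, and the toy log-likelihood values are all assumptions made for illustration. In a real translation model, each entry of `log_liks` would be the log-probability of the target under one mixture component, and, following the abstract's finding, the values used for the responsibility step would be computed with dropout disabled.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def responsibilities(log_liks, log_prior=None):
    """Posterior p(z | x, y) over K components.

    log_liks[z] stands for log p(y | z, x). In an actual model these would
    come from a forward pass with dropout turned off (assumption: here they
    are plain numbers supplied by the caller).
    """
    k = len(log_liks)
    if log_prior is None:
        # Assumed uniform prior over components.
        log_prior = [-math.log(k)] * k
    joint = [ll + lp for ll, lp in zip(log_liks, log_prior)]
    norm = log_sum_exp(joint)
    return [math.exp(j - norm) for j in joint]


def em_loss(log_liks, hard=False):
    """Negative responsibility-weighted log-likelihood for one example.

    Soft EM weights every component by its responsibility; hard EM puts all
    weight on the most responsible component. The constant prior term is
    dropped because the prior is assumed uniform.
    """
    r = responsibilities(log_liks)
    if hard:
        best = max(range(len(r)), key=r.__getitem__)
        r = [1.0 if i == best else 0.0 for i in range(len(r))]
    return -sum(ri * ll for ri, ll in zip(r, log_liks))


# Toy example with two components (hypothetical log-likelihoods).
toy_log_liks = [-1.0, -3.0]
soft = em_loss(toy_log_liks, hard=False)
hard = em_loss(toy_log_liks, hard=True)
```

In this toy case the hard EM loss only counts the better-fitting component, while the soft EM loss also includes the weaker component in proportion to its responsibility. The degeneracy described in the abstract corresponds to one component receiving nearly all responsibility on every example, so the other components never get trained.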
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
