MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
Mara Finkelstein, Subhajit Naskar, Mehdi Mirzazadeh, Apurva Shah, and Markus Freitag

TL;DR
This paper introduces finetuning methods that incorporate the benefits of expensive decoding techniques like MBR and QE during training, enabling high-quality NLG outputs with efficient decoding at inference time.
Contribution
It proposes MBR finetuning and QE finetuning, which distill the quality gains of these decoding methods into the model at training time. Both outperform the base model, and with an external LLM teacher they outperform even finetuning on human references.
Findings
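MBR finetuning and QE finetuning both surpass the base model, even with self-training.
Finetuning on outputs from an external LLM teacher gives better results than finetuning on human references.
The finetuned models produce high-quality outputs while using an efficient decoding algorithm at inference time.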
Abstract
Recent research in decoding methods for Natural Language Generation (NLG) tasks has shown that MAP decoding is not optimal, because model probabilities do not always align with human preferences. Stronger decoding methods, including Quality Estimation (QE) reranking and Minimum Bayes' Risk (MBR) decoding, have since been proposed to mitigate the model-perplexity-vs-quality mismatch. While these decoding methods achieve state-of-the-art performance, they are prohibitively expensive to compute. In this work, we propose MBR finetuning and QE finetuning which distill the quality gains from these decoding methods at training time, while using an efficient decoding algorithm at inference time. Using the canonical NLG task of Neural Machine Translation (NMT), we show that even with self-training, these finetuning methods significantly outperform the base model. Moreover, when using an external…
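The selection step that both methods rely on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy token-overlap utility, and the `sample_fn` and `select_fn` callables are all assumptions made for this example, and a real setup would plug in a learned reference-based metric for MBR and a learned reference-free metric for QE.

```python
from typing import Callable, List, Sequence, Tuple


def token_f1(hyp: str, ref: str) -> float:
    """Toy utility (assumption): F1 over unique tokens, standing in for a learned metric."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    return 2 * len(h & r) / (len(h) + len(r))


def mbr_select(candidates: Sequence[str],
               utility: Callable[[str, str], float] = token_f1) -> str:
    """MBR: pick the candidate with the highest average utility,
    using the other candidates as pseudo-references."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    return candidates[max(range(len(candidates)), key=expected_utility)]


def qe_select(source: str, candidates: Sequence[str],
              qe_score: Callable[[str, str], float]) -> str:
    """QE reranking: pick the candidate that a reference-free scorer rates highest."""
    return max(candidates, key=lambda c: qe_score(source, c))


def build_finetuning_pairs(
    sources: Sequence[str],
    sample_fn: Callable[[str], List[str]],            # hypothetical: samples candidates from a teacher
    select_fn: Callable[[str, Sequence[str]], str],   # hypothetical: wraps mbr_select or qe_select
) -> List[Tuple[str, str]]:
    """Build (source, selected output) pairs to finetune on."""
    return [(s, select_fn(s, sample_fn(s))) for s in sources]
```

The expensive selection runs once, offline, when building the finetuning data. The finetuned model can then be decoded with a cheap algorithm at inference time, which is the trade-off the abstract describes.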
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Balanced Selection · ALIGN
