Connecting and Comparing Language Model Interpolation Techniques

Ernest Pusateri; Christophe Van Gysel; Rami Botros; Sameer Badaskar; Mirko Hannemann; Youssef Oualil; Ilya Oparin

arXiv:1908.09738·eess.AS·August 27, 2019

TL;DR

This paper uncovers a theoretical connection between two language model interpolation techniques, count merging and Bayesian interpolation, compares both with linear interpolation in three scenarios with abundant training data per component model, and argues that Bayesian interpolation is the more practical choice.

Contribution

It presents what the authors believe is the first published comparison of count merging and Bayesian interpolation, shows that the two techniques perform similarly, and argues that Bayesian interpolation is the preferred approach in most circumstances.

Findings

1. Count merging and Bayesian interpolation both outperform linear interpolation, consistent with prior work.

2. Count merging and Bayesian interpolation perform similarly to each other.

3. Other considerations make Bayesian interpolation the preferred approach in most circumstances.

Abstract

In this work, we uncover a theoretical connection between two language model interpolation techniques, count merging and Bayesian interpolation. We compare these techniques as well as linear interpolation in three scenarios with abundant training data per component model. Consistent with prior work, we show that both count merging and Bayesian interpolation outperform linear interpolation. We include the first (to our knowledge) published comparison of count merging and Bayesian interpolation, showing that the two techniques perform similarly. Finally, we argue that other considerations will make Bayesian interpolation the preferred approach in most circumstances.
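The abstract names three techniques without stating their formulas. The sketch below uses the commonly cited textbook forms of each one, not equations taken from this page, and every function name, variable name, and number in it is a hypothetical chosen for illustration. It also shows one simple way the count merging and Bayesian interpolation formulas can coincide: if each component's history probability is the relative frequency c_i(h) / N_i, then choosing Bayesian weights proportional to beta_i * N_i gives the same result as count merging with weights beta_i. The paper's own derivation may differ in detail.

```python
def linear_interpolation(weights, word_probs):
    """p(w|h) = sum_i lambda_i * p_i(w|h), with history-independent weights."""
    return sum(lam * p for lam, p in zip(weights, word_probs))


def count_merging(betas, history_counts, word_probs):
    """p(w|h) = sum_i beta_i c_i(h) p_i(w|h) / sum_j beta_j c_j(h)."""
    scaled = [b * c for b, c in zip(betas, history_counts)]
    return sum(s * p for s, p in zip(scaled, word_probs)) / sum(scaled)


def bayesian_interpolation(weights, history_probs, word_probs):
    """p(w|h) = sum_i lambda_i p_i(h) p_i(w|h) / sum_j lambda_j p_j(h)."""
    scaled = [lam * ph for lam, ph in zip(weights, history_probs)]
    return sum(s * p for s, p in zip(scaled, word_probs)) / sum(scaled)


# Toy setup (all values are made up): two component models, one history h,
# one word w.
word_probs = [0.2, 0.05]        # p_i(w|h) from each component
history_counts = [30, 10]       # c_i(h): count of h in each training corpus
corpus_sizes = [1000, 500]      # N_i: assumed total history tokens per corpus
betas = [1.0, 2.0]              # count merging weights

# Assumption: the history probability is the relative frequency c_i(h) / N_i.
history_probs = [c / n for c, n in zip(history_counts, corpus_sizes)]

# Bayesian weights proportional to beta_i * N_i, normalized to sum to one.
raw = [b * n for b, n in zip(betas, corpus_sizes)]
lambdas = [r / sum(raw) for r in raw]

cm = count_merging(betas, history_counts, word_probs)
bi = bayesian_interpolation(lambdas, history_probs, word_probs)
li = linear_interpolation(lambdas, word_probs)

print("count merging:         ", cm)
print("Bayesian interpolation:", bi)
print("linear interpolation:  ", li)
```

With these toy numbers, count merging and Bayesian interpolation agree because the first component has seen this history more often and therefore receives more weight, while linear interpolation applies the same weights to every history and gives a different estimate.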
