Connecting and Comparing Language Model Interpolation Techniques
Ernest Pusateri, Christophe Van Gysel, Rami Botros, Sameer Badaskar, Mirko Hannemann, Youssef Oualil, Ilya Oparin

TL;DR
This paper explores the theoretical connection between count merging and Bayesian interpolation in language models, compares their performance with linear interpolation, and discusses their practical implications.
Contribution
The paper provides the first published comparison (to the authors' knowledge) of count merging and Bayesian interpolation, shows that the two perform similarly, and argues that Bayesian interpolation is the preferred method in most circumstances.
Findings
Count merging and Bayesian interpolation outperform linear interpolation.
Count merging and Bayesian interpolation perform similarly to each other.
Bayesian interpolation is recommended in most circumstances, for reasons other than accuracy.
Abstract
In this work, we uncover a theoretical connection between two language model interpolation techniques, count merging and Bayesian interpolation. We compare these techniques as well as linear interpolation in three scenarios with abundant training data per component model. Consistent with prior work, we show that both count merging and Bayesian interpolation outperform linear interpolation. We include the first (to our knowledge) published comparison of count merging and Bayesian interpolation, showing that the two techniques perform similarly. Finally, we argue that other considerations will make Bayesian interpolation the preferred approach in most circumstances.
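The connection described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited formulations of the three techniques, which this summary does not spell out: linear interpolation uses fixed weights, Bayesian interpolation scales each weight by the component's probability of the history, and count merging scales each weight by the component's count of the history. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def linear_interpolation(lams, word_probs):
    # p(w|h) = sum_i lam_i * p_i(w|h); weights do not depend on the history
    return sum(l * p for l, p in zip(lams, word_probs))


def bayesian_interpolation(lams, hist_probs, word_probs):
    # Assumed form: weight_i(h) is proportional to lam_i * p_i(h)
    weights = [l * ph for l, ph in zip(lams, hist_probs)]
    total = sum(weights)
    return sum(w / total * p for w, p in zip(weights, word_probs))


def count_merging(betas, hist_counts, word_probs):
    # Assumed form: weight_i(h) is proportional to beta_i * c_i(h)
    weights = [b * c for b, c in zip(betas, hist_counts)]
    total = sum(weights)
    return sum(w / total * p for w, p in zip(weights, word_probs))


# Toy setup (invented numbers): two component models, one history h, one word w.
corpus_sizes = [1000, 100]   # N_i: total tokens seen by each component
hist_counts = [30, 10]       # c_i(h): how often each component saw h
word_probs = [0.2, 0.6]      # p_i(w|h) from each component
lams = [0.5, 0.5]

# If p_i(h) is estimated as c_i(h) / N_i, then choosing beta_i = lam_i / N_i
# makes count merging produce the same weights as Bayesian interpolation.
hist_probs = [c / n for c, n in zip(hist_counts, corpus_sizes)]
betas = [l / n for l, n in zip(lams, corpus_sizes)]

p_linear = linear_interpolation(lams, word_probs)
p_bayes = bayesian_interpolation(lams, hist_probs, word_probs)
p_merge = count_merging(betas, hist_counts, word_probs)
```

In this toy case `p_bayes` and `p_merge` agree, while `p_linear` differs because its weights ignore how well each component knows the history. This is only a single-history illustration under the stated assumptions, not the paper's derivation or experimental setup.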
