Ensembling Language Models with Sequential Monte Carlo
Robin Shing Moon Chan, Tianyu Liu, Samuel Kiegeland, Clemente Pasti, Jacob Hoover Vigly, Timothy J. O'Donnell, Ryan Cotterell, Tim Vieira

TL;DR
This paper introduces a novel sequential Monte Carlo framework for ensembling language models, enabling more effective aggregation of diverse models during decoding and improving structured text generation performance.
Contribution
It proposes a unified SMC-based method for ensembling multiple language models with various aggregation functions, handling mismatched vocabularies and providing better posterior approximations.
Findings
Alternative ensemble strategies outperform probability averaging.
Better posterior approximations lead to improved performance.
The framework supports models with mismatched vocabularies.
Abstract
Practitioners have access to an abundance of language models and prompting strategies for solving many language modeling tasks; yet prior work shows that modeling performance is highly sensitive to both choices. Classical machine learning ensembling techniques offer a principled approach: aggregate predictions from multiple sources to achieve better performance than any single one. However, applying ensembling to language models during decoding is challenging: naively aggregating next-token probabilities yields samples from a locally normalized, biased approximation of the generally intractable ensemble distribution over strings. In this work, we introduce a unified framework for composing language models into f-ensemble distributions for a wide range of functions f. To sample from these distributions, we propose a byte-level…
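The bias described in the abstract can be seen on a toy problem. The sketch below is illustrative only and is not the paper's algorithm: the two "language models" are hypothetical hand-written tables over strings of length 2, and the aggregation function is assumed to be a plain product of probabilities. It compares the globally normalized ensemble over whole strings with the distribution obtained by renormalizing the product at every step, then shows that weighting each string by the product of its per-step normalizers recovers the global ensemble. That weight is the importance-sampling correction that sequential Monte Carlo methods build on.

```python
from itertools import product

ALPHABET = ("a", "b")
LENGTH = 2  # fixed-length strings, so no end-of-string symbol is needed

# Hypothetical toy models: next-symbol probabilities keyed by the prefix so far.
MODEL_1 = {"": {"a": 0.9, "b": 0.1},
           "a": {"a": 0.5, "b": 0.5},
           "b": {"a": 0.1, "b": 0.9}}
MODEL_2 = {"": {"a": 0.5, "b": 0.5},
           "a": {"a": 0.9, "b": 0.1},
           "b": {"a": 0.9, "b": 0.1}}


def all_strings():
    """Enumerate every string of length LENGTH over ALPHABET."""
    return ["".join(chars) for chars in product(ALPHABET, repeat=LENGTH)]


def string_prob(model, s):
    """Probability of the whole string under one autoregressive model."""
    p = 1.0
    for t, ch in enumerate(s):
        p *= model[s[:t]][ch]
    return p


def normalize(scores):
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def global_ensemble():
    """Product ensemble normalized once, over whole strings (the target)."""
    return normalize({s: string_prob(MODEL_1, s) * string_prob(MODEL_2, s)
                      for s in all_strings()})


def local_ensemble():
    """Product ensemble renormalized at every step (naive decoding).

    Returns the string probabilities and, for each string, the product of
    the per-step normalizers, which serves as its importance weight.
    """
    probs, weights = {}, {}
    for s in all_strings():
        q, w = 1.0, 1.0
        for t, ch in enumerate(s):
            prefix = s[:t]
            scores = {c: MODEL_1[prefix][c] * MODEL_2[prefix][c] for c in ALPHABET}
            z = sum(scores.values())
            q *= scores[ch] / z
            w *= z
        probs[s], weights[s] = q, w
    return probs, weights


def reweighted(probs, weights):
    """Correct the local distribution with its importance weights."""
    return normalize({s: probs[s] * weights[s] for s in probs})


target = global_ensemble()
local, weights = local_ensemble()
corrected = reweighted(local, weights)

for s in all_strings():
    print(f"{s}: global={target[s]:.4f} local={local[s]:.4f} corrected={corrected[s]:.4f}")
```

In this toy setting every string can be enumerated, so the correction is exact. For real language models the set of strings is far too large to enumerate, which is why a sampling-based approximation is needed instead.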
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
