Decoding-Free Sampling Strategies for LLM Marginalization

David Pohl; Marco Cognetta; Junyoung Lee; Naoaki Okazaki

arXiv:2510.20208·cs.CL·October 24, 2025

Open Access

TL;DR

This paper introduces decoding-free sampling strategies for estimating the marginal probability of a text under a language model, that is, its probability summed over all tokenizations of that text. The strategies avoid a costly generation step for each sample, which reduces computational cost while maintaining accuracy, and they are evaluated on several downstream tasks.

Contribution

The paper proposes decoding-free sampling methods for marginalization in language models. These methods eliminate the costly generation step and so improve efficiency.

Findings

01 Decoding-free strategies achieve accurate marginal estimates.

02 These methods are faster and more cost-effective than traditional sampling.

03 The methods are effective on multiple open language models and downstream tasks.

Abstract

Modern language models operate on subword-tokenized text in order to make a trade-off between model size, inference speed, and vocabulary coverage. A side effect of this is that, during inference, models are evaluated by measuring the probability of only the specific tokenization produced as the output, despite there being many possible ways to represent the same text with a subword vocabulary. Recent studies have argued instead for evaluating LLMs by marginalization, that is, by the total probability mass of all tokenizations of a given text. Marginalization is difficult due to the number of possible tokenizations of a text, so it is often approximated via sampling. However, a downside of sampling is that an expensive generation step must be performed by the LLM for each sample, which limits the number of samples that can be acquired given a runtime budget, and therefore also the…
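To make the notion of marginalization concrete, the following is a minimal sketch of exact marginalization by brute-force enumeration. It is not the paper's method. The vocabulary, its probabilities, and the function names are all hypothetical, and a toy unigram model stands in for an LLM, which would instead score each token conditioned on the preceding tokens.

```python
from math import exp, fsum, log

# Hypothetical toy vocabulary with made-up unigram log-probabilities.
# A real LLM would assign each token a probability conditioned on its prefix.
VOCAB_LOGP = {
    "a": log(0.2),
    "b": log(0.2),
    "c": log(0.1),
    "ab": log(0.3),
    "bc": log(0.2),
}


def tokenizations(text, vocab):
    """Yield every segmentation of `text` into tokens drawn from `vocab`."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in tokenizations(text[end:], vocab):
                yield [piece] + rest


def sequence_logp(tokens):
    """Log-probability of one tokenization under the toy unigram model."""
    return sum(VOCAB_LOGP[t] for t in tokens)


def marginal_prob(text):
    """Sum the probability of every tokenization of `text` (exact marginal)."""
    return fsum(exp(sequence_logp(toks)) for toks in tokenizations(text, VOCAB_LOGP))


if __name__ == "__main__":
    for toks in tokenizations("abc", VOCAB_LOGP):
        print(toks, exp(sequence_logp(toks)))
    print("marginal:", marginal_prob("abc"))
```

In this toy setting the text "abc" has three tokenizations, and the marginal (about 0.074) is more than twice the probability of the single tokenization ["ab", "c"] (about 0.03). Enumeration grows exponentially with text length, which is why marginalization is approximated by sampling in practice.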

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification