Learning Tractable Distributions Of Language Model Continuations
Gwen Yidou-Weng, Ian Li, Anji Liu, Oliver Broadrick, Yuchen Cui, Guy Van den Broeck, and Benjie Wang

TL;DR
The paper introduces LTLA (Learning to Look Ahead), a hybrid method in which a neural head conditions a shared HMM on base-LM embeddings. The resulting tractable model of continuations supports controlled generation (syntax, style, safety) with minimal decoding overhead.
Contribution
LTLA is a novel hybrid method that conditions a shared HMM on neural embeddings to improve sequence continuation control and efficiency.
Findings
LTLA outperforms standard HMM surrogates in likelihood.
LTLA enables lookahead control in vision-language models.
LTLA achieves 100% syntactic constraint satisfaction with minimal overhead.
Abstract
Controlled generation imposes sequence-level constraints (syntax, style, safety) that depend on future tokens, making exact conditioning of an autoregressive LM intractable. Tractable surrogates such as HMMs can approximate continuation distributions and steer decoding, but standard surrogates are often weakly context-aware. We propose Learning to Look Ahead (LTLA), a hybrid method that uses base-LM embeddings to condition a globally learned tractable surrogate: a neural head predicts only a prefix-dependent latent prior, while a shared HMM answers continuation queries exactly. LTLA is designed to avoid two common efficiency traps when adding neural context. First, it avoids vocabulary-sized prefix rescoring (V extra LM evaluations) by scoring all next-token candidates via a single batched HMM forward update. Second, it avoids predicting a new HMM per prefix by learning one shared HMM…
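The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a linear-softmax head (`latent_prior`, hypothetical), a shared HMM given by a transition matrix `A` and an emission matrix `B`, and a simple illustrative constraint: every token in the next `horizon` positions must come from an allowed set. All function and variable names are invented for this example. The point is that one matrix product scores every next-token candidate at once, instead of running V extra LM evaluations.

```python
import numpy as np

def latent_prior(embedding, W, b):
    """Hypothetical neural head: base-LM embedding -> prior over HMM states."""
    logits = embedding @ W + b
    logits = logits - logits.max()          # stabilise the softmax
    p = np.exp(logits)
    return p / p.sum()

def backward_allowed(A, B, allowed, steps):
    """P(the next `steps` emitted tokens are all allowed | current state)."""
    emit_ok = B[:, allowed].sum(axis=1)     # P(allowed token | state)
    beta = np.ones(A.shape[0])
    for _ in range(steps):
        beta = emit_ok * (A @ beta)         # emit an allowed token, then transition
    return beta

def score_candidates(prior, A, B, allowed, horizon):
    """For every candidate v: P(constraint holds | prefix, x_t = v)."""
    joint = prior[:, None] * B              # (H, V): p(z_t = h, x_t = v | prefix)
    alpha = joint.T @ A                     # (V, H): one batched forward update
    beta = backward_allowed(A, B, allowed, horizon)
    p_tok = joint.sum(axis=0)               # (V,): HMM marginal of each candidate
    p_ok = (alpha @ beta) / np.maximum(p_tok, 1e-300)
    return p_ok * allowed                   # the candidate itself must be allowed

def guided_next_token(lm_probs, p_ok):
    """Reweight the base-LM distribution by the lookahead score and renormalise."""
    w = lm_probs * p_ok
    return w / w.sum()
```

In this sketch the HMM parameters `A` and `B` are shared across all prefixes; only `prior` changes with the prefix, which mirrors the abstract's description of a neural head that predicts a prefix-dependent latent prior while a single HMM answers the continuation query.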
Peer Reviews
Decision: Submitted to ICLR 2026
+ Clear motivation: The paper identifies a genuine limitation of prior tractable control methods, poor context sensitivity, and provides a clean, theoretically supported solution by replacing only the encoder with a neural module while keeping tractable inference intact.
+ Conceptual elegance: The “neural encoder + tractable decoder” design is simple yet effective, offering a principled way to blend neural expressiveness with exact probabilistic reasoning.
+ Compatibility: LTLA can be plugged in
+ Narrow evaluation scope: All benchmarks involve short sequences (≤32 tokens) from GPT-2 or Qwen2-VL models. The approach has not been tested on longer contexts or more recent/larger LLMs. Do you think it is necessary to evaluate on newer models? Are there specific reasons that prevent applying this approach to more recent architectures?
+ Limited empirical improvement: The proposed method shows only marginal perplexity gains over standard HMMs (e.g., Fig. 3, left subfigure). The perplexity re
- The work presents an efficient hybrid architecture that combines the deep context-encoding power of a Transformer with the tractable lookahead capability of an HMM.
- The method demonstrates significant empirical gains in modeling perplexity and generation quality over prior TPM-based controllers.
- The design reuses the LLM's hidden states, ensuring the lookahead process is highly efficient and incurs minimal decoding-time overhead.
- The approach shows versatility across diverse tasks, im
- Lookahead's effectiveness diminishes over longer continuations. $z_t$, as described in the paper, is limited in how much information it can store.
- Expressiveness is limited by the simple HMM structure, which ultimately creates a bottleneck. So even though it can be useful for analysis purposes, it cannot be used for practical generation use cases.
- With a large C and $|z_t|$, the lookup table can get very large, further reducing practicality.
**Motivation**: I agree with their assessment of why continuation prediction is an important problem for controllable generation. The distribution for each position should be biased towards tokens that lead to better total sequences, instead of being just position-specific.
**Solution Novelty**: The proposed solution, a hybrid HMM conditioned on the hidden representation that is able to “look ahead” by cheaply generating a continuation from a fixed position, is quite elegant.
**Experimental Details**: The discussion of experimental details could be more thorough. The appendix does not contain enough information regarding the experimental setup. What was the learning setup for the neural HMM: the learning rate, batch size, etc.? And what about generated samples: why are there no examples in the appendix? As it is, the appendix needs to be significantly strengthened. This is a major weakness, but I also feel that it should be relatively easy to fix with revision. I w
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
