TL;DR
STATe is a structured, interpretable inference-time method that improves the diversity, controllability, and interpretability of large language model reasoning by searching over high-level action templates instead of relying on stochastic sampling.
Contribution
The paper proposes STATe, a novel method replacing stochastic sampling with discrete, interpretable actions to enhance diversity, control, and interpretability in reasoning with language models.
Findings
STATe achieves greater response diversity than temperature sampling.
Explicit action sequences are highly predictive of output quality.
Estimating action-performance associations guides generation toward promising regions.
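One simple way to picture the last finding is a table of average evaluator scores per action, computed from logged runs. The sketch below is only an illustration under stated assumptions: the action names, the scores, and the helper `action_means` are hypothetical, and the paper may estimate these associations differently.

```python
from collections import defaultdict

def action_means(runs):
    """Average the score of every run in which an action appears.

    `runs` is a list of (action_sequence, score) pairs. Both the actions
    and the scores used below are made-up illustration data.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for actions, score in runs:
        for action in set(actions):  # count each action once per run
            totals[action] += score
            counts[action] += 1
    return {action: totals[action] / counts[action] for action in totals}

# Hypothetical logged runs: (actions taken, evaluator score)
runs = [
    (("decompose", "verify"), 1.0),
    (("analogy",), 0.0),
    (("decompose",), 0.5),
]

means = action_means(runs)
best_action = max(means, key=means.get)  # action with the highest mean score
```

A controller could then favor actions with high estimated means when choosing what to try next.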
Abstract
Inference-Time-Compute (ITC) methods like Best-of-N and Tree-of-Thoughts are meant to produce output candidates that are both high-quality and diverse, but their use of high-temperature sampling often fails to achieve meaningful output diversity. Moreover, existing ITC methods offer limited control over how to perform reasoning, which in turn limits their interpretability. We present STATe Of Thoughts (STATe), an interpretable ITC method that searches over high-level reasoning patterns. STATe replaces stochastic sampling with discrete and interpretable textual interventions: a controller selects actions encoding high-level reasoning choices; a generator produces reasoning steps conditioned on those choices; and an evaluator scores candidates to guide search. This structured approach yields three main advantages. First, action-guided textual interventions reliably influence LLM…
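The controller, generator, and evaluator roles described in the abstract can be sketched as a small beam search over action sequences. This is a minimal illustration, not the authors' implementation: the action names, the stubbed `generator` (standing in for an LLM call), the toy `evaluator`, and the beam-search strategy itself are all assumptions.

```python
# Hypothetical action vocabulary; the paper's actual actions are not listed here.
ACTIONS = ["decompose", "analogy", "counterexample"]

def controller(actions_so_far):
    """Propose candidate actions. This stub always offers every action."""
    return list(ACTIONS)

def generator(steps_so_far, action):
    """Stand-in for an LLM call that writes a step conditioned on `action`."""
    return f"[{action}] step {len(steps_so_far) + 1}"

def evaluator(actions, steps):
    """Toy score (an assumption): reward using distinct actions."""
    return len(set(actions))

def search(depth, beam_width):
    """Beam search over action sequences, keeping the top-scoring candidates."""
    beam = [((), ())]  # each entry: (actions taken, reasoning steps produced)
    for _ in range(depth):
        candidates = []
        for actions, steps in beam:
            for action in controller(actions):
                new_steps = steps + (generator(steps, action),)
                candidates.append((actions + (action,), new_steps))
        # Stable sort: ties keep their original order, so results are deterministic.
        candidates.sort(key=lambda c: evaluator(c[0], c[1]), reverse=True)
        beam = candidates[:beam_width]
    return beam

beam = search(depth=2, beam_width=3)
```

Because every candidate carries its explicit action sequence, each output can be traced back to the reasoning choices that produced it, which is the interpretability property the abstract emphasizes.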
