Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

Xueyan Li; Johannes Zenn; Ekaterina Fadeeva; Guinan Su; Mrinmaya Sachan; Jonas Geiping

arXiv:2604.20500 · cs.LG · April 23, 2026

TL;DR

This paper introduces Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and enumerates its distinct leaves instead of sampling with replacement, with reported gains on math, coding, and reasoning tasks.

Contribution

DLE replaces stochastic sampling with a deterministic traversal of the truncated decoding tree, improving both inference efficiency and the quality of the reasoning traces explored.

Findings

1. DLE explores higher-quality reasoning traces than stochastic methods.

2. DLE improves inference efficiency by reusing shared prefixes and reducing redundancy.

3. Empirically, DLE yields better performance on math, coding, and reasoning tasks.
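
To make the prefix-reuse finding concrete, the toy sketch below counts how many tokens must be generated for a small set of traces when each trace is decoded independently versus when every distinct prefix is decoded only once (that is, the traces are stored as a trie). The traces and helper names are hypothetical illustrations, not taken from the paper, and the count ignores real-system details such as KV-cache management.

```python
def tokens_without_sharing(traces):
    """Total tokens generated if every trace is decoded independently."""
    return sum(len(trace) for trace in traces)


def tokens_with_sharing(traces):
    """Tokens generated if each distinct prefix is decoded only once.

    Every distinct non-empty prefix corresponds to one trie node,
    and each trie node costs exactly one generated token.
    """
    distinct_prefixes = set()
    for trace in traces:
        for end in range(1, len(trace) + 1):
            distinct_prefixes.add(tuple(trace[:end]))
    return len(distinct_prefixes)


# Hypothetical traces that share the opening tokens.
traces = [
    ["x", "=", "2", "<eos>"],
    ["x", "=", "3", "<eos>"],
    ["x", "is", "2", "<eos>"],
]

independent = tokens_without_sharing(traces)
shared = tokens_with_sharing(traces)
print(f"independent decoding: {independent} tokens")
print(f"shared prefixes:      {shared} tokens")
```

The saving grows with the length of the common prefixes, which is why traces that diverge late benefit most in this simplified accounting.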

Abstract

Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement. This strategy improves inference efficiency in two ways. Algorithmically, it increases coverage of the truncated search space under a fixed budget by exploring previously unvisited high-probability branches. Systemically, it reuses shared prefixes and reduces redundant token generation. Empirically, DLE explores higher-quality reasoning traces…
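
The idea of enumerating distinct leaves of a truncated decoding tree can be illustrated with a minimal sketch. It is not the authors' algorithm: the traversal order here is a generic best-first search on cumulative log-probability, the truncation rule is top-k with renormalization, and `toy_next_token_probs` is a hypothetical stand-in for a language model. All of these are assumptions made for illustration only.

```python
import heapq
import math

EOS = "<eos>"


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a language model (assumption, not a real API)."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def truncate_top_k(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(p for _, p in top)
    return [(tok, p / total) for tok, p in top]


def enumerate_distinct_leaves(next_probs, k, budget):
    """Return up to `budget` distinct complete sequences, most probable first.

    Each prefix enters the heap at most once, so no leaf is ever repeated.
    Best-first order is valid because extending a prefix can only lower
    its probability.
    """
    heap = [(0.0, ())]  # entries are (negative log-probability, prefix)
    leaves = []
    while heap and len(leaves) < budget:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == EOS:
            leaves.append((prefix, math.exp(-neg_logp)))
            continue
        for tok, p in truncate_top_k(next_probs(prefix), k):
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + (tok,)))
    return leaves


leaves = enumerate_distinct_leaves(toy_next_token_probs, k=2, budget=4)
for seq, prob in leaves:
    print(" ".join(seq), round(prob, 3))
```

In this toy tree the truncated space has exactly four leaves, so a budget of four covers it completely. Four independent samples from the same truncated distribution would usually contain duplicates, since the most probable leaf alone carries 4/9 of the mass.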
