LLM-First Search: Self-Guided Exploration of the Solution Space
Nathan Herr, Tim Rocktäschel, Roberta Raileanu

TL;DR
This paper introduces LLM-First Search (LFS), a self-guided exploration method that enables LLMs to autonomously control their search process, improving reasoning efficiency and adaptability without manual tuning.
Contribution
The paper proposes LLM-First Search, a novel self-guided search approach allowing LLMs to autonomously navigate solution spaces, eliminating the need for fixed heuristics or external policies.
Findings
LFS outperforms classic search algorithms on the challenging Countdown and Sudoku tasks.
LFS is more computationally efficient, especially with stronger models.
LFS scales better with increased compute and model size.
Abstract
Large Language Models (LLMs) have demonstrated remarkable improvements in reasoning and planning through increased test-time compute, often by framing problem-solving as a search process. While methods like Monte Carlo Tree Search (MCTS) have proven effective in some domains, their reliance on fixed exploration hyperparameters limits their adaptability across tasks of varying difficulty, rendering them impractical or expensive in certain settings. In this paper, we propose LLM-First Search (LFS), a novel LLM Self-Guided Search method that removes the need for pre-defined search strategies by empowering the LLM to autonomously control the search process via self-guided exploration. Rather than relying on external heuristics or hardcoded policies, the LLM evaluates whether to pursue the current search path or explore alternative branches based on its internal scoring…
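The control loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`llm_guided_search`, `evaluate`, `should_explore`), the toy integer task, and the rule-based stubs are all assumptions made for illustration. In an LLM-driven setup, `evaluate` and `should_explore` would presumably be model calls; here they are simple stand-ins so the example runs on its own.

```python
import heapq
from itertools import count


def llm_guided_search(root, expand, is_goal, evaluate, should_explore, max_steps=1000):
    """Search in which a pluggable policy decides, at every step, whether to
    continue down the current path or jump to a stored alternative."""
    frontier = []        # min-heap of (-score, tiebreak, state); best score pops first
    tiebreak = count()   # keeps heap entries comparable when scores are equal
    current = root
    for _ in range(max_steps):
        if is_goal(current):
            return current
        # Scoring step: rate every successor of the current state.
        scored = sorted(((evaluate(child), child) for child in expand(current)),
                        key=lambda pair: pair[0], reverse=True)
        # Decision step: forced at a dead end; otherwise the policy is asked,
        # but only when stored alternatives actually exist.
        explore = not scored or (bool(frontier) and should_explore(current, scored))
        # Successors that are not followed right now are kept as alternatives.
        for score, child in (scored if explore else scored[1:]):
            heapq.heappush(frontier, (-score, next(tiebreak), child))
        if not explore:
            current = scored[0][1]                # continue with the best successor
        elif frontier:
            current = heapq.heappop(frontier)[2]  # jump to the best stored state
        else:
            return None                           # nothing left to try
    return None


# Hypothetical toy task: reach TARGET from 1 using "+3" or "*2" moves.
TARGET = 10


def expand(n):
    return [m for m in (n + 3, n * 2) if m <= TARGET]


def is_goal(n):
    return n == TARGET


def evaluate(n):
    # Stand-in for a model-produced score: closer to the target is better.
    return -abs(TARGET - n)


def should_explore(current, scored):
    # Stand-in for a model decision: explore if no successor improves on the current state.
    return scored[0][0] < evaluate(current)


def demo():
    return llm_guided_search(1, expand, is_goal, evaluate, should_explore)
```

In this toy run the search follows 1, 4, 8, hits a dead end at 8, and falls back to the stored alternative 7 before reaching the target. The point of the sketch is only the shape of the loop: the explore-or-continue decision is delegated to a callable rather than to a fixed exploration constant.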
Peer Reviews
Decision: Submitted to ICLR 2026
1. Choosing win rate to accommodate generation stochasticity is sensible. The paper also reports Wilson 95% CIs, efficiency (wins per token), and performance profiles (AUP), which improves statistical transparency.
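For readers unfamiliar with the interval mentioned in this review, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion such as a win rate. The function name `wilson_interval` and the use of z = 1.96 for a 95% interval are assumptions for illustration; the paper's exact computation is not shown in this excerpt.

```python
import math


def wilson_interval(wins, trials, z=1.96):
    """Wilson score interval for a win rate of `wins` out of `trials`.
    z = 1.96 corresponds to an approximately 95% confidence level."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = wins / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials
                               + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly when the win rate is near 0 or 1, which matters when a method rarely or almost always solves a task.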
1. Countdown and Sudoku are rigorous but synthetic; standing alone (without math/coding or other reasoning tasks) they have relatively limited action spaces, which narrows external validity. The authors themselves note the restricted scope. However, given that it is a paper that proposes a methodology, the limitation is substantial. 2. Limited ablations: there is no analysis of LFS internals to understand the effectiveness of the proposed pipeline. Component-wise ablations would help isolat
1. Well-Motivated Problem: The paper clearly identifies a significant limitation in existing search-augmented LLMs. It highlights the impracticality and sub-optimality of relying on fixed hyperparameters (like the MCTS exploration constant $C$), which require costly re-tuning for different tasks or models. The finding that MCTS performance can degrade when using a stronger model provides a compelling motivation for a more adaptive approach. 2. Strong Empirical Performance: Within the confines
1. Concerns Regarding Novelty: The proposed method appears to have significant overlap with the Tree-of-Thoughts (ToT) framework. The core mechanism—using an 'Evaluate' prompt for scoring and an 'Explore' prompt for decision-making, coupled with a priority queue—could be interpreted as a sophisticated form of prompt engineering built upon the ToT concept, rather than a fundamentally new search paradigm. The novelty beyond this implementation is not made sufficiently clear. 2. Insufficient An
1. The idea is simple, intuitive and clearly conveyed: to remove handcrafted rules for exploration-exploitation and let the LLM take charge of which node to expand during tree search. 2. The related work seems to be rather complete; it discusses different LLM test-time reasoning frameworks in detail and clearly states the difference between the proposed method and prior works. 3. The paper has a very detailed appendix, which greatly increases the reproducibility of the paper by providing all th
1. As mentioned in the limitations section, the proposed method relies heavily on the LLM's base ability, as the exploration-exploitation tradeoff is made by the LLM itself. However, if future long-context, thinking state-of-the-art models are strong enough, they may inherently possess the ability to jump between different branches of thought without adopting LFS (e.g. many papers [1] mention crucial tokens such as "wait" or "however" in solving complicated math problems, which can be seen as a va
Taxonomy
Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Artificial Intelligence in Games
