Tree Search for Language Model Agents

Jing Yu Koh; Stephen McAleer; Daniel Fried; Ruslan Salakhutdinov

arXiv:2407.01476 · cs.AI · February 10, 2026 · 1 citation

TL;DR

This paper introduces a tree search algorithm for language model agents that improves multi-step reasoning and planning in web environments, raising success rates on the VisualWebArena and WebArena benchmarks.

Contribution

It presents the first tree search method shown to be effective for LM agents on realistic web tasks, with substantial performance gains.

Findings

1. 39.7% relative increase in success rate on VisualWebArena
2. 28.0% relative increase in success rate on WebArena
3. Performance scales with increased test-time compute
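The findings above report relative gains, not absolute ones. Assuming the usual definition (the page does not spell it out), with SR_base the success rate of the baseline agent and SR_search the success rate of the same agent with search:

$$\Delta_{\text{rel}} = \frac{SR_{\text{search}} - SR_{\text{base}}}{SR_{\text{base}}} \times 100\%$$

Under this reading, a 39.7% relative increase means the success rate grew by 0.397 times the baseline rate, not by 39.7 percentage points.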

Abstract

Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards addressing this, we propose an inference-time search algorithm for LM agents to explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space, and is complementary to most existing state-of-the-art agents. It is the first tree search algorithm for LM agents shown to be effective on realistic web tasks. On the challenging VisualWebArena benchmark, applying our search…
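The abstract describes best-first tree search carried out in the real environment. The Python below is a minimal sketch under stated assumptions, not the code in the authors' repository: `replay`, `propose`, and `value` are hypothetical stand-ins for environment reset-and-replay, the LM agent proposing candidate actions, and the value function, and the parameters `max_depth`, `branching`, `budget`, and `threshold` are illustrative names. Backtracking is modeled by resetting and re-executing the action prefix, and a toy task (typing "abc") replaces a web environment.

```python
import heapq
import itertools
from typing import Callable, List, Sequence, Tuple

Action = str
Observation = str


def best_first_search(
    replay: Callable[[Sequence[Action]], Observation],
    propose: Callable[[Observation, int], List[Action]],
    value: Callable[[Observation], float],
    max_depth: int = 5,
    branching: int = 5,
    budget: int = 20,
    threshold: float = 1.0,
) -> Tuple[List[Action], float]:
    """Return the best trajectory found and its estimated value.

    `replay` (hypothetical) resets the environment and re-executes a list
    of actions; this is how the sketch "backtracks" without an undo.
    `budget` caps the number of value-function calls.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares lists
    root: List[Action] = []
    best_traj, best_value = root, value(replay(root))
    evaluations = 1
    if best_value >= threshold:
        return best_traj, best_value
    # heapq is a min-heap, so values are negated to pop the best state first
    frontier = [(-best_value, next(tie), root)]

    while frontier and evaluations < budget:
        _, _, traj = heapq.heappop(frontier)
        if len(traj) >= max_depth:
            continue  # depth limit reached: do not expand this state
        obs = replay(traj)  # backtrack to the popped state
        for action in propose(obs, branching):
            if evaluations >= budget:
                break
            child = traj + [action]
            v = value(replay(child))  # execute the action, then score it
            evaluations += 1
            if v > best_value:
                best_traj, best_value = child, v
            if v >= threshold:
                return best_traj, best_value  # confident success: stop early
            heapq.heappush(frontier, (-v, next(tie), child))
    return best_traj, best_value


# Toy stand-ins for the environment, LM policy, and value function.
TARGET = "abc"


def toy_replay(traj: Sequence[Action]) -> Observation:
    return "".join(traj)


def toy_propose(obs: Observation, b: int) -> List[Action]:
    return list("abcde")[:b]


def toy_value(obs: Observation) -> float:
    # Fraction of TARGET matched as a prefix; overlong strings score 0.
    if len(obs) > len(TARGET):
        return 0.0
    matched = 0
    for got, want in zip(obs, TARGET):
        if got != want:
            break
        matched += 1
    return matched / len(TARGET)


if __name__ == "__main__":
    print(best_first_search(toy_replay, toy_propose, toy_value))
```

Running the file prints `(['a', 'b', 'c'], 1.0)`. In this sketch each child is scored as soon as it is executed and the search stops early once a state reaches `threshold`; how the actual method orders expansion and scoring may differ.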

Peer Reviews

Decision: Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. Experiments are conducted on several practical benchmark datasets such as VisualWebArena and WebArena, showcasing its effectiveness in web-based tasks.
2. The search algorithm is compatible with a variety of LM agents and does not require fine-tuning or retraining.
3. Extensive hyperparameter analysis is provided.

Weaknesses

1. The search algorithm demands significant computational resources due to the increased number of environment interactions. This may limit its practical applicability in real-time or resource-constrained environments.
2. The success of the best-first search depends heavily on the quality of the value function. Although self-consistency techniques were used, further improvements in the value function are needed for optimal performance.
3. The paper briefly addresses the issue of destructive ac
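The reviewer notes that the value function relies on self-consistency. As a rough illustration only, self-consistency can be read here as sampling several independent value judgments and averaging their scores. Everything named below is hypothetical: the verdict labels, the `VERDICT_SCORES` mapping, `sample_verdict`, the sample count, and the noisy judge, which stands in for a temperature-sampled LM call.

```python
import random
from statistics import mean
from typing import Callable, Dict

# Hypothetical verdict-to-score mapping; the real prompt and scale may differ.
VERDICT_SCORES: Dict[str, float] = {"success": 1.0, "on_track": 0.5, "failure": 0.0}


def self_consistent_value(
    sample_verdict: Callable[[str], str],
    observation: str,
    n_samples: int = 20,
) -> float:
    """Average the scores of n independently sampled verdicts."""
    return mean(VERDICT_SCORES[sample_verdict(observation)] for _ in range(n_samples))


def make_noisy_judge(seed: int) -> Callable[[str], str]:
    """Toy stand-in for a stochastic (temperature > 0) LM judge."""
    rng = random.Random(seed)

    def judge(observation: str) -> str:
        if "order placed" in observation:
            return rng.choices(["success", "on_track"], weights=[0.8, 0.2])[0]
        return rng.choices(["on_track", "failure"], weights=[0.5, 0.5])[0]

    return judge


if __name__ == "__main__":
    judge = make_noisy_judge(seed=0)
    print(self_consistent_value(judge, "order placed: #123"))
    print(self_consistent_value(judge, "still on the cart page"))
```

Averaging many samples smooths out the noise of any single judgment, at the cost of one extra model call per sample, which compounds the compute concern raised in the first weakness.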

Reviewer 02 · Rating 8 · Confidence 4

Strengths

Originality: This paper proposes an original tree search algorithm for LLM-based web agents. The paper also claims to be the first such algorithm.
Quality: The paper is of high quality. The results are pretty strong and clear.
Clarity: The paper is generally clear and easy to follow. However, there could be some improvement here.
Significance: There has been a trend of applying tree search and test-time compute across many applications LLMs can be applied to. So it is not surprising to see

Weaknesses

WebArena results do not look strong relative to other modern works. However, some of those works seem to be a bit over-optimized for the benchmark, whereas this work is more general. There are some clarity issues to work out with the writing. The section on destructive actions seems highly speculative. This is a major weakness of the approach, in that in real-world settings it is difficult to conduct search that requires extensive backtracking. It is not even clear to me how backtracking coul

Reviewer 03 · Rating 3 · Confidence 5

Strengths

1. The paper adeptly addresses several critical challenges faced by LLM agents in real-world web environments: the difficulty in obtaining clear rewards, the accumulation of errors, and the complexity of multimodal interactive web environments. The motivation for this research is both meaningful and reasonable.
2. The study introduces a tree search algorithm specifically tailored for LLM-based multi-step planning in web environments. This framework is both clear and straightforward.
3. The autho

Weaknesses

1. Lack of technical contribution. The integration of tree search techniques with LLM planning is not entirely novel, as there is existing research in this area, e.g., [1, 2, 3, 4]. Thus, the contribution of this paper in terms of technical novelty may need to be reconsidered, as it incrementally applies existing tree search-based LLM planning frameworks to the web agent domain.
[1] Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models (released 6 Oct 2023) [

Code & Models

Repositories

kohjingyu/search-agents

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling