WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment
Mahir Labib Dihan, Tanzima Hashem, Mohammed Eunus Ali, Md Rizwan Parvez

TL;DR
WebOperator introduces a tree-search framework for web agents that enables safe backtracking and strategic exploration, improving long-term planning and error correction in partially observable web environments.
Contribution
It presents a novel tree-search method with safe backtracking and diverse exploration strategies tailored for web agents, addressing limitations of prior greedy approaches.
Findings
Achieves 54.6% success rate on WebArena with GPT-4o.
Demonstrates improved long-term planning and error correction.
Outperforms existing methods in web navigation tasks.
Abstract
LLM-based agents often operate in a greedy, step-by-step manner, selecting actions based solely on the current observation without considering long-term consequences or alternative paths. This lack of foresight is particularly problematic in web environments. These environments are only partially observable (limited to browser-visible content such as the DOM and UI elements), and a single misstep often requires complex and brittle navigation to undo. Without an explicit backtracking mechanism, agents struggle to correct errors or systematically explore alternative paths. Tree-search methods provide a principled framework for such structured exploration, but existing approaches lack mechanisms for safe backtracking, making them prone to unintended side effects. They also assume that all actions are reversible, ignoring the presence of irreversible actions; these limitations reduce their effectiveness in…
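The abstract names the ingredients (tree search, safe backtracking, irreversible actions) only at a high level. As a rough illustration, the following is a minimal sketch of generic best-first tree search that (a) backtracks by resetting to the start state and replaying the action prefix, because a web page offers no general undo, and (b) never executes an irreversible action speculatively. Instead, such an action waits in the frontier and is committed only when it becomes the top-scoring candidate. Everything here is hypothetical: `ToyWebEnv`, `tree_search`, the `score` stand-in for a value model, and the toy shopping transitions. The deferral policy for irreversible actions is an assumption made for illustration, not WebOperator's algorithm or code.

```python
import heapq
import itertools


class ToyWebEnv:
    """Hypothetical stand-in for a browser: a deterministic state machine.

    transitions maps (state, action) -> next_state. There is no undo; the
    only way back to an earlier state is reset() plus replaying actions,
    loosely mimicking reloading a start URL and re-executing steps.
    """

    def __init__(self, transitions, start="home"):
        self.transitions = transitions
        self.start = start
        self.state = start
        self.executed = []  # every action actually sent to the "browser"

    def reset(self):
        self.state = self.start

    def available_actions(self):
        return sorted(a for (s, a) in self.transitions if s == self.state)

    def step(self, action):
        self.executed.append(action)
        self.state = self.transitions[(self.state, action)]
        return self.state


def tree_search(env, score, irreversible, is_goal, budget=50):
    """Best-first search with replay-based backtracking (illustrative only).

    score(state, action) is a hypothetical value estimate; higher is better.
    Irreversible actions are never executed while exploring: they sit in the
    frontier and, once popped, are committed and end the search.
    """
    tie = itertools.count()  # tiebreaker so heapq never compares paths
    frontier = []  # entries: (-score, tiebreak, path_from_root, action)

    def restore(path):
        # Backtrack by resetting and replaying the action prefix.
        env.reset()
        for a in path:
            env.step(a)

    def push_children(path):
        for a in env.available_actions():
            heapq.heappush(frontier, (-score(env.state, a), next(tie), path, a))

    restore(())
    if is_goal(env.state):
        return ()
    push_children(())
    for _ in range(budget):
        if not frontier:
            break
        _, _, path, action = heapq.heappop(frontier)
        restore(path)
        env.step(action)
        new_path = path + (action,)
        if action in irreversible or is_goal(env.state):
            return new_path  # committed: no safe way (or need) to explore further
        push_children(new_path)
    return None


# Usage on a toy shop: a greedy agent would open the (empty) cart first.
transitions = {
    ("home", "search"): "results",
    ("home", "open_cart"): "cart",
    ("results", "open_item"): "item",
    ("item", "add_to_cart"): "cart_with_item",
    ("cart", "checkout"): "order_placed_empty",
    ("cart_with_item", "checkout"): "order_placed",
}
scores = {  # hypothetical value-model outputs
    ("home", "open_cart"): 0.6,
    ("home", "search"): 0.5,
    ("cart", "checkout"): 0.2,
    ("results", "open_item"): 0.7,
    ("item", "add_to_cart"): 0.8,
    ("cart_with_item", "checkout"): 0.9,
}
env = ToyWebEnv(transitions)
plan = tree_search(
    env,
    score=lambda s, a: scores[(s, a)],
    irreversible={"checkout"},
    is_goal=lambda s: s == "order_placed",
)
print(plan)  # ('search', 'open_item', 'add_to_cart', 'checkout')
```

The search first tries the greedily attractive `open_cart` branch. Because checking out an empty cart scores low, that irreversible step is never executed. The search backtracks via replay and commits `checkout` only on the branch where the item was added. Replaying from the root is simple but costly; it is used here only because it needs no undo support from the environment.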
Peer Reviews
Decision · Submitted to ICLR 2026
- The methodological improvements over LM-TS are well-motivated.
- The evaluation is sufficiently thorough, focusing on WebArena.
- The evaluation results are promising, achieving significant improvement with an open-source model compared to competitors using larger, commercial models.
- Improvements are iterative compared to LM-TS.
- The methods provided for dealing with destructive actions are heuristics that essentially shape rewards toward safe actions, so more general solutions are still needed.
- Safety is a central concept in the paper, but neither the related work nor the empirical evaluation focuses sufficiently on this aspect. I think this needs to be improved before acceptance.
S1. Engineering Maturity and Real-World Robustness
- The system demonstrates a high level of engineering completeness and stability, functioning reliably in real browser environments (e.g., simulation tabs, URL-based backtracking).
- It effectively handles realistic constraints such as partial observability, irreversible actions, and search efficiency, showcasing strong robustness and precision under real-world conditions.
S2. Competitive and Generalized Performance
- The model achieves consist
W1. Lack of Readability and Coherence
- The paper's writing style and organization are inconsistent, making it difficult to follow the main narrative.
- The introduction fails to clearly convey the motivation, problem statement, and core contributions.
- Key components, such as Rephrase Instruction and Optimized Observation, are insufficiently described, weakening conceptual clarity.
W2. Limited Research Novelty
- The method primarily combines existing heuristics and engineering improvements rath
1. WebOperator combines many techniques which, when put together, achieve state-of-the-art results on WebArena (55.68%) with an open-source backbone LM, outperforming baselines that use closed-source LLMs, such as AgentSymbiotic and ScribeAgent (leaderboard can be found here: https://docs.google.com/spreadsheets/d/1M801lEpBbKSNwP-vDBkC_pF7LdyGU1f_ufZb_NWNBZQ/edit?gid=0#gid=0).
2. On WebVoyager, WebOperator achieves a higher success rate (63.57%) than AgentOccam (48.84%).
3. The paper impleme
1. None of the many techniques the paper uses is individually original. Tree search for web agents has been explored in prior work, as mentioned in the paper, and it is unclear how much of the gain comes from WebOperator's novelties on top of existing work such as AgentOccam [1] and Tree-search [2]. A baseline that could have been used is the Tree-search algorithm by Koh et al. with AgentOccam's observation-space improvements, or the Tree-search algorithm with the WebShepherd reward
Taxonomy
Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Semantic Web and Ontologies
