WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback

Minda Hu; Tianqing Fang; Jianshu Zhang; Junyu Ma; Zhisong Zhang; Jingyan Zhou; Hongming Zhang; Haitao Mi; Dong Yu; Irwin King

arXiv:2505.20013 · cs.CL · September 19, 2025

TL;DR

This paper introduces WebCoT, a method that improves web agent reasoning by reconstructing the agent's inference-time reasoning as chain-of-thought rationales and fine-tuning the backbone LLM on them, yielding better performance in dynamic web environments.

Contribution

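The paper identifies three reasoning skills essential for web agents (reflection & lookahead, branching, and rollback), curates trajectory data that exemplifies them, and shows that distilling the reconstructed chain-of-thought rationales into the backbone LLM through fine-tuning improves its performance.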

Findings

01. Significant performance improvements across multiple web benchmarks, including WebVoyager, Mind2Web-Live, and SimpleQA (web search).

02. Effective distillation of reasoning patterns into LLMs through simple fine-tuning.

03. Enhanced reasoning skills lead to more robust web agent behavior.

Abstract

Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent's (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improving benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2Web-Live, and SimpleQA (web search), highlighting the potential of targeted reasoning skill…
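As a rough illustration of what "reconstructing reasoning into chain-of-thought rationales" could look like, the sketch below flattens a short list of labelled reasoning steps into a single rationale string suitable as a fine-tuning target. It is not the authors' pipeline: the `Step` class, the step labels, the `TEMPLATES` wording, and the example steps are all hypothetical and chosen only to mirror the three skills named in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One reasoning event in an agent trajectory (hypothetical structure)."""
    kind: str    # one of "reflect", "branch", "rollback", "act" (assumed labels)
    detail: str  # free-text description of what happened at this step


# Hypothetical natural-language templates, one per step kind.
TEMPLATES = {
    "reflect": "Reflection: {d}",
    "branch": "Branching: {d}",
    "rollback": "Rollback: {d}",
    "act": "Action: {d}",
}


def to_rationale(steps: List[Step]) -> str:
    """Render the steps, in order, as one chain-of-thought rationale string."""
    lines = []
    for step in steps:
        if step.kind not in TEMPLATES:
            raise ValueError(f"unknown step kind: {step.kind!r}")
        lines.append(TEMPLATES[step.kind].format(d=step.detail))
    return "\n".join(lines)


# Made-up example trajectory, for illustration only.
example = [
    Step("reflect", "the search results page does not list the item I need."),
    Step("branch", "I could refine the query or open the category menu."),
    Step("rollback", "returning to the home page before trying the category menu."),
    Step("act", "click the category menu."),
]

rationale = to_rationale(example)
print(rationale)
```

In a setup like this, each rationale string would be paired with the observation that preceded it to form one supervised fine-tuning example; how the paper actually formats its training data is not stated in the abstract.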

Code & Models

Datasets

CognitiveKernel/WebCoT (dataset, 40 downloads)

Videos

WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback

Taxonomy

Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · Logic, Reasoning, and Knowledge