On the Multi-turn Instruction Following for Conversational Web Agents
Yang Deng, Xuan Zhang, Wenxuan Zhang, Yifei Yuan, See-Kiong Ng, Tat-Seng Chua

TL;DR
This paper introduces a new task called Conversational Web Navigation, supported by a novel dataset and a self-reflective memory-augmented planning framework, to improve multi-turn instruction following by LLM-powered web agents.
Contribution
The work presents a new dataset, MT-Mind2Web, and a novel framework, Self-MAP, to enhance multi-turn instruction following in web agents, addressing the limited context length of LLMs and the context dependency of conversational tasks.
Findings
Self-MAP outperforms baseline methods on MT-Mind2Web.
Memory and self-reflection improve task success rates.
Benchmark results demonstrate the effectiveness of the proposed approach.
Abstract
Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory…
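The general idea of using a memory to work within a limited context window can be illustrated with a small sketch. This is not the authors' Self-MAP implementation: the `MemorySnippet` type, the helper names, and the word-overlap (Jaccard) scoring are all assumptions made for illustration, and the self-reflection component is not modeled. The sketch only shows how an agent might keep past conversation turns in a memory bank and place just the most relevant ones into the prompt for the current instruction.

```python
from dataclasses import dataclass


@dataclass
class MemorySnippet:
    """One past interaction step (hypothetical structure)."""
    turn: int
    instruction: str
    action: str


def tokenize(text: str) -> set:
    # Lowercased whitespace tokens; a stand-in for a real retriever's features.
    return set(text.lower().split())


def relevance(query: str, snippet: MemorySnippet) -> float:
    # Jaccard overlap between the current instruction and a stored snippet.
    q = tokenize(query)
    s = tokenize(snippet.instruction + " " + snippet.action)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def retrieve(memory: list, query: str, k: int = 2) -> list:
    # Keep only the k most relevant snippets so the prompt stays short.
    ranked = sorted(memory, key=lambda m: relevance(query, m), reverse=True)
    return ranked[:k]


def build_prompt(memory: list, query: str, k: int = 2) -> str:
    # Present the retrieved snippets in chronological order, then the new turn.
    selected = sorted(retrieve(memory, query, k), key=lambda m: m.turn)
    lines = [f"[turn {m.turn}] {m.instruction} -> {m.action}" for m in selected]
    lines.append(f"[current] {query}")
    return "\n".join(lines)
```

Calling `build_prompt(memory, "book the cheapest flight to Tokyo", k=2)` on a memory bank of earlier turns would yield a prompt containing at most two past steps followed by the current instruction. A real system would likely replace the word-overlap score with a learned retriever.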
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management
