On the Multi-turn Instruction Following for Conversational Web Agents

Yang Deng; Xuan Zhang; Wenxuan Zhang; Yifei Yuan; See-Kiong Ng; Tat-Seng Chua

arXiv:2402.15057 · cs.CL · February 26, 2024 · 1 citation

TL;DR

This paper introduces a new task called Conversational Web Navigation, supported by a novel dataset and a self-reflective memory-augmented planning framework, to improve multi-turn instruction following by LLM-powered web agents.

Contribution

The work presents a new dataset, MT-Mind2Web, and a novel framework, Self-MAP, to enhance multi-turn instruction following in web agents. The framework targets two problems: the limited context length of LLMs and the context dependency of conversational tasks.

Findings

01. Self-MAP outperforms baseline methods on MT-Mind2Web.
02. Memory and self-reflection improve task success rates.
03. Benchmark results demonstrate the effectiveness of the proposed approach.

Abstract

Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory…

Code & Models

Repositories

magicgh/self-map (PyTorch, official)

Videos

On the Multi-turn Instruction Following for Conversational Web Agents · underline

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management