Enhancing Web Agents with a Hierarchical Memory Tree
Yunteng Tan, Zhi Gao, Xinxiao Wu

TL;DR
This paper introduces a Hierarchical Memory Tree (HMT) that structures web agent memory into levels to improve generalization across unseen websites, addressing workflow mismatches caused by flat memory structures.
Contribution
The paper proposes a novel hierarchical memory framework that decouples planning from execution, enhancing web agents' ability to generalize across diverse and unseen websites.
Findings
HMT outperforms flat-memory methods in cross-website tasks.
Structured memory improves logical consistency and task success.
HMT enhances robustness in complex web interactions.
Abstract
Large language model-based web agents have shown strong potential in automating web interactions through advanced reasoning and instruction following. While retrieval-based memory derived from historical trajectories enables these agents to handle complex, long-horizon tasks, current methods struggle to generalize across unseen websites. We identify that this challenge arises from flat memory structures that entangle high-level task logic with site-specific action details. This entanglement induces a workflow mismatch in new environments, where retrieved content is conflated with the current web page, leading to logically inconsistent execution. To address this, we propose Hierarchical Memory Tree (HMT), a structured framework designed to explicitly decouple logical planning from action execution. HMT constructs a three-level hierarchy from raw trajectories via an automated abstraction…
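To make the idea of decoupling planning from execution concrete, here is a minimal sketch of a three-level memory tree. It is not the authors' implementation: the abstract is truncated before it names the levels, so the level names (IntentNode, StageNode, ActionNode), the exact-match lookup, and the example sites are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ActionNode:
    """Lowest level (assumed): one concrete, site-specific action."""
    site: str
    description: str


@dataclass
class StageNode:
    """Middle level (assumed): a site-agnostic step of the plan."""
    goal: str
    actions: List[ActionNode] = field(default_factory=list)


@dataclass
class IntentNode:
    """Top level (assumed): the overall task intent."""
    intent: str
    stages: List[StageNode] = field(default_factory=list)


def retrieve(
    memory: List[IntentNode], intent: str, site: str
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the site-agnostic plan plus action hints for `site` only.

    Exact string matching stands in for whatever retrieval the paper uses.
    """
    for node in memory:
        if node.intent == intent:
            plan = [stage.goal for stage in node.stages]
            hints = {
                stage.goal: [a.description for a in stage.actions if a.site == site]
                for stage in node.stages
            }
            return plan, hints
    return [], {}


# Hypothetical memory built from one past trajectory on shop-a.example.
memory = [
    IntentNode(
        intent="buy an item",
        stages=[
            StageNode(
                goal="search for the item",
                actions=[ActionNode("shop-a.example", "type query into #search-box")],
            ),
            StageNode(
                goal="add the item to the cart",
                actions=[ActionNode("shop-a.example", "click button.add-to-cart")],
            ),
        ],
    )
]

# On the seen site, both the plan and the low-level actions are returned.
seen_plan, seen_hints = retrieve(memory, "buy an item", "shop-a.example")

# On an unseen site, the plan still transfers but no stale actions leak in.
new_plan, new_hints = retrieve(memory, "buy an item", "shop-b.example")
```

In this sketch a flat memory would have returned the shop-a.example selectors regardless of the current site, which is the kind of workflow mismatch the abstract describes. Keeping actions under site-agnostic stages lets the plan be reused while site-specific details are filtered out.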
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Multi-Agent Systems and Negotiation
