TL;DR
This paper introduces ACGM, a learned graph-memory retriever that adaptively constructs relevance graphs over multi-modal web histories, significantly improving retrieval quality for downstream tasks.
Contribution
The paper presents ACGM, a task-adaptive graph-memory retrieval method optimized with policy gradients, which outperforms prior static and fixed-capacity approaches.
Findings
ACGM achieves an nDCG@10 of 82.7, a 9.3-point improvement over GPT-4o.
ACGM attains 89.2% Precision@10, outperforming 19 baselines.
The modality-specific decay rates show that visual information decays 4.3 times faster than text.
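The two ranking metrics quoted above can be made concrete with a minimal sketch. It uses the textbook definitions with linear gain and a log2 rank discount. The function names are illustrative, and the ideal ranking is computed only from the labels of the retrieved list, which assumes every relevant item appears somewhere in that list. The paper's exact evaluation variant may differ.

```python
import math

def precision_at_k(relevances, k=10):
    """Fraction of the top-k retrieved items with a positive relevance label."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of relevance labels in retrieved order, so a perfectly ordered list scores an nDCG of 1.0 and scores such as 82.7 correspond to 0.827 on this scale.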
Abstract
Retrieving relevant observations from long multi-modal web interaction histories is challenging because relevance depends on the evolving task state, modality (screenshots, HTML text, structured signals), and temporal distance. Prior approaches typically rely on static similarity thresholds or fixed-capacity buffers, which fail to adapt relevance to the current task context. We propose ACGM, a learned graph-memory retriever that constructs task-adaptive relevance graphs over agent histories using policy-gradient optimization from downstream task success. ACGM captures heterogeneous temporal dynamics with modality-specific decay (visual decays faster than text) and learns sparse connectivity (3.2 edges/node), enabling efficient retrieval. Across WebShop, VisualWebArena, and Mind2Web, ACGM improves…
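To illustrate how modality-specific decay and sparse connectivity might interact, here is a rough sketch. It is not ACGM's method: ACGM learns its graph with policy-gradient optimization, whereas this sketch substitutes a fixed top-k heuristic over cosine similarity damped by exponential decay. The decay rates, the `Observation` class, and all function names are hypothetical. Only the 4.3x ratio between visual and text decay comes from the summary above.

```python
import math
from dataclasses import dataclass

# Hypothetical per-step decay rates; only their 4.3x ratio is taken from the paper summary.
DECAY = {"text": 0.1, "visual": 0.43}

@dataclass
class Observation:
    step: int          # position in the interaction history
    modality: str      # "text" or "visual"
    embedding: list    # dense feature vector (assumed to be given)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def edge_weight(src, dst):
    """Similarity damped by exponential decay at the older node's modality rate."""
    older = src if src.step <= dst.step else dst
    gap = abs(src.step - dst.step)
    return cosine(src.embedding, dst.embedding) * math.exp(-DECAY[older.modality] * gap)

def build_sparse_graph(history, max_edges=3):
    """Keep only the strongest positive edges per node (a heuristic stand-in for learned sparsity)."""
    graph = {}
    for i, node in enumerate(history):
        scored = [(edge_weight(node, other), j)
                  for j, other in enumerate(history) if j != i]
        scored.sort(reverse=True)
        graph[i] = [j for weight, j in scored[:max_edges] if weight > 0]
    return graph
```

With identical embeddings, a screenshot from five steps back receives a lower weight than a text observation of the same age, so the text node wins the edge when `max_edges` is small.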
