Learning to Retrieve from Agent Trajectories
Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, and Ji-Rong Wen

TL;DR
This paper introduces LRAT (learning to retrieve from agent trajectories), a training paradigm that derives supervision for retrieval models from agent interaction data, improving retrieval effectiveness in agent-based search systems.
Contribution
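The paper proposes training retrieval models directly from multi-step agent trajectories, addressing the mismatch between retrievers trained on human interaction logs (such as clicks and dwell time) and the way agents issue queries and consume results.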
Findings
LRAT improves evidence recall across various benchmarks.
Retrievers trained with LRAT enhance end-to-end task success.
The approach is effective across diverse agent architectures and scales.
Abstract
Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results. In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. Through a…
