DETOUR: An Interactive Benchmark for Dual-Agent Search and Reasoning

Li Siyan; Darshan Deshpande; Anand Kannappan; Rebecca Qian

arXiv:2602.00352 · cs.CL · February 3, 2026

TL;DR

DETOUR is a new benchmark for evaluating dual-agent search and reasoning in complex, multi-turn, multi-modal recall tasks, revealing current models' limitations in underspecified scenarios.

Contribution

Introduces DETOUR, a dual-agent benchmark with 1,011 prompts for more realistic tip-of-the-tongue search evaluation, emphasizing multi-turn and multi-modal challenges.

Findings

1. State-of-the-art models achieve only 36% accuracy on DETOUR when evaluated on all modalities (text, image, audio, and video).

2. Current models struggle with underspecified, multi-modal recall tasks.

3. The results highlight the need for improved reasoning and retrieval capabilities.

Abstract

When recalling information in conversation, people often arrive at the recollection after multiple turns. However, existing benchmarks for evaluating agent capabilities in such tip-of-the-tongue search processes are restricted to single-turn settings. To more realistically simulate tip-of-the-tongue search, we introduce Dual-agent based Evaluation Through Obscure Under-specified Retrieval (DETOUR), a dual-agent evaluation benchmark containing 1,011 prompts. The benchmark design involves a Primary Agent, which is the subject of evaluation, tasked with identifying the recollected entity through querying a Memory Agent that is held consistent across evaluations. Our results indicate that current state-of-the-art models still struggle with our benchmark, only achieving 36% accuracy when evaluated on all modalities (text, image, audio, and video), highlighting the importance of enhancing…
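
The dual-agent setup described in the abstract can be sketched as a simple interaction loop. This is only an illustrative toy, not the authors' implementation: the abstract does not specify how either agent works, so `MemoryAgent`, `KeywordPrimaryAgent`, `run_episode`, `accuracy`, and the film data are all hypothetical names and values, and keyword matching stands in for whatever model-based behavior the real agents use. The point is the structure: a fixed Memory Agent answers questions, the Primary Agent under evaluation queries it over several turns, and the final guess is scored against the target entity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MemoryAgent:
    """Hypothetical fixed Memory Agent holding vague recollections of a target."""
    facts: Dict[str, str]  # attribute name -> recollection

    def answer(self, question: str) -> str:
        # Reply with the first recollection whose attribute name occurs in the question.
        lowered = question.lower()
        for attribute, recollection in self.facts.items():
            if attribute in lowered:
                return recollection
        return "I don't remember."


@dataclass
class KeywordPrimaryAgent:
    """Toy Primary Agent (the system under evaluation in this sketch)."""
    candidates: Dict[str, Dict[str, str]]  # entity -> attribute -> value
    attributes: List[str]                  # attributes to ask about, in order
    observed: Dict[str, str] = field(default_factory=dict)

    def next_question(self, turn: int) -> Optional[str]:
        # Ask about one attribute per turn; stop when none are left.
        if turn < len(self.attributes):
            return f"What do you recall about the {self.attributes[turn]}?"
        return None

    def observe(self, turn: int, reply: str) -> None:
        self.observed[self.attributes[turn]] = reply

    def guess(self) -> str:
        # Pick the candidate that agrees with the most observed replies.
        def score(entity: str) -> int:
            known = self.candidates[entity]
            return sum(1 for a, v in self.observed.items() if known.get(a) == v)
        return max(self.candidates, key=score)


def run_episode(primary: KeywordPrimaryAgent, memory: MemoryAgent,
                max_turns: int = 5) -> Tuple[str, int]:
    """Run one multi-turn episode; return the final guess and the turns used."""
    turns = 0
    for turn in range(max_turns):
        question = primary.next_question(turn)
        if question is None:
            break
        primary.observe(turn, memory.answer(question))
        turns += 1
    return primary.guess(), turns


def accuracy(results: List[Tuple[str, str]]) -> float:
    """Fraction of (guess, target) pairs that match exactly."""
    return sum(g == t for g, t in results) / len(results)


if __name__ == "__main__":
    films = {
        "Film A": {"year": "1990s", "genre": "thriller"},
        "Film B": {"year": "1990s", "genre": "comedy"},
    }
    memory = MemoryAgent({"year": "1990s", "genre": "comedy"})
    primary = KeywordPrimaryAgent(films, ["year", "genre"])
    guess, turns = run_episode(primary, memory)
    print(guess, turns, accuracy([(guess, "Film B")]))
```

Keeping the Memory Agent fixed while swapping only the Primary Agent mirrors the abstract's statement that the Memory Agent is held consistent across evaluations, so differences in accuracy can be attributed to the agent being evaluated.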

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Social Robot Interaction and HRI