ARGOS: Who, Where, and When in Agentic Multi-Camera Person Search

Myungchul Kim, Kwanyong Park, Junmo Kim, and In So Kweon

arXiv:2604.12762 · cs.CV · April 15, 2026

TL;DR

ARGOS is a benchmark and framework that models multi-camera person search as an interactive reasoning task, requiring planning, questioning, and tool use under information asymmetry and a limited turn budget.

Contribution

It introduces the first benchmark for agentic multi-camera person search, combining reasoning, tool use, and 14 real-world scenarios organized into three progressive tracks: semantic perception (Who), spatial reasoning (Where), and temporal reasoning (When).

Findings

01. The benchmark includes 2,691 tasks across 14 scenarios.
02. Current models are far from optimal: the best TWS is 0.383 on Track 2 and 0.590 on Track 3.
03. Removing domain-specific tools significantly reduces accuracy.

Abstract

We introduce ARGOS, the first benchmark and framework that reformulates multi-camera person search as an interactive reasoning problem requiring an agent to plan, question, and eliminate candidates under information asymmetry. An ARGOS agent receives a vague witness statement and must decide what to ask, when to invoke spatial or temporal tools, and how to interpret ambiguous responses, all within a limited turn budget. Reasoning is grounded in a Spatio-Temporal Topology Graph (STTG) encoding camera connectivity and empirically validated transition times. The benchmark comprises 2,691 tasks across 14 real-world scenarios in three progressive tracks: semantic perception (Who), spatial reasoning (Where), and temporal reasoning (When). Experiments with four LLM backbones show the benchmark is far from solved (best TWS: 0.383 on Track 2, 0.590 on Track 3), and ablations confirm that…
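The abstract describes the STTG only at a high level: a graph encoding camera connectivity and empirically validated transition times. To illustrate how such a graph could back spatial (Where) and temporal (When) tools, here is a minimal Python sketch. It is not the authors' implementation, and everything in it is a hypothetical assumption: the class name `STTG`, the methods `add_transition`, `within_direct_window`, `earliest_arrival` and `filter_candidates`, the representation of each edge as a [min, max] transition window in seconds, and the pruning rule, which assumes a person may linger anywhere so that only minimum transition times can rule out a multi-hop sighting.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class STTG:
    """Hypothetical spatio-temporal topology graph (illustrative, not the paper's code).

    Each edge stores an assumed [min_s, max_s] walking-time window in seconds.
    """

    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_transition(self, src, dst, min_s, max_s, bidirectional=True):
        """Record that a person can walk from src to dst in [min_s, max_s] seconds."""
        self.edges[src].append((dst, min_s, max_s))
        if bidirectional:
            self.edges[dst].append((src, min_s, max_s))

    def within_direct_window(self, src, dst, elapsed):
        """True if a direct src->dst edge exists whose window contains elapsed."""
        return any(n == dst and lo <= elapsed <= hi for n, lo, hi in self.edges[src])

    def earliest_arrival(self, start):
        """Dijkstra over minimum transition times: fastest possible arrival per camera."""
        dist = {start: 0.0}
        heap = [(0.0, start)]
        while heap:
            d, cam = heapq.heappop(heap)
            if d > dist[cam]:
                continue  # stale heap entry; a shorter path was already found
            for nxt, lo, _hi in self.edges[cam]:
                nd = d + lo
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        return dist

    def filter_candidates(self, last_cam, last_t, candidates):
        """Keep ids of (id, camera, timestamp) sightings reachable from the last sighting.

        Assumes people may dwell anywhere, so only minimum times can rule a sighting out.
        """
        reach = self.earliest_arrival(last_cam)
        return [cid for cid, cam, t in candidates
                if cam in reach and t - last_t >= reach[cam]]
```

Because dwelling is allowed under this assumption, the minimum-time test is necessary but not sufficient for a sighting to be genuine; the per-edge maximum is only checked for direct transitions, by `within_direct_window`, since maxima do not bound multi-hop paths once a person can wait between cameras.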
