Searching in Space and Time: Unified Memory-Action Loops for Open-World Object Retrieval
Taijing Chen, Sateesh Kumar, Junhong Xu, Georgios Pavlakos, Joydeep Biswas, Roberto Martín-Martín

TL;DR
The paper introduces STAR, a unified framework for open-world object retrieval that combines spatial and temporal reasoning with embodied actions, improving search efficiency in dynamic environments.
Contribution
STAR unifies memory queries and embodied actions in a single decision loop for spatial and temporal object retrieval, pairing a vision-language model with non-parametric long-term memory and a working memory.
Findings
STAR outperforms scene-graph baselines in experiments.
The framework effectively handles dynamic, open-world environments.
STARBench provides a new benchmark for spatiotemporal object search.
Abstract
Service robots must retrieve objects in dynamic, open-world settings where requests may reference attributes ("the red mug"), spatial context ("the mug on the table"), or past states ("the mug that was here yesterday"). Existing approaches capture only parts of this problem: scene graphs capture spatial relations but ignore temporal grounding, temporal reasoning methods model dynamics but do not support embodied interaction, and dynamic scene graphs handle both but remain closed-world with fixed vocabularies. We present STAR (SpatioTemporal Active Retrieval), a framework that unifies memory queries and embodied actions within a single decision loop. STAR leverages non-parametric long-term memory and a working memory to support efficient recall, and uses a vision-language model to select either temporal or spatial actions at each step. We introduce STARBench, a benchmark of…
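The decision loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rule-based `stub_policy` merely stands in for the vision-language model, and every name here (`Observation`, `WorkingMemory`, `recall`, `search`, the `world` dictionary) is a hypothetical chosen for illustration. It only shows the general shape of choosing between a temporal action (querying memory) and a spatial action (moving to a place) at each step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Observation:
    """One timestamped sighting kept in long-term memory (hypothetical schema)."""
    time: int
    place: str
    label: str


@dataclass
class WorkingMemory:
    """Scratchpad for the current search: recalled sightings and visited places."""
    recalled: List[Observation] = field(default_factory=list)
    visited: List[str] = field(default_factory=list)
    queried: bool = False


def recall(long_term: List[Observation], label: str) -> List[Observation]:
    """Temporal action: look up past sightings of `label`, most recent first."""
    hits = [o for o in long_term if o.label == label]
    return sorted(hits, key=lambda o: o.time, reverse=True)


def stub_policy(wm: WorkingMemory, places: List[str]) -> Tuple[str, Optional[str]]:
    """Rule-based stand-in for the VLM (an assumption, not the paper's policy).

    Query memory once, then visit recalled places (newest sighting first),
    then any remaining place, and stop when nothing is left.
    """
    if not wm.queried:
        return ("recall", None)
    for place in [o.place for o in wm.recalled] + places:
        if place not in wm.visited:
            return ("goto", place)
    return ("stop", None)


def search(
    target: str,
    long_term: List[Observation],
    world: Dict[str, Set[str]],
    max_steps: int = 10,
) -> Tuple[Optional[str], List[str]]:
    """Run one loop that mixes memory queries and movement actions.

    `world` maps each place to the labels currently present there.
    Returns (place where the target was found or None, places visited).
    """
    wm = WorkingMemory()
    places = sorted(world)
    for _ in range(max_steps):
        kind, arg = stub_policy(wm, places)
        if kind == "recall":
            wm.recalled = recall(long_term, target)
            wm.queried = True
        elif kind == "goto":
            wm.visited.append(arg)
            if target in world.get(arg, set()):
                return arg, wm.visited
        else:
            break
    return None, wm.visited
```

In this toy setup, if the mug was last remembered at the desk but has since moved to the sofa, the loop first checks the remembered places and then falls back to exploring the rest. A real system would replace `stub_policy` with a model call and `world` with perception.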
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · AI-based Problem Solving and Planning
