Retrieval-Augmented Robots via Retrieve-Reason-Act

Izat Temiraliev; Diji Yang; Yi Zhang

arXiv:2603.02688 · cs.AI · March 4, 2026

TL;DR

This paper introduces Retrieval-Augmented Robotics (RAR), in which robots actively retrieve and use external visual documentation to carry out complex tasks. The reported result is a significant performance improvement on long-horizon assembly tasks.

Contribution

The paper formulates a novel Retrieve-Reason-Act paradigm for robots, integrating external visual manuals into the planning process for improved zero-shot task execution.
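As a rough illustration of the Retrieve-Reason-Act idea, the loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation: the `ManualPage` type, the word-overlap retriever, and the `retrieve`, `reason`, and `act` functions are all hypothetical stand-ins, and the manuals here are plain text rather than the visual documents the paper targets.

```python
from dataclasses import dataclass


@dataclass
class ManualPage:
    """Hypothetical stand-in for one retrieved manual document."""
    title: str
    steps: list[str]


def retrieve(query: str, corpus: list[ManualPage]) -> ManualPage:
    """Return the page whose title shares the most words with the query."""
    query_words = set(query.lower().split())
    return max(
        corpus,
        key=lambda page: len(query_words & set(page.title.lower().split())),
    )


def reason(page: ManualPage) -> list[str]:
    """Turn the retrieved page into an ordered plan (here: copy its steps)."""
    return list(page.steps)


def act(plan: list[str]) -> list[str]:
    """Stand-in for execution: record each step as done, in order."""
    return [f"done: {step}" for step in plan]


# Assumed toy corpus; a real system would index external documentation.
corpus = [
    ManualPage("wooden chair assembly",
               ["attach legs", "attach seat", "attach backrest"]),
    ManualPage("bookshelf assembly",
               ["join side panels", "insert shelves"]),
]

log = act(reason(retrieve("assemble the wooden chair", corpus)))
```

The point of the sketch is the ordering of the three stages: the plan is derived from the retrieved document rather than from knowledge stored in the planner itself.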

Findings

01. Grounded visual retrieval improves assembly success rates

02. Retrieval-based planning outperforms zero-shot baselines

03. Demonstrates effective external knowledge integration in robotics

Abstract

To achieve general-purpose utility, we argue that robots must evolve from passive executors into active Information Retrieval users. In strictly zero-shot settings where no prior demonstrations exist, robots face a critical information gap, such as the exact sequence required to assemble a complex furniture kit, that cannot be satisfied by internal parametric knowledge (common sense) or past internal memory. While recent robotic works attempt to use search before action, they primarily focus on retrieving past kinematic trajectories (analogous to searching internal memory) or text-based safety rules (searching for constraints). These approaches fail to address the core information need of active task construction: acquiring unseen procedural knowledge from external, unstructured documentation. In this paper, we define the paradigm as Retrieval-Augmented Robotics (RAR), empowering the…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI