Retrieval-Augmented Robots via Retrieve-Reason-Act
Izat Temiraliev, Diji Yang, Yi Zhang

TL;DR
This paper introduces Retrieval-Augmented Robotics (RAR), in which robots actively retrieve and use external visual documentation, such as assembly manuals, to carry out complex tasks. The authors report improved performance on long-horizon assembly tasks.
Contribution
The paper formulates a novel Retrieve-Reason-Act paradigm for robots, integrating external visual manuals into the planning process for improved zero-shot task execution.
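As a rough illustration of what a Retrieve-Reason-Act loop might look like, here is a toy sketch. It is not the authors' implementation: the names `ManualPage`, `retrieve`, `reason`, `act`, and `retrieve_reason_act` are hypothetical, the part-overlap scorer is a stand-in for whatever visual retrieval the paper uses, and manual pages are reduced to text tags rather than images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManualPage:
    """Hypothetical stand-in for one page of an external assembly manual."""
    step: int
    description: str
    parts: tuple


def retrieve(manual, visible_parts):
    """Retrieve: keep pages that mention at least one part the robot can see.

    Pages are ranked by part overlap (a toy scorer, assumed for illustration).
    """
    visible = set(visible_parts)
    scored = [(len(set(page.parts) & visible), page) for page in manual]
    ranked = sorted(scored, key=lambda sp: (-sp[0], sp[1].step))
    return [page for score, page in ranked if score > 0]


def reason(pages):
    """Reason: turn retrieved pages into an ordered plan (here, by step number)."""
    return [page.description for page in sorted(pages, key=lambda p: p.step)]


def act(plan):
    """Act: placeholder for execution; a real system would call a controller."""
    return [f"executed: {step}" for step in plan]


def retrieve_reason_act(manual, visible_parts):
    """Chain the three stages into one pass."""
    return act(reason(retrieve(manual, visible_parts)))
```

The point of the sketch is only the separation of stages: the plan is built from retrieved external documentation rather than from the robot's internal knowledge, so swapping the manual changes the plan without retraining anything.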
Findings
Grounded visual retrieval improves assembly success rates
Retrieval-based planning outperforms zero-shot baselines
Demonstrates effective external knowledge integration in robotics
Abstract
To achieve general-purpose utility, we argue that robots must evolve from passive executors into active Information Retrieval users. In strictly zero-shot settings where no prior demonstrations exist, robots face a critical information gap, such as the exact sequence required to assemble a complex furniture kit, that cannot be satisfied by internal parametric knowledge (common sense) or past internal memory. While recent robotic works attempt to use search before action, they primarily focus on retrieving past kinematic trajectories (analogous to searching internal memory) or text-based safety rules (searching for constraints). These approaches fail to address the core information need of active task construction: acquiring unseen procedural knowledge from external, unstructured documentation. In this paper, we define the paradigm as Retrieval-Augmented Robotics (RAR), empowering the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
