Grounding Robot Generalization in Training Data via Retrieval-Augmented VLMs

Jensen Gao; Dorsa Sadigh; Sandy Huang; Dhruv Shah

arXiv:2603.11426·cs.RO·March 19, 2026

TL;DR

RADAR is a scalable framework that combines retrieval with vision-language models to compare robot manipulation evaluation tasks against policy training data and classify the type of generalization each task requires, enabling more precise evaluation.

Contribution

Introduces RADAR, a two-stage retrieval and analysis pipeline that characterizes policy generalization in robotics using interpretable data comparisons and large-scale datasets.

Findings

01

VLMs can effectively analyze evaluation tasks against retrieved training data to characterize generalization.

02

The retrieval step accurately identifies relevant training examples.

03

RADAR scales to large datasets and agrees with human benchmarks.

Abstract

Recent work on robot manipulation has advanced policy generalization to novel scenarios. However, it is often difficult to characterize how different evaluation settings actually represent generalization from the training distribution of a given policy. To work towards more precise evaluation of generalization in robotics, we propose RADAR, a scalable framework for directly comparing test-time evaluation tasks to policy training data, to determine what form of policy generalization is required. RADAR consists of a two-stage pipeline: first, retrieval using generalist policy embeddings identifies which training examples are relevant for a given evaluation task. Next, vision-language models (VLMs) analyze the evaluation task against the retrieved data, outputting interpretable analysis on how they compare along a variety of axes, and an overall classification of what type of policy…
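As a rough illustration of the first stage described in the abstract, the sketch below retrieves the training examples whose embeddings are closest to an evaluation task's embedding. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes embeddings are fixed-length vectors compared by cosine similarity, and the names `cosine_similarity` and `retrieve_top_k` are hypothetical. The embedding model and the similarity measure RADAR actually uses are not specified on this page.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    train_embeddings: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k training embeddings most similar to the query.

    Assumption: similarity is cosine similarity; this is an illustrative choice only.
    """
    scored = [
        (i, cosine_similarity(query, emb)) for i, emb in enumerate(train_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: three hypothetical training embeddings and one evaluation-task embedding.
train = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.05]
top = retrieve_top_k(query, train, k=2)
print([index for index, _ in top])
```

In a pipeline of the kind the abstract describes, the retrieved examples would then be passed, together with the evaluation task, to a VLM for comparison. That second stage is not sketched here because the page gives no detail on the prompts or models involved.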

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning