DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

Shivin Dass, Alaa Khaddaj, Logan Engstrom, Aleksander Madry, Andrew Ilyas, Roberto Martín-Martín

arXiv:2505.09603 · cs.RO · May 15, 2025

TL;DR

DataMIL is a data selection framework for robot imitation learning that uses the policy itself to identify which prior data most improves task success. It is validated across diverse manipulation tasks.

Contribution

Introduces DataMIL, an end-to-end, policy-based data selection method that optimizes for task success, improving the use of large prior datasets in robot imitation learning.

Findings

1. DataMIL achieves consistent success rate improvements.
2. Outperforms multiple baseline data selection methods.
3. Effective in both simulation and real-world tasks.

Abstract

Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist robot policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a policy-driven data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or…
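The abstract says DataMIL is built on the datamodels paradigm. As a rough illustration of that paradigm (not DataMIL's actual estimator), a linear datamodel can be fit by training on many random subsets of the prior data, recording a target metric for each run, and regressing that metric on the 0/1 subset-membership vectors; the learned per-example weights then rank the prior data. The sketch assumes a higher-is-better metric and replaces real policy training with a synthetic metric. `fit_linear_datamodel`, `select_top_k`, and the toy data are hypothetical.

```python
import numpy as np

def fit_linear_datamodel(masks, metrics, l2=1e-3):
    """Fit metric ~ masks @ w + b with ridge-regularized least squares.

    masks:   (m, n) 0/1 array; row j marks which of n prior examples were in subset j
    metrics: (m,) target metric measured after training on subset j (higher = better)
    Returns per-example weights w with shape (n,) and a scalar bias b.
    """
    m, n = masks.shape
    X = np.hstack([masks, np.ones((m, 1))])      # append a bias column
    reg = l2 * np.eye(n + 1)
    reg[-1, -1] = 0.0                             # do not regularize the bias
    coef = np.linalg.solve(X.T @ X + reg, X.T @ metrics)
    return coef[:-1], coef[-1]

def select_top_k(weights, k):
    """Indices of the k examples with the largest predicted contribution."""
    return np.argsort(weights)[::-1][:k]

# Toy stand-in for "train a policy on subset j, then measure the metric":
# each prior example has a hidden additive effect on the metric (hypothetical).
rng = np.random.default_rng(0)
n_examples, n_subsets = 20, 400
true_effect = rng.normal(size=n_examples)
masks = (rng.random((n_subsets, n_examples)) < 0.5).astype(float)
metrics = masks @ true_effect + 0.1 * rng.normal(size=n_subsets)

w, b = fit_linear_datamodel(masks, metrics)
chosen = select_top_k(w, k=5)
```

In practice each row of `masks` costs one training run, which is why the number of subsets dominates the cost of this approach.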

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The paper builds on the datamodels paradigm and adapts it thoughtfully to robotics, incorporating principled techniques like metagradients and surrogate losses to avoid expensive rollouts.
2. The paper demonstrates consistent improvements across 60+ simulation and real-world tasks (MetaWorld, LIBERO, OXE), making the empirical evidence strong and diverse.
3. Comparisons against multiple state-of-the-art retrieval and co-training methods, plus ablations on surrogate loss vs. true success metric…
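Reviewer 01 mentions metagradients: gradients of a post-training metric with respect to per-example data weights. The sketch below is a deliberately simplified assumption, not the paper's procedure: it unrolls a single weighted SGD step of a toy linear least-squares policy, applies the chain rule to get the metagradient of a target-task loss, and checks the result against central finite differences. All function names and data are hypothetical.

```python
import numpy as np

def per_example_grads(theta, S, A):
    """Gradient of 0.5 * (s . theta - a)^2 for every row of S; shape (n, d)."""
    residual = S @ theta - A
    return residual[:, None] * S

def target_loss(theta, S_t, A_t):
    """Mean squared error (halved) of the policy on target-task data."""
    return 0.5 * np.mean((S_t @ theta - A_t) ** 2)

def target_grad(theta, S_t, A_t):
    """Gradient of target_loss with respect to theta; shape (d,)."""
    return S_t.T @ (S_t @ theta - A_t) / len(A_t)

def one_step_train(theta0, weights, S, A, lr):
    """One weighted gradient step: theta1 = theta0 - lr * sum_i w_i * g_i(theta0)."""
    return theta0 - lr * weights @ per_example_grads(theta0, S, A)

def metagradient(theta0, weights, S, A, S_t, A_t, lr):
    """d target_loss(theta1) / d weights via the chain rule through the single step."""
    theta1 = one_step_train(theta0, weights, S, A, lr)
    # d theta1 / d w_i = -lr * g_i(theta0), so dL/dw_i = -lr * g_i(theta0) . grad L(theta1)
    return -lr * per_example_grads(theta0, S, A) @ target_grad(theta1, S_t, A_t)

rng = np.random.default_rng(0)
d, n, m = 4, 30, 10
S, A = rng.normal(size=(n, d)), rng.normal(size=n)        # toy prior data
S_t, A_t = rng.normal(size=(m, d)), rng.normal(size=m)    # toy target-task data
theta0, w, lr = np.zeros(d), np.ones(n), 0.01

mg = metagradient(theta0, w, S, A, S_t, A_t, lr)

# Sanity check: central finite differences on each data weight.
eps = 1e-5
fd = np.array([
    (target_loss(one_step_train(theta0, w + eps * e, S, A, lr), S_t, A_t)
     - target_loss(one_step_train(theta0, w - eps * e, S, A, lr), S_t, A_t)) / (2 * eps)
    for e in np.eye(n)
])
```

A negative entry of `mg` means up-weighting that example would lower the target loss after the step, so such examples are natural candidates for selection. Real metagradient methods differentiate through many training steps rather than one.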

Weaknesses

1. Even with metagradients, building datamodels requires multiple policy trainings, making the pipeline significantly more expensive than standard fine-tuning or heuristic retrieval. This may limit usability in real-world labs without large compute budgets.
2. The method is evaluated primarily on one new task at a time. Broader deployment to large-scale continual or many-task adaptation scenarios remains underexplored.
3. The core datamodel assumption is borrowed from NLP/CV; a deeper theoretical justification…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- The proposed problem is critical for the robot learning community. As the field increasingly relies on scaling up data (e.g., OXE), methods for effectively curating this data to specialize in new tasks are essential.
- The paper's core idea of using a differentiable, rollout-free surrogate metric (BC loss on target data) is novel and effective. It elegantly adapts the datamodels framework to the specific constraints of robotics.
- The evaluation is comprehensive, spanning multiple benchmarks (…
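Reviewer 02 highlights the rollout-free surrogate: behavior-cloning (BC) loss on target-task data in place of measured success rate. A minimal sketch, assuming a linear least-squares policy as a stand-in for a neural BC policy and mean squared action error as the loss: co-train on a few target demos plus a candidate prior subset, then score the resulting policy on held-out target demos. The toy data, the `surrogate_metric` helper, and the sign convention (negative loss, so higher is better) are illustrative assumptions.

```python
import numpy as np

def fit_bc_policy(S, A, l2=1e-6):
    """Least-squares linear policy A ~ S @ W (stand-in for a neural BC policy)."""
    d = S.shape[1]
    return np.linalg.solve(S.T @ S + l2 * np.eye(d), S.T @ A)

def bc_loss(W, S, A):
    """Mean squared action error: the rollout-free surrogate for task success."""
    return float(np.mean((S @ W - A) ** 2))

def surrogate_metric(S_tgt, A_tgt, S_prior, A_prior, S_val, A_val):
    """Co-train on target demos plus a prior subset, then score on held-out target demos.

    Returns the negative BC loss so that higher means better.
    """
    W = fit_bc_policy(np.vstack([S_tgt, S_prior]), np.vstack([A_tgt, A_prior]))
    return -bc_loss(W, S_val, A_val)

rng = np.random.default_rng(0)
d, a = 4, 2                              # state and action dimensions (toy)
W_task = rng.normal(size=(d, a))         # hidden "expert" mapping for the target task

def demos(n, W, noise=0.1):
    """Sample n noisy (state, action) pairs from a linear expert W."""
    S = rng.normal(size=(n, d))
    return S, S @ W + noise * rng.normal(size=(n, a))

S_tgt, A_tgt = demos(8, W_task)          # few task-specific demos
S_val, A_val = demos(100, W_task)        # held-out target demos for scoring
S_rel, A_rel = demos(200, W_task)        # prior data whose behavior matches the task
S_irr, A_irr = demos(200, -W_task)       # prior data with conflicting behavior

score_rel = surrogate_metric(S_tgt, A_tgt, S_rel, A_rel, S_val, A_val)
score_irr = surrogate_metric(S_tgt, A_tgt, S_irr, A_irr, S_val, A_val)
```

The matching prior subset scores higher than the conflicting one, which is the kind of ranking signal a datamodel could regress on without any robot rollouts.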

Weaknesses

- The proposed datamodel incurs high computational costs, as it requires per-task estimation.
- The method is far from "plug-and-play," as it introduces numerous new hyperparameters that demand meticulous tuning, often relying on human heuristics and empirical expertise.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- Instead of using heuristics such as visual similarity or state–action closeness to select subsets of prior data, this paper proposes using performance-aware datamodels to predict a selection score for each data point, resulting in an end-to-end data selection algorithm for robot imitation learning.
- The proposed proxy metric avoids expensive and intractable rollouts on a real robot during data selection and also avoids repeatedly training the same algorithm on different subsets of…

Weaknesses

- Gradient-based data reweighting and data balancing for supervised learning problems have been widely studied in the literature [1, 2, 3]. The concepts of datamodels and regression-based methods have been proposed in prior work. Also, data selection for robot imitation learning is still a supervised learning problem. Overall, the contributions of the paper seem incremental.
- The explanations for some strategies in Sec. 4.2 are unclear, and some ablation experiments are limited.

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Machine Learning and Algorithms