Guiding Data Collection via Factored Scaling Curves

Lihan Zha; Apurva Badithela; Michael Zhang; Justin Lidard; Jeremy Bao; Emily Zhou; David Snyder; Allen Z. Ren; Dhruv Shah; Anirudha Majumdar

arXiv:2505.07728·cs.RO·May 13, 2025

TL;DR

This paper introduces factored scaling curves (FSC), which measure how an imitation learning policy's performance changes as more data is collected along each environmental factor. The curves are used to decide what data to collect next, with the aim of improving generalization in manipulation tasks.

Contribution

The paper presents a novel principled approach using factored scaling curves to optimize data collection across environmental factors, reducing costs and improving policy generalization.

Findings

01. Boosts success rates in real-world tasks by up to 26%.

02. Effectively guides data collection using offline metrics.

03. Enhances generalization across diverse environmental conditions.

Abstract

Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors), a prohibitively expensive undertaking if done exhaustively. We introduce a principled method for deciding what data to collect and how much to collect for each factor by constructing factored scaling curves (FSC), which quantify how policy performance varies as data scales along individual or paired factors. These curves enable targeted data acquisition for the most influential factor combinations within a given budget. We evaluate the proposed method through extensive simulated and real-world experiments, across both training-from-scratch and fine-tuning…
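The idea of fitting a per-factor curve and then spending a budget where the curve predicts the largest gain can be illustrated with a small sketch. This is not the authors' implementation: the power-law form for the failure rate, the greedy chunked allocation, the helper names `fit_power_law` and `allocate_budget`, and all numbers below are assumptions made for illustration. The sketch also covers individual factors only, not the paired factors the abstract mentions.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err ~ a * n**(-b) by least squares in log-log space.

    The power-law form is an assumption for this sketch; the abstract
    does not state which functional form the paper uses.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)

def allocate_budget(curves, current, budget, step=10):
    """Greedily hand out `budget` new demonstrations in chunks of `step`.

    Each chunk goes to the factor whose fitted curve predicts the largest
    drop in failure rate. Hypothetical allocation rule, for illustration.
    """
    counts = dict(current)
    added = {name: 0 for name in curves}

    def gain(name):
        a, b = curves[name]
        n = counts[name]
        return a * n ** (-b) - a * (n + step) ** (-b)

    remaining = budget
    while remaining >= step:
        best = max(curves, key=gain)
        counts[best] += step
        added[best] += step
        remaining -= step
    return added

# Made-up measurements: failure rate after training with n demonstrations
# that vary a single factor. Generated from exact power laws for clarity.
n_demos = np.array([10.0, 20.0, 40.0, 80.0])
measured = {
    "camera_pose": 0.8 * n_demos ** -0.5,   # steep curve: data helps a lot
    "table_height": 0.3 * n_demos ** -0.1,  # flat curve: data helps little
}

curves = {name: fit_power_law(n_demos, err) for name, err in measured.items()}
current = {name: 80 for name in measured}
plan = allocate_budget(curves, current, budget=50, step=10)
print(plan)
```

In this toy setup the factor with the steeper fitted curve receives more of the budget, which is the qualitative behavior the abstract describes as targeted data acquisition for the most influential factors.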

Code & Models

Repositories

irom-princeton/factored-scaling-curves
PyTorch · Official

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · 3D Shape Modeling and Analysis

Methods: Sparse Evolutionary Training