A Systematic Study of Data Modalities and Strategies for Co-training Large Behavior Models for Robot Manipulation
Fanqi Lin, Kushal Arora, Jean Mercat, Haruki Nishimura, Paarth Shah, Chen Xu, Mengchao Zhang, Mark Zolotas, Maya Angeles, Owen Pfannenstiehl, Andrew Beaulieu, Jose Barreiros

TL;DR
This paper systematically studies how different data modalities and training strategies in co-training large behavior models affect robot manipulation performance, providing insights for building scalable generalist robot policies.
Contribution
It offers a large-scale empirical analysis of five co-training data modalities and strategies, revealing their impact on policy generalization and adaptation in robot manipulation.
Findings
Vision-language and cross-embodiment data improve generalization.
Discrete action tokens do not significantly benefit performance.
Combining effective modalities yields cumulative gains and rapid adaptation.
Abstract
Large behavior models have shown strong dexterous manipulation capabilities by extending imitation learning to large-scale training on multi-task robot data, yet their generalization remains limited by insufficient robot data coverage. To expand this coverage without costly additional data collection, recent work relies on co-training: jointly learning from target robot data and heterogeneous data modalities. However, how different co-training data modalities and strategies affect policy performance remains poorly understood. We present a large-scale empirical study examining five co-training data modalities (standard vision-language data, dense language annotations for robot trajectories, cross-embodiment robot data, human videos, and discrete robot action tokens) across single- and multi-phase training strategies. Our study leverages 4,000 hours of robot and human manipulation data…
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Social Robot Interaction and HRI
