Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies
Yihao Wu, Jinming Ma, Junbo Tan, Yanzhao Yu, Shoujie Li, Mingliang Zhou, Diyun Xiang, Xueqian Wang

TL;DR
This paper introduces a bounding-box guided diffusion policy for semantic manipulation, revealing data scaling laws that improve generalization and adaptability in real-world robotic tasks.
Contribution
It proposes a novel semantic-motion-decoupled framework and a data collection pipeline, demonstrating the existence of power-law relationships in data scaling for manipulation tasks.
Findings
Power-law relationship between generalization and number of bounding-box objects
Achieved 85% success rate across multiple tasks on unseen objects
Effective data collection strategy enhances manipulation performance
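To make the power-law finding concrete, the following is a minimal sketch of how such a relationship is commonly checked: fit y = a * x^b by linear regression in log-log space. Everything here is an assumption made for illustration, not the paper's procedure or results: the function name `fit_power_law`, the use of a failure rate as the generalization metric, and the synthetic noise-free data points.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares in log-log space (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("power-law fit needs strictly positive x and y")
    # log(y) = log(a) + b * log(x), so a degree-1 fit returns (b, log(a)).
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(slope)


# Synthetic data, NOT measurements from the paper:
# an assumed failure rate that decays as 0.8 * n**-0.5.
num_objects = np.array([1, 2, 4, 8, 16, 32], dtype=float)
failure_rate = 0.8 * num_objects ** -0.5

a, b = fit_power_law(num_objects, failure_rate)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real data, a roughly straight line in a log-log plot of the metric against the number of training objects is the usual visual sign that a power law is a reasonable description.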
Abstract
Diffusion-based policies show limited generalization in semantic manipulation, posing a key obstacle to the deployment of real-world robots. This limitation arises because relying solely on text instructions is inadequate to direct the policy's attention toward the target object in complex and dynamic environments. To solve this problem, we propose leveraging bounding-box instructions to directly specify the target object, and we further investigate whether data scaling laws exist in semantic manipulation tasks. Specifically, we design a handheld segmentation device with an automated annotation pipeline, Label-UMI, which enables the efficient collection of demonstration data with semantic labels. We further propose a semantic-motion-decoupled framework that integrates object detection and a bounding-box guided diffusion policy to improve generalization and adaptability in semantic manipulation.…
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
