Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies

Yihao Wu; Jinming Ma; Junbo Tan; Yanzhao Yu; Shoujie Li; Mingliang Zhou; Diyun Xiang; Xueqian Wang

arXiv:2602.11885 · cs.RO · February 13, 2026

TL;DR

This paper introduces a bounding-box guided diffusion policy for semantic manipulation, revealing data scaling laws that improve generalization and adaptability in real-world robotic tasks.

Contribution

The paper proposes a semantic-motion-decoupled framework and a data collection pipeline, and demonstrates that power-law relationships exist in data scaling for manipulation tasks.

Findings

01 Power-law relationship between generalization and number of bounding-box objects

02 Achieved 85% success rate across multiple tasks on unseen objects

03 Effective data collection strategy enhances manipulation performance
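
To make the first finding concrete, a power-law relationship of the form y ≈ a · n^b becomes a straight line in log-log space, so it can be checked with an ordinary linear fit. The following is a minimal sketch, not the paper's analysis code: the function name `fit_power_law`, the choice of failure rate as the metric, and the numbers are all illustrative assumptions, and the data is synthetic.

```python
import numpy as np


def fit_power_law(n, y):
    """Fit y ≈ a * n**b by least squares in log-log space; returns (a, b)."""
    n = np.asarray(n, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(n <= 0) or np.any(y <= 0):
        raise ValueError("a log-log fit needs strictly positive values")
    # log(y) = log(a) + b * log(n), so the slope is b and the intercept is log(a).
    slope, intercept = np.polyfit(np.log(n), np.log(y), deg=1)
    return float(np.exp(intercept)), float(slope)


# Synthetic, illustrative values only (NOT results from the paper):
# a hypothetical failure rate that decays as 0.8 * n**-0.5.
num_objects = [1, 2, 4, 8, 16, 32]
failure_rate = [0.8 * k ** -0.5 for k in num_objects]

a, b = fit_power_law(num_objects, failure_rate)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real measurements, the points would scatter around the fitted line, and the quality of the fit (for example, the correlation between log n and log y) indicates how well a power law describes the trend.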

Abstract

Diffusion-based policies show limited generalization in semantic manipulation, which is a key obstacle to deploying robots in the real world. This limitation arises because text instructions alone are inadequate for directing the policy's attention toward the target object in complex and dynamic environments. To solve this problem, we propose using bounding-box instructions to directly specify the target object, and we further investigate whether data scaling laws exist in semantic manipulation tasks. Specifically, we design a handheld segmentation device with an automated annotation pipeline, Label-UMI, which enables the efficient collection of demonstration data with semantic labels. We further propose a semantic-motion-decoupled framework that integrates object detection with a bounding-box guided diffusion policy to improve generalization and adaptability in semantic manipulation.…
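
As a rough illustration of what a bounding-box instruction could look like as a policy input, the sketch below converts a detector's pixel-space box into a resolution-independent vector. This is an assumption made for illustration only: the abstract does not say how the box is encoded, and the names `BoundingBox` and `normalize_bbox` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical container)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def normalize_bbox(box, image_width, image_height):
    """Return (cx, cy, w, h), each scaled to [0, 1] by the image size."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    if box.x_max < box.x_min or box.y_max < box.y_min:
        raise ValueError("box corners are out of order")
    cx = (box.x_min + box.x_max) / 2 / image_width
    cy = (box.y_min + box.y_max) / 2 / image_height
    w = (box.x_max - box.x_min) / image_width
    h = (box.y_max - box.y_min) / image_height
    return (cx, cy, w, h)


# Example: a box centered in a 640x480 frame, covering half of each dimension.
condition = normalize_bbox(BoundingBox(160, 120, 480, 360), 640, 480)
print(condition)
```

In a decoupled design of this kind, the detector handles the semantic question of which object is meant, and the policy only needs to act on the region the vector points to.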

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications