FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real
Weiheng Liu, Yuxuan Wan, Jilong Wang, Yuxuan Kuang, Wenbo Cui, Xuesong Shi, Haoran Li, Dongbin Zhao, Zhizheng Zhang, He Wang

TL;DR
FetchBot is a zero-shot sim-to-real framework that enables robust object fetching in cluttered scenes by leveraging synthetic data, depth prediction, and occupancy modeling, achieving high success rates without real-world training.
Contribution
The paper introduces FetchBot, a novel sim-to-real approach for object fetching that combines large-scale synthetic data, depth prediction from RGB, and occupancy modeling for obstacle-aware planning.
Findings
Achieves an 89.95% success rate in real-world cluttered scenes.
Demonstrates strong zero-shot transfer from simulation to the real world.
Effectively handles transparent, reflective, and irregular objects.
Abstract
Generalizable object fetching in cluttered scenes remains a fundamental and application-critical challenge in embodied AI. Closely packed objects cause inevitable occlusions, making safe action generation particularly difficult. Under such partial observability, effective policies must not only generalize across diverse objects and layouts but also reason about occlusion to avoid collisions. However, collecting large-scale real-world data for this task remains prohibitively expensive, leaving this problem largely unsolved. In this paper, we introduce FetchBot, a sim-to-real framework for this challenge. We first curate a large-scale synthetic dataset featuring 1M diverse scenes and 500k representative demonstrations. Based on this dataset, FetchBot employs a depth-conditioned method for action generation, which leverages structural cues to enable robust obstacle-aware action planning.…
Taxonomy
Methods: ADaptive gradient method with the OPTimal convergence rate
