FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real

Weiheng Liu; Yuxuan Wan; Jilong Wang; Yuxuan Kuang; Wenbo Cui; Xuesong Shi; Haoran Li; Dongbin Zhao; Zhizheng Zhang; He Wang

arXiv:2502.17894 · cs.RO · August 26, 2025


TL;DR

FetchBot is a zero-shot sim-to-real framework that enables robust object fetching in cluttered scenes by leveraging synthetic data, depth prediction, and occupancy modeling, achieving high success rates without real-world training.

Contribution

The paper introduces FetchBot, a novel sim-to-real approach for object fetching that combines large-scale synthetic data, depth prediction from RGB, and occupancy modeling for obstacle-aware planning.
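The contribution names depth prediction and occupancy modeling for obstacle-aware planning but gives no implementation details here. As a generic illustration only, and not FetchBot's method, the sketch below shows the basic idea of turning a depth map into occupancy and checking a motion against it. It back-projects a depth image through an assumed pinhole camera into a voxel occupancy set, then rejects any waypoint within a safety margin of an occupied voxel. The function names, camera intrinsics, voxel size, and margin are all hypothetical. This naive grid also marks only visible surfaces. The abstract stresses that a fetching policy must reason about occluded space, which this sketch does not attempt.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth image (metres) to camera-frame 3D points with a
    pinhole model. Pixels with depth <= 0 are treated as invalid and skipped."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_voxel(p: Point, voxel_size: float) -> Voxel:
    """Map a 3D point to the integer index of the voxel containing it."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def build_occupancy(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Mark every voxel containing at least one observed point as occupied.
    Only visible surfaces are covered; space hidden behind them stays unknown."""
    return {to_voxel(p, voxel_size) for p in points}


def path_is_collision_free(waypoints: Iterable[Point], occupied: Set[Voxel],
                           voxel_size: float, margin: int = 1) -> bool:
    """Return False if any waypoint's voxel has an occupied voxel inside the
    (2*margin+1)^3 cube around it, a crude form of safety inflation."""
    for wp in waypoints:
        i, j, k = to_voxel(wp, voxel_size)
        for di in range(-margin, margin + 1):
            for dj in range(-margin, margin + 1):
                for dk in range(-margin, margin + 1):
                    if (i + di, j + dj, k + dk) in occupied:
                        return False
    return True


if __name__ == "__main__":
    # Hypothetical 4x4 depth map of a flat surface 1 m in front of the camera.
    depth = [[1.0] * 4 for _ in range(4)]
    pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
    occ = build_occupancy(pts, voxel_size=0.1)
    print(len(occ))                                              # 16
    print(path_is_collision_free([(0.0, 0.0, 0.3)], occ, 0.1))   # True
    print(path_is_collision_free([(0.25, 0.25, 1.0)], occ, 0.1)) # False
```

A waypoint 0.3 m from the camera clears the surface, while one placed on an observed surface point is rejected. A set of voxel indices keeps the sketch dependency-free. A real system would more likely use a dense array or an octree and would also model the unobserved region behind visible objects.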

Findings

01

Achieves an 89.95% success rate in real-world cluttered scenes.

02

Demonstrates strong zero-shot transfer from simulation to the real world.

03

Effectively handles transparent, reflective, and irregular objects.

Abstract

Generalizable object fetching in cluttered scenes remains a fundamental and application-critical challenge in embodied AI. Closely packed objects cause inevitable occlusions, making safe action generation particularly difficult. Under such partial observability, effective policies must not only generalize across diverse objects and layouts but also reason about occlusion to avoid collisions. However, collecting large-scale real-world data for this task remains prohibitively expensive, leaving this problem largely unsolved. In this paper, we introduce FetchBot, a sim-to-real framework for this challenge. We first curate a large-scale synthetic dataset featuring 1M diverse scenes and 500k representative demonstrations. Based on this dataset, FetchBot employs a depth-conditioned method for action generation, which leverages structural cues to enable robust obstacle-aware action planning.…

