DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation

Young Hun Kim; Seungyeon Kim; Yonghyeon Lee; Frank Chongwoo Park

arXiv:2507.05627 · cs.RO · July 9, 2025

Open Access

TL;DR

DreamGrasp is a novel framework that uses large-scale pre-trained generative models to perform zero-shot 3D multi-object reconstruction from partial views, enabling robust robotic manipulation in cluttered environments.

Contribution

DreamGrasp combines large-scale pre-trained image generative models, contrastive learning for instance segmentation, and text-guided instance-wise refinement to achieve zero-shot 3D scene reconstruction from limited views.
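To make the contrastive-learning ingredient concrete, here is a minimal sketch of a generic supervised contrastive (InfoNCE-style) loss over per-point feature vectors, where points sharing a 2D mask label are pulled together and all others pushed apart. This is an illustration under assumptions, not the paper's formulation: the function names `normalize` and `contrastive_loss`, the `temperature` value, and the toy features are all hypothetical.

```python
import math


def normalize(v):
    """Scale a feature vector to unit length (assumes a nonzero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical helper).

    features: list of feature vectors, one per sampled point.
    labels:   list of instance ids, one per point (e.g. from 2D masks).
    Returns the mean loss over anchors that have at least one positive.
    """
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity between anchor i and every point.
        sims = [
            sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
        ]
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-instance partner contributes nothing
        # Numerically stable log-sum-exp over all non-anchor points.
        m = max(sims[j] for j in others)
        log_denom = m + math.log(sum(math.exp(sims[j] - m) for j in others))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy usage: two instances with two points each in a 2D feature space.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
loss = contrastive_loss(toy_features, toy_labels)
```

The loss is small when features of the same instance are close and features of different instances are far apart, which is the property an instance-segmentation feature field would be trained toward.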

Findings

01. Achieves accurate 3D reconstruction of multiple objects from partial views.

02. Supports downstream tasks like decluttering and target retrieval with high success.

03. Outperforms prior methods in generalization to complex, real-world scenes.

Abstract

Partial-view 3D recognition -- reconstructing 3D geometry and identifying object instances from a few sparse RGB images -- is an exceptionally challenging yet practically essential task, particularly in cluttered, occluded real-world settings where full-view or reliable depth data are often unavailable. Existing methods, whether based on strong symmetry priors or supervised learning on curated datasets, fail to generalize to such scenarios. In this work, we introduce DreamGrasp, a framework that leverages the imagination capability of large-scale pre-trained image generative models to infer the unobserved parts of a scene. By combining coarse 3D reconstruction, instance segmentation via contrastive learning, and text-guided instance-wise refinement, DreamGrasp circumvents limitations of prior methods and enables robust 3D reconstruction in complex, multi-object environments. Our…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Robot Manipulation and Learning