ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

Yawar Siddiqui; Duncan Frost; Samir Aroudj; Armen Avetisyan; Henry Howard-Jenkins; Daniel DeTone; Pierre Moulon; Qirui Wu; Zhengqin Li; Julian Straub; Richard Newcombe; Jakob Engel

arXiv:2601.11514 · cs.CV · January 19, 2026

TL;DR

ShapeR generates high-fidelity metric 3D object shapes from casually captured, real-world image sequences. It conditions a rectified flow transformer on sparse SLAM points, posed multi-view images, and machine-generated captions, and reports a 2.7x improvement in Chamfer distance over existing methods.

Contribution

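We introduce ShapeR, a framework that generates 3D shapes from casual captures by conditioning on multiple modalities (sparse SLAM points, posed multi-view images, and captions) and by training with robustness-oriented techniques such as on-the-fly compositional augmentations and a curriculum training scheme. We also introduce a new in-the-wild evaluation benchmark.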

Findings

01. Outperforms existing methods with a 2.7x improvement in Chamfer distance

02. Handles cluttered backgrounds and unstructured, casually captured input

03. Generates high-fidelity metric 3D shapes from casual image sequences
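For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of a symmetric Chamfer distance between two point sets. It is illustrative only: the function name `chamfer_distance` is hypothetical, and the convention used here (mean of unsquared nearest-neighbor distances, summed over both directions) is an assumption, since conventions vary and this page does not state which one the paper uses. Lower values indicate closer agreement between shapes.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: for each direction, average the Euclidean distance
    from every point to its nearest neighbor in the other set, then sum
    the two directional averages. Brute force, O(len(a) * len(b)).
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def one_sided(src, dst):
        # Mean distance from each source point to its closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0)]
    print(chamfer_distance(predicted, reference))
```

In practice, points would be sampled from the predicted and ground-truth surfaces, and a spatial index would replace the brute-force nearest-neighbor search.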

Abstract

Recent advances in 3D shape generation have achieved impressive results, but most existing methods rely on clean, unoccluded, and well-segmented inputs. Such conditions are rarely met in real-world scenarios. We present ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Given an image sequence, we leverage off-the-shelf visual-inertial SLAM, 3D detection algorithms, and vision-language models to extract, for each object, a set of sparse SLAM points, posed multi-view images, and machine-generated captions. A rectified flow transformer trained to effectively condition on these modalities then generates high-fidelity metric 3D shapes. To ensure robustness to the challenges of casually captured data, we employ a range of techniques including on-the-fly compositional augmentations, a curriculum training scheme spanning object- and…
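The abstract names a rectified flow transformer as the generator. As a rough illustration of how rectified flow sampling works in general, the sketch below integrates a velocity field from t = 0 (noise) to t = 1 (sample) with fixed Euler steps. Everything here is an assumption for illustration: `euler_sample` and `toy_velocity` are hypothetical names, the closed-form toy velocity field stands in for a learned network, and the conditioning on SLAM points, images, and captions described in the abstract is omitted entirely.

```python
def euler_sample(velocity, x0, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target):
    """Hypothetical stand-in for a learned velocity network.

    Rectified flow trains on straight paths x_t = (1 - t) * x0 + t * x1.
    When every path ends at a single target x1, the velocity along the
    path is (x1 - x_t) / (1 - t), which is what this function returns.
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


if __name__ == "__main__":
    start = [0.3, -1.2, 0.8]   # stands in for a noise sample
    target = [1.0, 2.0, 3.0]   # stands in for a data sample
    print(euler_sample(toy_velocity(target), start, num_steps=10))
```

Because the toy paths are exactly straight, the Euler integration lands on the target up to floating-point error; a learned model would only approximate this behavior.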

Code & Models

Models

facebook/ShapeR (Hugging Face model · 96 downloads · 48 likes)

Datasets

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging