RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion
Bardienus P. Duisterhof, Jan Oberst, Bowen Wen, Stan Birchfield, Deva Ramanan, Jeffrey Ichnowski

TL;DR
RaySt3R uses a transformer-based novel view synthesis approach for 3D shape completion from a single RGB-D image, achieving state-of-the-art results with better 3D consistency and boundary accuracy.
Contribution
It recasts 3D shape completion as a novel view synthesis problem and employs a feedforward transformer to predict depth maps, object masks, and per-pixel confidence for a set of query rays, improving accuracy and efficiency.
Findings
Outperforms baselines by up to 44% in 3D Chamfer distance
Achieves state-of-the-art performance on synthetic and real datasets
Addresses prior methods' lack of 3D consistency and difficulty capturing sharp object boundaries
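Chamfer distance, the metric behind the 44% figure, compares a predicted point cloud with a ground-truth one through nearest-neighbor distances. Below is a minimal brute-force sketch. The exact variant the paper uses is not stated here (squared or unsquared distances, summing or averaging the two directions, any scaling), so the symmetric mean of Euclidean distances below is an assumption. `relative_reduction` is a hypothetical helper showing how an "up to 44%" improvement reads as a relative drop in error.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbor in cloud (brute force, O(n))."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed variant): average of the mean
    nearest-neighbor distance a->b and the mean nearest-neighbor distance b->a."""
    if not a or not b:
        raise ValueError("both point clouds must be non-empty")
    a_to_b = sum(_nearest_dist(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest_dist(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)


def relative_reduction(baseline: float, ours: float) -> float:
    """Hypothetical helper: fractional error reduction, e.g. 0.44 means 44% lower than baseline."""
    return (baseline - ours) / baseline


# Toy clouds (illustrative only, not data from the paper):
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(pred, gt))  # 1/6: only gt's (1, 1, 0) lacks an exact match
```

The brute-force search is quadratic in the number of points; real evaluations typically use a spatial index such as a k-d tree, which gives the same values faster.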
Abstract
3D shape completion has broad applications in robotics, digital twin reconstruction, and extended reality (XR). Although recent advances in 3D object and scene completion have achieved impressive results, existing methods lack 3D consistency, are computationally expensive, and struggle to capture sharp object boundaries. Our work (RaySt3R) addresses these limitations by recasting 3D shape completion as a novel view synthesis problem. Specifically, given a single RGB-D image and a novel viewpoint (encoded as a collection of query rays), we train a feedforward transformer to predict depth maps, object masks, and per-pixel confidence scores for those query rays. RaySt3R fuses these predictions across multiple query views to reconstruct complete 3D shapes. We evaluate RaySt3R on synthetic and real-world datasets, and observe it achieves state-of-the-art performance, outperforming the…
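The pipeline described in the abstract can be sketched as follows: encode a novel viewpoint as query rays, take per-ray depth, mask, and confidence predictions, then fuse them across views into one point cloud. This is a hedged illustration, not RaySt3R's code. The pinhole camera model, the z-depth convention, the `Camera` class, `query_rays`, `fuse_views`, and the 0.5 confidence threshold are all assumptions. The network's predictions are passed in as plain inputs rather than produced by a transformer.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Hypothetical pinhole camera: intrinsics plus camera-to-world pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3  # camera-to-world rotation, given as rows
    t: Vec3  # camera center in world coordinates


# (camera, depth grid, mask grid, confidence grid), each grid indexed [v][u]
View = Tuple[Camera, Sequence[Sequence[float]], Sequence[Sequence[bool]], Sequence[Sequence[float]]]


def _rotate(R: Mat3, v: Vec3) -> Vec3:
    """Matrix-vector product R @ v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def query_rays(cam: Camera, width: int, height: int) -> List[List[Tuple[Vec3, Vec3]]]:
    """One (origin, direction) world-space ray per pixel. Directions have unit z in
    the camera frame, so origin + depth * direction back-projects a z-depth value."""
    rays = []
    for v in range(height):
        row = []
        for u in range(width):
            d_cam = ((u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy, 1.0)
            row.append((cam.t, _rotate(cam.R, d_cam)))
        rays.append(row)
    return rays


def fuse_views(views: Sequence[View], conf_threshold: float = 0.5) -> List[Vec3]:
    """Assumed fusion rule: union of back-projected points that lie inside the
    predicted object mask and whose confidence meets the threshold."""
    points: List[Vec3] = []
    for cam, depth, mask, conf in views:
        rays = query_rays(cam, len(depth[0]), len(depth))
        for v, row in enumerate(rays):
            for u, (origin, direction) in enumerate(row):
                if mask[v][u] and conf[v][u] >= conf_threshold:
                    z = depth[v][u]
                    points.append(tuple(o + z * d for o, d in zip(origin, direction)))
    return points


# Usage with made-up 1x2 "predictions" for two query views:
I3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
view_a = (Camera(1.0, 1.0, 0.0, 0.0, I3, (0.0, 0.0, 0.0)), [[2.0, 3.0]], [[True, True]], [[0.9, 0.1]])
view_b = (Camera(1.0, 1.0, 0.0, 0.0, I3, (1.0, 0.0, 0.0)), [[1.0, 3.0]], [[False, True]], [[0.9, 0.9]])
print(fuse_views([view_a, view_b]))
```

Keeping only masked, confident points is the simplest possible fusion rule. The paper's actual fusion step may differ, for example by reasoning about occlusions between query views.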
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Robot Manipulation and Learning
