RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction
Isaac Kasahara, Shubham Agrawal, Selim Engin, Nikhil Chavan-Dafle, Shuran Song, Volkan Isler

TL;DR
This paper introduces RIC, a scene reconstruction method that combines inpainting of novel views with 2D-to-3D lifting, leveraging large visual language models to generalize to unseen objects and scenes.
Contribution
The method uses a large visual language model (Dalle-2) to inpaint novel views rendered from a single input view, then predicts surface normals on the inpainted images and solves for the missing depth to reconstruct the 3D scene.
Findings
Outperforms multiple baselines in quantitative evaluation.
Demonstrates strong generalization to novel objects and scenes.
Robust to variations in depth and scale.
Abstract
General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making the scene reconstruction task challenging. In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting. Specifically, we leverage the generalization capability of large visual language models (Dalle-2) to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting for normals instead of depth directly, our method allows for…
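The lifting step described above (predict normals, then solve for the missing depth) can be illustrated with a small sketch. This is not the authors' solver, which the excerpt does not specify. It assumes an orthographic camera, so a surface z(x, y) has a normal proportional to (-dz/dx, -dz/dy, 1), and it fills missing depth by simple Gauss-Seidel relaxation. The function names `slopes_from_normal` and `complete_depth` are hypothetical, and a perspective camera would additionally need intrinsics.

```python
def slopes_from_normal(n):
    """Convert a normal (nx, ny, nz) to depth slopes (dz/dx, dz/dy).

    Assumption: orthographic projection, normal proportional to
    (-dz/dx, -dz/dy, 1), with nz != 0.
    """
    nx, ny, nz = n
    return -nx / nz, -ny / nz


def complete_depth(depth, normals, iters=2000):
    """Fill missing depth values (None) so they agree with the normals.

    depth:   2D list of floats, None where depth is unknown
    normals: 2D list of (nx, ny, nz) tuples, one per pixel
    Known depths are kept fixed; each unknown pixel is repeatedly set to
    the average of the values its neighbors predict for it.
    """
    h, w = len(depth), len(depth[0])
    known = [[d is not None for d in row] for row in depth]
    vals = [d for row in depth for d in row if d is not None]
    init = sum(vals) / len(vals)  # start unknowns at the mean known depth
    z = [[d if d is not None else init for d in row] for row in depth]

    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                p, q = slopes_from_normal(normals[y][x])
                preds = []
                if x > 0:
                    preds.append(z[y][x - 1] + p)  # step right from left neighbor
                if x < w - 1:
                    preds.append(z[y][x + 1] - p)  # step left from right neighbor
                if y > 0:
                    preds.append(z[y - 1][x] + q)  # step down from upper neighbor
                if y < h - 1:
                    preds.append(z[y + 1][x] - q)  # step up from lower neighbor
                z[y][x] = sum(preds) / len(preds)
    return z
```

Because only slopes are constrained at the missing pixels, the known depths act as the anchor that fixes absolute scale and offset, which is one way to read the abstract's point about predicting normals rather than depth directly.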
Taxonomy
Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
Methods: Inpainting
