ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models
Haitang Feng, Jie Liu, Jie Tang, Gangshan Wu, Beiqi Chen, Jianhuang Lai, Guangcong Wang

TL;DR
ObjFiller-3D introduces a novel 3D inpainting method that adapts video diffusion models to produce consistent, high-quality 3D object reconstructions, outperforming previous approaches in fidelity and realism.
Contribution
The paper presents a new approach that leverages video inpainting models for 3D scene completion, addressing inconsistencies in multi-view inpainting and introducing reference-based enhancements.
Findings
Achieves a PSNR of 26.6, compared with 15.9 for NeRFiller (higher is better).
Reduces LPIPS to 0.19, compared with 0.25 for Instant3dit (lower is better).
Produces more faithful and fine-grained 3D reconstructions.
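For readers unfamiliar with the PSNR figure quoted in the findings, the following is a minimal sketch of the standard peak signal-to-noise ratio formula, 10 * log10(MAX^2 / MSE). It is not the paper's evaluation code: the function name `psnr`, the flat-list input format, and the assumption that pixel intensities lie in [0, 1] are illustrative choices only.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Return PSNR in dB between two equally sized sequences of pixel intensities.

    Assumption (not from the paper): pixels are floats in [0, max_value],
    flattened into plain Python sequences.
    """
    if len(reference) != len(rendered):
        raise ValueError("inputs must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must be non-empty")

    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel example: each pixel is off by 0.1.
score = psnr([0.0, 1.0], [0.1, 0.9])
print(f"{score:.2f} dB")
```

Because PSNR is logarithmic, a gap of 10.7 dB (26.6 versus 15.9) corresponds, for a single image pair, to roughly a 12-fold difference in mean squared error. LPIPS, by contrast, is a learned perceptual distance and has no comparable closed form.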
Abstract
3D inpainting often relies on multi-view 2D image inpainting, where the inherent inconsistencies across different inpainted views can result in blurred textures, spatial discontinuities, and distracting visual artifacts. These inconsistencies pose significant challenges when striving for accurate and realistic 3D object completion, particularly in applications that demand high fidelity and structural coherence. To overcome these limitations, we propose ObjFiller-3D, a novel method designed for the completion and editing of high-quality and consistent 3D objects. Instead of employing a conventional 2D image inpainting model, our approach leverages a state-of-the-art video editing model to fill in the masked regions of 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting. In…
