TL;DR
Splatent is a diffusion-based enhancement framework that operates on top of 3D Gaussian Splatting (3DGS) in VAE latent space. It recovers fine details in 2D from multiple input views rather than in 3D, preserving the VAE's reconstruction quality and achieving state-of-the-art results.
Contribution
Rather than reconstructing fine-grained details in 3D, Splatent recovers them in 2D from multiple input views using multi-view attention. This preserves the VAE's reconstruction quality and improves the fidelity of the 3D reconstruction.
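To make the idea of multi-view attention concrete, here is a minimal, dependency-free sketch in which each token of one view attends to the tokens of all views. It is not the paper's architecture: the function name `multi_view_attention`, the token layout, and the use of identity query/key/value projections (no learned weights) are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_view_attention(views, target_index):
    """Refine the tokens of one view by attending over tokens from all views.

    views: list of V views, each a list of N token vectors (lists of C floats).
    target_index: index of the view whose tokens act as queries.

    Assumption: queries, keys, and values are the raw tokens themselves
    (identity projections). A real model would use learned projections.
    """
    queries = views[target_index]
    # Keys and values: every token from every view, flattened into one list.
    context = [token for view in views for token in view]
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)

    refined = []
    for q in queries:
        # Scaled dot-product similarity between this query and each context token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Output token: attention-weighted average of the context tokens.
        refined.append(
            [sum(w * v[c] for w, v in zip(weights, context)) for c in range(dim)]
        )
    return refined


if __name__ == "__main__":
    # Three hypothetical views, two 2-channel tokens each.
    views = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.5, 0.5]],
    ]
    print(multi_view_attention(views, target_index=0))
```

Because every output token is a weighted average over tokens from all views, information visible in one view can inform the refinement of another, which is the intuition behind recovering detail in 2D from multiple views.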
Findings
Achieves a new state of the art in VAE latent radiance field reconstruction.
Improves detail preservation in sparse-view 3D reconstructions.
Enhances existing frameworks by integrating 2D detail recovery.
Abstract
Radiance field representations have recently been explored in the latent space of VAEs that are commonly used by diffusion models. This direction offers efficient rendering and seamless integration with diffusion-based pipelines. However, these methods face a fundamental limitation: The VAE latent space lacks multi-view consistency, leading to blurred textures and missing details during 3D reconstruction. Existing approaches attempt to address this by fine-tuning the VAE, at the cost of reconstruction quality, or by relying on pre-trained diffusion models to recover fine-grained details, at the risk of some hallucinations. We present Splatent, a diffusion-based enhancement framework designed to operate on top of 3D Gaussian Splatting (3DGS) in the latent space of VAEs. Our key insight departs from the conventional 3D-centric view: rather than reconstructing fine-grained details in 3D…
