FluSplat: Sparse-View 3D Editing without Test-Time Optimization
Haitao Huang, Shin-Fang Chng, Huangying Zhan, Qingan Yan, and Yi Xu

TL;DR
FluSplat introduces a fast, feed-forward 3D scene editing method that ensures cross-view consistency without iterative optimization, significantly reducing inference time while maintaining high editing quality.
Contribution
The paper presents a training scheme with image-domain cross-view regularization that enables view-consistent 3D editing without scene-specific optimization at inference.
Findings
Achieves comparable editing quality to optimization-based methods.
Substantially improves cross-view consistency in 3D scene editing.
Reduces inference time by orders of magnitude.
Abstract
Recent advances in text-guided image editing and 3D Gaussian Splatting (3DGS) have enabled high-quality 3D scene manipulation. However, existing pipelines rely on iterative edit-and-fit optimization at test time, alternating between 2D diffusion editing and 3D reconstruction. This process is computationally expensive, scene-specific, and prone to cross-view inconsistencies. We propose a feed-forward framework for cross-view consistent 3D scene editing from sparse views. Instead of enforcing consistency through iterative 3D refinement, we introduce a cross-view regularization scheme in the image domain during training. By jointly supervising multi-view edits with geometric alignment constraints, our model produces view-consistent results without per-scene optimization at inference. The edited views are then lifted into 3D via a feed-forward 3DGS model, yielding a coherent 3DGS…
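To make the idea of an image-domain cross-view constraint concrete, here is a minimal sketch of one generic way such a penalty could be computed. It is not the loss used in the paper, whose exact form is not given in the text above. It assumes known per-pixel depth for view A, shared pinhole intrinsics, and a known relative pose from A to B. The names reproject and cross_view_l1, the grayscale list-of-lists image layout, and the nearest-neighbour lookup are all illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]  # grayscale H x W image, kept simple for illustration


def reproject(u: float, v: float, depth: float,
              K: Tuple[float, float, float, float],
              R: Sequence[Sequence[float]],
              t: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Map pixel (u, v) of view A, with known depth, to pixel coordinates in view B.

    K = (fx, fy, cx, cy) are assumed shared pinhole intrinsics;
    (R, t) is the assumed rigid transform X_b = R @ X_a + t.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in camera A coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Move the point into camera B coordinates.
    xb = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    yb = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    zb = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
    if zb <= 0:
        return None  # point is behind camera B
    # Project onto the image plane of view B.
    return fx * xb / zb + cx, fy * yb / zb + cy


def cross_view_l1(img_a: Image, img_b: Image, depth_a: Image,
                  K: Tuple[float, float, float, float],
                  R: Sequence[Sequence[float]],
                  t: Sequence[float]) -> float:
    """Mean absolute difference between view A and view B at geometrically aligned pixels."""
    h, w = len(img_a), len(img_a[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            p = reproject(u, v, depth_a[v][u], K, R, t)
            if p is None:
                continue
            ub, vb = round(p[0]), round(p[1])  # nearest-neighbour lookup
            if 0 <= ub < w and 0 <= vb < h:
                total += abs(img_a[v][u] - img_b[vb][ub])
                count += 1
    # Pixels that fall outside view B are ignored; return 0 if nothing overlaps.
    return total / count if count else 0.0
```

A penalty of this kind is zero when two edited views agree at corresponding pixels and grows as the edits diverge. A training setup would typically use a differentiable warp (for example bilinear sampling) instead of the nearest-neighbour lookup shown here.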
