TL;DR
This paper introduces a novel Gaussian Splatting approach tailored for sparse multi-view video in filmmaking, effectively capturing complex dynamic scenes with limited camera data.
Contribution
It splits the canonical Gaussians and the deformation field into foreground and background components using a sparse set of masks for frames at t=0, enabling high-quality dynamic 3D reconstruction without dense supervision.
Findings
Achieves up to 3 dB higher PSNR than state-of-the-art methods.
Produces segmented dynamic reconstructions including transparent textures.
Operates effectively with sparse camera configurations and limited supervision.
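To put the PSNR figure in context, here is a minimal sketch of how PSNR is commonly computed and what a gain of about 3 dB means in terms of mean squared error. The function name `psnr`, the flat-list image layout, and the `max_value=1.0` intensity range are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Assumes intensities lie in [0, max_value] (illustrative choice).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy comparison: a constant error of 0.1, then the same error with half the MSE.
reference = [0.0, 0.0, 0.0, 0.0]
baseline = [0.1] * 4
improved = [0.1 / math.sqrt(2)] * 4
gain_db = psnr(reference, improved) - psnr(reference, baseline)
```

Because PSNR is logarithmic in the error, halving the MSE raises PSNR by 10·log10(2), roughly 3 dB, so a 3 dB improvement corresponds to about half the squared reconstruction error.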
Abstract
Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limit state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is trained separately with different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground stage contains diverse dynamic features, so changes in color, position and rotation are learned. While, the…
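The foreground/background split described in the abstract can be pictured with a small sketch. This is an illustrative assumption, not the authors' procedure: it labels each Gaussian center by projecting it into the masked t=0 views and taking a majority vote. The helper names `project` and `split_by_masks`, the rotation-free pinhole camera, and the `min_ratio` threshold are all hypothetical.

```python
def project(point, camera):
    """Project a 3-D point with a simple pinhole camera (assumed: no rotation).

    Returns integer pixel coordinates (u, v), or None if the point is behind the camera.
    """
    x, y, z = (point[i] - camera["position"][i] for i in range(3))
    if z <= 0:
        return None
    u = int(round(camera["focal"] * x / z + camera["cx"]))
    v = int(round(camera["focal"] * y / z + camera["cy"]))
    return u, v

def split_by_masks(centers, cameras, masks, min_ratio=0.5):
    """Assign Gaussian indices to foreground or background using t=0 masks.

    A center counts as foreground when more than `min_ratio` of the views that
    see it place it inside the foreground mask (hypothetical voting rule).
    """
    foreground, background = [], []
    for index, center in enumerate(centers):
        seen = votes = 0
        for camera, mask in zip(cameras, masks):
            pixel = project(center, camera)
            if pixel is None:
                continue
            u, v = pixel
            if 0 <= v < len(mask) and 0 <= u < len(mask[0]):
                seen += 1
                votes += 1 if mask[v][u] else 0
        is_foreground = seen > 0 and votes / seen > min_ratio
        (foreground if is_foreground else background).append(index)
    return foreground, background

# Toy setup: one camera at the origin and a 5x5 mask with only the center pixel set.
camera = {"position": (0.0, 0.0, 0.0), "focal": 2.0, "cx": 2.0, "cy": 2.0}
mask = [[(row == 2 and col == 2) for col in range(5)] for row in range(5)]
centers = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
fg, bg = split_by_masks(centers, [camera], [mask])
```

In this toy setup only the first center lands on the masked pixel, so it is the only one assigned to the foreground group; the second falls outside the mask and the third is behind the camera. Each group could then receive its own deformation field, with the foreground one modeling the color, position and rotation changes the abstract mentions.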
