Two-View Accumulation as the Primary Training Lever for Hybrid-Capture Gaussian Splatting: A Variance-Decomposition View of When Gradient Surgery Helps
Sungjun Cho

TL;DR
This paper shows that rendering two views per optimizer step significantly improves hybrid-capture 3D Gaussian Splatting. A variance-decomposition framework explains the gain, and the experiments indicate that the structural change of two views per step, not the specific view-pairing rule, is what matters.
Contribution
The key novelty is identifying two-view accumulation as the primary training lever, supported by a variance-based explanation for its effectiveness.
Findings
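Rendering two views per step outperforms compute-matched alternatives in hybrid-capture 3DGS (vanilla 60K iterations, GradNorm, near/far gradient surgery, projective preconditioning, and confidence-gated sample-level surgery).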
Variance decomposition explains the effectiveness of two-view training.
The two-view approach transfers to other Gaussian Splatting backbones.
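The sketch below is a hedged illustration of how a variance decomposition can separate the effect of view count from the effect of view pairing. It is not the paper's derivation. It assumes that the per-view gradient of a single parameter comes from a mixture of two regimes (near and far), and it applies the law of total variance. The regime probabilities, means, and variances, as well as the names `Regime`, `decompose`, and `estimator_variances`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Regime:
    prob: float  # fraction of training views in this regime (hypothetical)
    mean: float  # mean per-view gradient for one parameter (hypothetical)
    var: float   # within-regime variance of that gradient (hypothetical)


def decompose(near: Regime, far: Regime) -> dict:
    """Law of total variance for a single per-view gradient sample."""
    if abs(near.prob + far.prob - 1.0) > 1e-12:
        raise ValueError("regime probabilities must sum to 1")
    mu = near.prob * near.mean + far.prob * far.mean
    within = near.prob * near.var + far.prob * far.var
    between = (near.prob * (near.mean - mu) ** 2
               + far.prob * (far.mean - mu) ** 2)
    return {"mean": mu, "within": within, "between": between}


def estimator_variances(near: Regime, far: Regime) -> dict:
    d = decompose(near, far)
    one_view = d["within"] + d["between"]
    random_pair = one_view / 2  # average of two i.i.d. view draws
    # Stratified pair: one near and one far view, weighted by regime
    # probability so the estimate stays unbiased for the full gradient.
    stratified_pair = near.prob ** 2 * near.var + far.prob ** 2 * far.var
    return {"one_view": one_view, "random_pair": random_pair,
            "stratified_pair": stratified_pair}


near = Regime(prob=0.5, mean=1.0, var=4.0)
far = Regime(prob=0.5, mean=3.0, var=2.0)
print(estimator_variances(near, far))
# {'one_view': 4.0, 'random_pair': 2.0, 'stratified_pair': 1.5}
```

With these made-up numbers, going from one view to a random pair halves the total variance (4.0 to 2.0). Switching from random to stratified near/far pairing removes only the remaining half of the between-regime term (2.0 to 1.5). When the between-regime term is small relative to the within-regime term, the pairing rule matters little. That pattern is consistent in spirit with the finding that pairing does not move PSNR beyond seed variance.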
Abstract
Hybrid-capture novel view synthesis combines images at substantially different camera distances (e.g., aerial drone and ground-level views). Standard 3D Gaussian Splatting (3DGS), trained for 30K iterations with one rendered view per optimizer step, under-fits the minority regime by 1-3 dB on five hybrid-capture benchmarks. We isolate the lever that closes this gap. Among compute-matched alternatives -- vanilla 60K iterations, magnitude corrections (GradNorm), direction-aware near/far gradient surgery, projective preconditioning, confidence-gated sample-level surgery, and a random two-view-per-step control -- the simplest structural change wins: rendering two views per optimizer step. The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any of the five scenes; the structural change of having two views per step…
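Two-view accumulation can be sketched as follows, under several stated assumptions. `grad_fn` is a hypothetical stand-in for rendering one view and back-propagating its loss. Views are plain dicts with a hypothetical `regime` field. The two per-view gradients are averaged before a single optimizer step; summing them instead would double the effective step size. Only the random and near/far pairing rules are shown, and the active loss-disparity rule is omitted. The toy scalar "scene" exists only to make the loop runnable and is not a Gaussian Splatting model.

```python
import random
from typing import Any, Callable, Dict, List

Params = List[float]
View = Dict[str, Any]
# Hypothetical stand-in for "render one view, compute its loss, backprop".
GradFn = Callable[[Params, View], Params]


def pick_pair(views: List[View], rule: str, rng: random.Random) -> List[View]:
    """Choose the two views rendered in one optimizer step."""
    if rule == "random":
        return rng.sample(views, 2)
    if rule == "near_far":
        near = [v for v in views if v["regime"] == "near"]
        far = [v for v in views if v["regime"] == "far"]
        return [rng.choice(near), rng.choice(far)]
    raise ValueError(f"unknown pairing rule: {rule}")


def train(params: Params, views: List[View], grad_fn: GradFn, steps: int,
          views_per_step: int, lr: float, rule: str = "random", seed: int = 0):
    """Plain SGD; returns final params and the number of rendered views."""
    rng = random.Random(seed)
    renders = 0
    for _ in range(steps):
        if views_per_step == 1:
            batch = [rng.choice(views)]
        elif views_per_step == 2:
            batch = pick_pair(views, rule, rng)
        else:
            raise ValueError("sketch supports 1 or 2 views per step")
        acc = [0.0] * len(params)
        for view in batch:  # accumulate averaged gradients, no step yet
            for i, g in enumerate(grad_fn(params, view)):
                acc[i] += g / len(batch)
            renders += 1
        params = [p - lr * a for p, a in zip(params, acc)]  # single step
    return params, renders


# Toy "scene": one scalar parameter, per-view loss 0.5 * (p - target)^2.
def toy_grad(params: Params, view: View) -> Params:
    return [params[0] - view["target"]]


views = [{"regime": "near", "target": 1.0}, {"regime": "far", "target": 3.0}]
two, n2 = train([0.0], views, toy_grad, steps=30, views_per_step=2,
                lr=0.5, rule="near_far")
one, n1 = train([0.0], views, toy_grad, steps=60, views_per_step=1, lr=0.5)
print(n1, n2)  # 60 60
```

The final `print` shows the compute-matching idea from the abstract in miniature. Thirty steps at two views per step render as many views as sixty steps at one view per step, which mirrors the comparison between 30K two-view iterations and vanilla 60K iterations.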
