TL;DR
NoPo4D is a feed-forward system that reconstructs dynamic 3D scenes from unposed multi-view videos, requiring neither accurate camera poses nor per-scene optimization.
Contribution
It introduces a velocity decomposition and a bidirectional motion encoder to handle multi-view dynamic scenes in a single feed-forward pass without pose supervision.
Findings
Outperforms prior feed-forward methods on four benchmarks.
With post-optimization, surpasses per-scene optimization methods.
Runs orders of magnitude faster than existing approaches.
Abstract
Recent feed-forward 3D Gaussian splatting methods have made dramatic progress on individual aspects of 3D scene reconstruction, but no existing method jointly addresses dynamic content, multi-view input, and unknown camera poses in a single feed-forward pass. Methods that handle dynamics either require accurate camera poses or accept only monocular input; pose-free multi-view methods address only static scenes; and per-scene optimization methods bridge some of these gaps but at minutes-to-hours cost per scene. We introduce NoPo4D, the first feed-forward system that addresses this empty quadrant. Building on a pretrained geometry backbone and recent 4D Gaussian frameworks, NoPo4D introduces a velocity decomposition that splits Gaussian motion into per-pixel image-plane shifts and depth changes, allowing direct supervision from pseudo ground-truth optical flow on the 2D component. This…
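The velocity decomposition described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple pinhole camera, works in the camera's own coordinate frame, and uses hypothetical names (`Pinhole`, `unproject`, `compose_velocity`). It only shows how a per-pixel image-plane shift and a depth change together determine a 3D displacement, which is why the 2D component can be compared directly against optical flow.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    """Hypothetical pinhole intrinsics (an assumption, not from the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject(cam, u, v, depth):
    """Lift pixel (u, v) at the given depth to a 3D point in camera space."""
    x = (u - cam.cx) / cam.fx * depth
    y = (v - cam.cy) / cam.fy * depth
    return (x, y, depth)


def compose_velocity(cam, u, v, depth, flow_uv, depth_delta):
    """Combine a 2D image-plane shift and a depth change into a 3D displacement.

    flow_uv is the per-pixel shift (du, dv) in pixels; depth_delta is the
    change in depth along the camera's z axis over the same time step.
    """
    du, dv = flow_uv
    start = unproject(cam, u, v, depth)
    end = unproject(cam, u + du, v + dv, depth + depth_delta)
    return tuple(e - s for s, e in zip(start, end))


if __name__ == "__main__":
    cam = Pinhole(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
    # A pixel at the principal point, 2 units away, shifting 10 px to the right.
    print(compose_velocity(cam, 50.0, 50.0, 2.0, (10.0, 0.0), 0.0))
```

In this sketch the image-plane term is the only part that depends on `flow_uv`, so a 2D flow estimate can supervise it on its own, while `depth_delta` accounts for motion along the viewing direction.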
