TL;DR
Stream3D introduces a training-free streaming mechanism for view-conditioned 3D generators, maintaining temporal consistency by dynamically updating an evidential memory with informative historical frames.
Contribution
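Stream3D is presented as the first training-free streaming mechanism for this setting: it turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory, improving temporal consistency without architectural changes.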
Findings
Outperforms latent-transport baselines on realistic and synthetic benchmarks.
Maintains a fixed memory size, preventing degradation over long sequences.
Achieves better photometric and geometric metrics than existing methods.
Abstract
View-conditioned 3D generators such as SAM 3D, TRELLIS and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. Stream3D achieves this by maintaining a compact evidential memory, which selectively caches the most informative historical frames based on a proposed evidence score mechanism. As the stream progresses, the memory dynamically updates to retain a fixed number of informative frames, preventing the memory footprint from growing linearly with sequence…
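The fixed-size memory described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the evidence score, so here it is just a float supplied by the caller, and the class name `EvidentialMemory`, its methods, and the lowest-score eviction rule are all hypothetical choices made for illustration. The only property taken from the abstract is that the memory keeps a fixed number of the highest-scoring historical frames, however long the stream runs.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class EvidentialMemory:
    """Hypothetical fixed-capacity cache of the highest-scoring frames."""

    capacity: int
    # Min-heap of (score, frame_id, frame); the weakest entry sits at index 0.
    _heap: List[Tuple[float, int, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def update(self, frame_id: int, frame: Any, score: float) -> None:
        """Offer a frame; keep it only if it beats the weakest cached one.

        `score` stands in for the paper's evidence score (assumed, not
        specified here). `frame_id` must be unique so ties never fall
        through to comparing frame payloads.
        """
        entry = (score, frame_id, frame)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # Evict the lowest-scoring frame and insert the new one.
            heapq.heapreplace(self._heap, entry)

    def frames(self) -> List[Tuple[int, Any]]:
        """Return cached (frame_id, frame) pairs in stream order."""
        return [(fid, f) for _, fid, f in sorted(self._heap, key=lambda e: e[1])]

    def __len__(self) -> int:
        return len(self._heap)


# Usage: stream four frames through a memory that holds two.
memory = EvidentialMemory(capacity=2)
for i, s in enumerate([0.1, 0.9, 0.5, 0.3]):
    memory.update(frame_id=i, frame=f"frame{i}", score=s)
kept_ids = [fid for fid, _ in memory.frames()]  # frames 1 and 2 remain
```

Each update costs O(log k) for a memory of k frames, and the cache never exceeds k entries, which is the constant-memory behaviour the abstract attributes to the method. How the cached frames condition the frozen generator is outside the scope of this sketch.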
