StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

Zike Wu; Qi Yan; Xuanyu Yi; Lele Wang; Renjie Liao

arXiv:2506.08862 · cs.CV · March 4, 2026

TL;DR

StreamSplat introduces a fast, online framework for real-time 3D scene reconstruction from uncalibrated video streams, outperforming traditional optimization-based methods in speed and quality.

Contribution

It presents a novel feed-forward approach with probabilistic sampling, bidirectional deformation, and adaptive Gaussian fusion for online dynamic 3D reconstruction.

Findings

01. Achieves state-of-the-art quality on standard benchmarks.

02. Supports arbitrarily long video streams, with a 1200x speedup over optimization-based methods.

03. Outperforms optimization-based methods in both speed and accuracy.

Abstract

Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams demands robust online methods that recover scene dynamics from sparse observations under strict latency and memory constraints. Yet most dynamic reconstruction methods rely on hours of per-scene optimization under full-sequence access, limiting practical deployment. In this work, we introduce StreamSplat, a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner. It is achieved via three key technical innovations: 1) a probabilistic sampling mechanism that robustly predicts 3D Gaussians from uncalibrated inputs; 2) a bidirectional deformation field that yields reliable associations across frames and mitigates long-term error accumulation; 3) an adaptive Gaussian fusion operation that…
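
The online setting described in the abstract, in which frames arrive one at a time and the full sequence is never held in memory, can be sketched generically as follows. This is an illustrative Python sketch only: `process_stream`, `running_mean_step`, and the toy running-mean update are hypothetical names and stand-ins chosen for this example, not StreamSplat's model or API.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for a per-stream state: (frames seen, running mean).
State = Tuple[int, float]


def running_mean_step(state: Optional[State], frame: float) -> State:
    """Toy update: fold one new 'frame' (a plain number here) into the state."""
    if state is None:
        return (1, float(frame))
    count, mean = state
    count += 1
    return (count, mean + (frame - mean) / count)


def process_stream(
    frames: Iterable[float],
    step: Callable[[Optional[State], float], State],
) -> Iterator[State]:
    """Consume frames one at a time, keeping only the latest state."""
    state: Optional[State] = None
    for frame in frames:
        state = step(state, frame)  # earlier frames are not retained
        yield state


if __name__ == "__main__":
    for state in process_stream([2.0, 4.0, 6.0], running_mean_step):
        print(state)
```

Because the loop holds a single state object, memory use does not grow with stream length. A real system would replace the toy update with a learned per-frame model.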

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. The paper is well written and easy to follow.
2. StreamSplat is feed-forward and increases reconstruction speed.
3. In Figure 4, StreamSplat shows persistent Gaussians across frames, which demonstrates its potential for long-term modeling.

Weaknesses

Major Weakness:
1. Lack of Rigorous Evaluation Protocols: The evaluation does not follow established dynamic reconstruction benchmarks such as DyCheck or NVIDIA Dynamic Scene Dataset. The chosen datasets (DAVIS and YouTube-VOS) are more typical for video segmentation or interpolation, not 4D reconstruction.
2. Limited Training Dataset: The paper uses a mix of static (CO3Dv2, RealEstate10K) and limited dynamic (DAVIS, YouTube-VOS) datasets for training. However, DAVIS contains only a few short

Reviewer 02 · Rating 8 · Confidence 2

Strengths

* The paper tackles an underexplored yet practically important problem: real-time dynamic 3D reconstruction from uncalibrated video streams, which existing 3DGS and NeRF-based methods generally overlook due to their offline and per-scene optimization nature.
* Unlike prior optimization-based dynamic 3DGS methods, StreamSplat introduces a fully feed-forward pipeline that supports online inference without requiring camera calibration or pre-computed poses, making it highly suitable for real-world

Weaknesses

* It would be helpful if the authors could clarify whether their framework is capable of predicting or estimating camera poses, given that StreamSplat operates under uncalibrated input conditions. If not, discussing potential extensions in this direction would strengthen the paper's completeness.
* The paper would benefit from additional discussion or experiments on highly dynamic scenes with significant topological changes (e.g., frequent object entries and exits from the field of view). It re

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- **Impressive empirical results:** The method achieves substantial gains over prior 3DGS and NeRF-based approaches, setting a new benchmark for *online dynamic reconstruction from uncalibrated video streams*.
- **Reasonable and clear pipeline:** The proposed two-stage static/dynamic training scheme is well-motivated and reproducible. The combination of a strong image encoder and a dynamic decoder makes architectural sense.
- **Excellent efficiency:** StreamSplat attains orders-of-magnitude

Weaknesses

Major Weaknesses

**W1.** The formulation in *line 157* appears problematic: $(u,v)$ represents pixel coordinates while the offset $o_i$ is in unit space. Their direct addition may be incorrect if the coordinate system is rectilinear. A clarification of this coordinate transformation is needed.

**W2.** Algorithm 2’s *aggregation and fusion* step is ambiguous. The operation `UPDATE` in line 228 is not clearly defined—does it simply replace $\tilde{\mathcal{G}}$ with $\mathcal{G}_{k-1}^+$, or
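
The unit mismatch raised in W1 can be illustrated with a small sketch. It assumes, purely for illustration, that "unit space" means image coordinates normalized to $[0, 1]$ and that pixels are sampled at their centers; the paper's actual convention is not given in this excerpt, and the function names `pixel_to_unit`, `add_unit_offset`, and `unit_offset_in_pixels` are hypothetical.

```python
from typing import Tuple


def pixel_to_unit(u: float, v: float, width: int, height: int) -> Tuple[float, float]:
    """Map a pixel index to [0, 1] x [0, 1] using pixel centers (assumed convention)."""
    return (u + 0.5) / width, (v + 0.5) / height


def add_unit_offset(
    u: float, v: float, offset: Tuple[float, float], width: int, height: int
) -> Tuple[float, float]:
    """Normalize (u, v) first, then add an offset expressed in unit space."""
    x, y = pixel_to_unit(u, v, width, height)
    return x + offset[0], y + offset[1]


def unit_offset_in_pixels(
    offset: Tuple[float, float], width: int, height: int
) -> Tuple[float, float]:
    """Express a unit-space offset in pixels, to show the scale of the mismatch."""
    return offset[0] * width, offset[1] * height


if __name__ == "__main__":
    width, height = 640, 480
    offset = (0.25, 0.0)  # hypothetical unit-space offset
    # Consistent: both terms live in unit space.
    print(add_unit_offset(0, 0, offset, width, height))
    # The same offset corresponds to 160 pixels horizontally, so adding 0.25
    # directly to a pixel coordinate would shift the point by a quarter pixel.
    print(unit_offset_in_pixels(offset, width, height))
```

Under these assumptions, adding $o_i$ directly to $(u,v)$ is only meaningful if one of the two quantities is first converted into the other's coordinate system.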

Code & Models

Repositories

nickwzk/streamsplat · Official

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Human Pose and Action Recognition