S2GS: Streaming Semantic Gaussian Splatting for Online Scene Understanding and Reconstruction

Renhe Zhang; Yuyang Tan; Jingyu Gong; Zhizhong Zhang; Lizhuang Ma; Yuan Xie; Xin Tan

arXiv:2603.14232 · cs.CV · March 17, 2026

TL;DR

S2GS is a scalable, strictly causal framework for online 3D scene understanding and reconstruction. It updates the scene incrementally without reprocessing past frames, and it scales better than offline methods on long sequences.

Contribution

The paper presents S2GS, an incremental 3D Gaussian semantic field framework that enables scalable online scene understanding and reconstruction without relying on future frames.

Findings

01. Matches or exceeds offline baselines on joint reconstruction and understanding.
02. Processes over 1,000 frames with minimal increase in runtime and memory.
03. Outperforms offline methods in long-horizon scalability.

Abstract

Existing offline feed-forward methods for joint scene understanding and reconstruction on long image streams often repeatedly perform global computation over an ever-growing set of past observations, causing runtime and GPU memory to increase rapidly with sequence length and limiting scalability. We propose Streaming Semantic Gaussian Splatting (S2GS), a strictly causal, incremental 3D Gaussian semantic field framework: it does not leverage future frames and continuously updates scene geometry, appearance, and instance-level semantics without reprocessing historical frames, enabling scalable online joint reconstruction and understanding. S2GS adopts a geometry-semantic decoupled dual-backbone design: the geometry branch performs causal modeling to drive incremental Gaussian updates, while the semantic branch leverages a 2D foundation vision model and a query-driven decoder to predict…
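The scaling argument in the abstract can be illustrated with a toy sketch. This is not the S2GS implementation: `StreamingScene`, `predict_gaussians`, `offline_reconstruct`, and `run_demo` are hypothetical names, and the "predictor" is a placeholder that copies points instead of running a learned network. The sketch only contrasts a causal update, which touches each frame once, with an offline-style baseline that recomputes over all past frames whenever a new frame arrives.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy Gaussian: (x, y, z, instance_id). A real Gaussian would also carry
# covariance, opacity, and appearance; those are omitted in this sketch.
Gaussian = Tuple[float, float, float, int]
Frame = List[Gaussian]


def predict_gaussians(frame: Frame) -> List[Gaussian]:
    """Placeholder for a learned per-frame predictor (assumption)."""
    return list(frame)


@dataclass
class StreamingScene:
    """Scene state that is updated causally, one frame at a time."""
    gaussians: List[Gaussian] = field(default_factory=list)
    frames_processed: int = 0  # number of per-frame predictor calls

    def update(self, frame: Frame) -> None:
        # Uses only the current frame and the existing state:
        # no future frames, no reprocessing of past frames.
        self.frames_processed += 1
        self.gaussians.extend(predict_gaussians(frame))


def offline_reconstruct(frames_so_far: List[Frame]) -> Tuple[List[Gaussian], int]:
    """Offline-style baseline: rebuild the scene from every frame seen so far."""
    gaussians: List[Gaussian] = []
    work = 0
    for frame in frames_so_far:
        work += 1
        gaussians.extend(predict_gaussians(frame))
    return gaussians, work


def run_demo(n_frames: int = 5) -> Tuple[int, int, int]:
    """Return (streaming work, offline work, number of Gaussians)."""
    frames = [[(float(t), 0.0, 1.0, t % 2)] for t in range(n_frames)]
    scene = StreamingScene()
    offline_work = 0
    for t in range(n_frames):
        scene.update(frames[t])
        _, work = offline_reconstruct(frames[: t + 1])
        offline_work += work
    return scene.frames_processed, offline_work, len(scene.gaussians)


if __name__ == "__main__":
    print(run_demo(5))  # (5, 15, 5)
```

With five frames the streaming path makes 5 predictor calls while the recompute-everything baseline makes 1 + 2 + 3 + 4 + 5 = 15, which is the linear-versus-quadratic gap the abstract attributes to repeated global computation. The actual per-frame cost and memory behavior of S2GS depend on details not shown in this excerpt.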

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques