SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding

Sheng Ye; Zhen-Hui Dong; Ruoyu Fan; Tian Lv; Yong-Jin Liu

arXiv:2603.02548 · cs.CV · March 4, 2026

TL;DR

SemGS is a fast, generalizable framework that reconstructs semantic 3D scenes from sparse views using a dual-branch architecture, camera-aware attention, and Gaussian-based decoding, outperforming existing methods.

Contribution

We introduce SemGS, a novel feed-forward approach for semantic 3D scene understanding from sparse views, with a dual-branch architecture and camera-aware attention for improved generalization.

Findings

01. Achieves state-of-the-art results on benchmark datasets.
02. Provides rapid inference suitable for real-world applications.
03. Demonstrates strong generalization across synthetic and real-world scenes.

Abstract

Semantic understanding of 3D scenes is essential for robots to operate effectively and safely in complex environments. Existing methods for semantic scene reconstruction and semantic-aware novel view synthesis often rely on dense multi-view inputs and require scene-specific optimization, limiting their practicality and scalability in real-world applications. To address these challenges, we propose SemGS, a feed-forward framework for reconstructing generalizable semantic fields from sparse image inputs. SemGS uses a dual-branch architecture to extract color and semantic features, where the two branches share shallow CNN layers, allowing semantic reasoning to leverage textural and structural cues in color appearance. We also incorporate a camera-aware attention mechanism into the feature extractor to explicitly model geometric relationships between camera viewpoints. The extracted…
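The shared-layer idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `DualBranchExtractor`, the layer sizes, and the use of per-pixel dense layers in place of real CNN layers are all assumptions made for illustration. It shows only that shallow features are computed once and then consumed by both a color head and a semantic head.

```python
import random


def linear(weights, x):
    """Apply a dense layer: `weights` is a list of rows, `x` a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def make_weights(rows, cols, rng):
    """Random weights in [-1, 1]; a stand-in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class DualBranchExtractor:
    """Hypothetical toy model: one shared shallow layer feeding two heads."""

    def __init__(self, in_dim=3, shared_dim=8, color_dim=4, sem_dim=5, seed=0):
        rng = random.Random(seed)
        # Shared shallow layer (assumed stand-in for the shared CNN layers).
        self.shared = make_weights(shared_dim, in_dim, rng)
        # Branch-specific heads (sizes are illustrative assumptions).
        self.color_head = make_weights(color_dim, shared_dim, rng)
        self.sem_head = make_weights(sem_dim, shared_dim, rng)

    def forward(self, pixel):
        # Shallow features are computed once and reused by both branches,
        # so the semantic head sees the same low-level appearance cues.
        shared = relu(linear(self.shared, pixel))
        color_feat = linear(self.color_head, shared)
        sem_feat = linear(self.sem_head, shared)
        return color_feat, sem_feat


model = DualBranchExtractor()
color_feat, sem_feat = model.forward([0.2, 0.5, 0.7])  # one RGB pixel
print(len(color_feat), len(sem_feat))
```

In a real system the shared layer would be convolutional and the heads would be deeper; the point here is only the data flow from one shared feature to two branches.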

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization