X-GS: An Extensible Open Framework for Perceiving and Thinking via 3D Gaussian Splatting
Yueen Ma, Zenglin Xu, Irwin King

TL;DR
X-GS is an open, extensible framework that unifies 3D Gaussian Splatting techniques for real-time spatial AI applications, enabling semantic understanding and multimodal integration.
Contribution
The paper introduces X-GS, a modular framework that integrates diverse 3DGS methods with novel mechanisms for real-time SLAM and semantic features, and interfaces with multimodal models.
Findings
Efficient real-time 3D scene understanding demonstrated on real-world datasets.
Enhanced multimodal capabilities including object detection and captioning.
Unified framework facilitates diverse spatial AI applications.
Abstract
3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, subsequently extending into numerous spatial AI applications. However, most existing 3DGS methods operate in isolation, focusing on specific domains such as pose-free 3DGS, online SLAM, and semantic enrichment. In this paper, we introduce X-GS, an extensible open framework consisting of two major components: the X-GS-Perceiver, which unifies a broad range of 3DGS techniques to enable real-time online SLAM and distill semantic features; and the X-GS-Thinker, which interfaces with downstream multimodal models. In our implementation of the Perceiver, we integrate various 3DGS methods through three novel mechanisms: an online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design. The Thinker accommodates vision-language models and utilizes…
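To give a rough sense of what an online Vector Quantization step can look like, here is a minimal sketch in Python with NumPy. It is not the paper's VQ module, whose details are not given in the abstract. It is a generic codebook updated by an exponential moving average as feature batches arrive. The class name `OnlineVQ`, its parameters (`num_codes`, `dim`, `decay`), and the random features standing in for per-Gaussian semantic features are all assumptions made for illustration.

```python
import numpy as np


class OnlineVQ:
    """Generic online vector quantizer (illustrative, not the X-GS module)."""

    def __init__(self, num_codes, dim, decay=0.9, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical initialization: random Gaussian codebook entries.
        self.codebook = rng.normal(size=(num_codes, dim))
        self.decay = decay

    def assign(self, feats):
        # Squared Euclidean distance from every feature to every code: (N, K).
        diff = feats[:, None, :] - self.codebook[None, :, :]
        return (diff ** 2).sum(axis=-1).argmin(axis=1)

    def update(self, feats):
        # One online step: assign the batch, then move each used code
        # toward the mean of the features assigned to it.
        idx = self.assign(feats)
        for k in np.unique(idx):
            mean_k = feats[idx == k].mean(axis=0)
            self.codebook[k] = (
                self.decay * self.codebook[k] + (1.0 - self.decay) * mean_k
            )
        return idx

    def quantize(self, feats):
        # Replace each feature by its nearest codebook entry.
        return self.codebook[self.assign(feats)]


def quantization_error(vq, feats):
    # Mean squared distance between features and their quantized versions.
    return float(((feats - vq.quantize(feats)) ** 2).sum(axis=-1).mean())
```

Under these assumptions, each incoming frame would call `update` on its new features, and a scene would store only the integer code indices plus the shared codebook rather than a full feature vector per Gaussian.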
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Multimodal Machine Learning Applications
