EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding
Seungjun Lee, Zihan Wang, Yunsong Wang, Gim Hee Lee

TL;DR
EmbodiedSplat is an online, feed-forward 3D scene understanding method that reconstructs and semantically labels 3D environments from streaming images in nearly real time, and generalizes to novel scenes without per-scene optimization.
Contribution
It introduces an online, feed-forward approach to semantic 3D Gaussian Splatting (3DGS) reconstruction that uses CLIP embeddings and 3D geometric features, enabling nearly real-time, open-vocabulary understanding of 3D scenes.
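To make "open-vocabulary" concrete, here is a minimal sketch of how a scene whose Gaussians carry language-aligned feature vectors could be queried with free-form text labels. It is not the paper's implementation: the function name `label_gaussians`, the 3-dimensional toy vectors, and the labels are all hypothetical, and in practice both sets of vectors would come from a CLIP-style encoder. It assumes only that each Gaussian and each text prompt has an embedding in the same space, and assigns labels by cosine similarity.

```python
import numpy as np

def label_gaussians(gaussian_feats, text_feats, labels):
    """Assign each Gaussian the label whose text embedding is most similar.

    gaussian_feats: (N, D) array, one semantic feature per Gaussian (assumed given).
    text_feats:     (K, D) array, one embedding per text prompt (assumed given).
    labels:         list of K strings matching the rows of text_feats.
    """
    # L2-normalize both sets so the dot product equals cosine similarity.
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sim = g @ t.T                      # (N, K) cosine similarities
    best = sim.argmax(axis=1)          # index of the best-matching prompt
    return [labels[i] for i in best], sim

# Toy stand-ins for real embeddings (hypothetical values).
labels = ["chair", "table"]
text_feats = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
gaussian_feats = np.array([[0.9, 0.1, 0.0],
                           [0.2, 0.8, 0.1],
                           [2.0, 0.5, 0.0]])

predicted, sim = label_gaussians(gaussian_feats, text_feats, labels)
print(predicted)
```

Because the label set is just a list of text prompts, new categories can be added at query time without retraining, which is what distinguishes an open-vocabulary setup from a fixed-class segmenter.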
Findings
Reconstructs semantic 3D scenes from over 300 streaming images in an online manner.
Generalizes to unseen scenes through its feed-forward design, with no per-scene optimization.
Demonstrates effectiveness on diverse indoor datasets.
Abstract
Understanding a 3D scene as it is being explored is essential for embodied tasks, where an agent must construct and comprehend the scene in an online and nearly real-time manner. In this study, we propose EmbodiedSplat, an online feed-forward 3DGS method for open-vocabulary scene understanding that enables simultaneous online 3D reconstruction and 3D semantic understanding from streaming images. Unlike existing open-vocabulary 3DGS methods, which are typically restricted to either offline or per-scene optimization settings, our objectives are two-fold: 1) reconstruct the semantic-embedded 3DGS of the entire scene from over 300 streaming images in an online manner; 2) generalize well to novel scenes with a feed-forward design and support nearly real-time 3D semantic reconstruction when combined with real-time 2D models. To achieve these objectives, we propose an Online Sparse…
Taxonomy
Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
