Fast-SegSim: Real-Time Open-Vocabulary Segmentation for Robotics in Simulation
Xuan Yu, Yuxuan Xie, Shichao Zhai, Shuhao Ye, Rong Xiong, Yue Wang

TL;DR
Fast-SegSim is a real-time, open-vocabulary 3D segmentation framework built on 2D Gaussian Splatting, optimized for robotics, achieving over 40 FPS and improving perception tasks in simulation-to-real transfer.
Contribution
The paper introduces a highly optimized rendering pipeline, including Precise Tile Intersection to reduce rasterization redundancy, that enables real-time, high-fidelity 3D segmentation for robotics applications.
Findings
Achieves rendering rates exceeding 40 FPS.
Provides multi-view ground truth labels for perception fine-tuning.
Doubles the success rate in object-goal navigation after fine-tuning.
Abstract
Open-vocabulary panoptic reconstruction is crucial for advanced robotics and simulation. However, existing 3D reconstruction methods, such as NeRF or Gaussian Splatting variants, often struggle to achieve the real-time inference frequency required by robotic control loops. Existing methods incur prohibitive latency when processing the high-dimensional features required for robust open-vocabulary segmentation. We propose Fast-SegSim, a novel, simple, and end-to-end framework built upon 2D Gaussian Splatting, designed to realize real-time, high-fidelity, and 3D-consistent open-vocabulary segmentation reconstruction. Our core contribution is a highly optimized rendering pipeline that specifically addresses the computational bottleneck of high-channel segmentation feature accumulation. We introduce two key optimizations: Precise Tile Intersection to reduce rasterization redundancy, and a…
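The abstract names Precise Tile Intersection but does not describe how it works. As a rough illustration of the general idea only, the sketch below compares two ways of deciding which screen tiles a projected Gaussian touches: a coarse square bound derived from the largest eigenvalue of the 2D covariance, and an exact ellipse-versus-tile test. Every name and value here (`count_tiles`, the 16-pixel tile size, the 3-sigma cutoff `k`) is a hypothetical choice for this example and is not taken from the paper.

```python
import numpy as np

def min_mahalanobis_sq_on_segment(p0, p1, mu, inv_cov):
    """Smallest squared Mahalanobis distance from mu to the segment p0-p1."""
    d = p1 - p0
    r = p0 - mu
    # q(t) = (r + t d)^T A (r + t d) is a 1D convex quadratic in t;
    # minimize it and clamp t to the segment.
    t = -(d @ inv_cov @ r) / (d @ inv_cov @ d)
    t = np.clip(t, 0.0, 1.0)
    x = r + t * d
    return float(x @ inv_cov @ x)

def ellipse_hits_tile(mu, inv_cov, k, x0, y0, x1, y1):
    """Exact test: does the k-sigma ellipse overlap the rectangle?"""
    if x0 <= mu[0] <= x1 and y0 <= mu[1] <= y1:
        return True  # center lies inside the tile
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
    # With the center outside, the closest point of the tile is on its boundary.
    best = min(
        min_mahalanobis_sq_on_segment(corners[i], corners[(i + 1) % 4], mu, inv_cov)
        for i in range(4)
    )
    return best <= k * k

def count_tiles(mu, cov, k=3.0, tile=16, width=256, height=256):
    """Return (coarse, exact) tile counts for one projected Gaussian.

    Assumed setup for illustration: square tiles, k-sigma cutoff.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    inv_cov = np.linalg.inv(cov)
    # Coarse bound: a square whose half-width is k * sqrt(largest eigenvalue).
    radius = k * np.sqrt(np.linalg.eigvalsh(cov).max())
    tx0 = max(int((mu[0] - radius) // tile), 0)
    tx1 = min(int((mu[0] + radius) // tile), width // tile - 1)
    ty0 = max(int((mu[1] - radius) // tile), 0)
    ty1 = min(int((mu[1] + radius) // tile), height // tile - 1)
    coarse = exact = 0
    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            coarse += 1
            if ellipse_hits_tile(mu, inv_cov, k, tx * tile, ty * tile,
                                 (tx + 1) * tile, (ty + 1) * tile):
                exact += 1
    return coarse, exact

if __name__ == "__main__":
    # A thin Gaussian elongated along the image diagonal.
    coarse, exact = count_tiles((128.0, 128.0), [[400.0, 390.0], [390.0, 400.0]])
    print(f"coarse tiles: {coarse}, exact tiles: {exact}")
```

For an elongated Gaussian, the exact test assigns it to fewer tiles than the coarse square does. Each avoided tile saves a per-pixel blend of the full feature vector, and that saving grows with the number of segmentation feature channels.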
