Fast-SegSim: Real-Time Open-Vocabulary Segmentation for Robotics in Simulation

Xuan Yu; Yuxuan Xie; Shichao Zhai; Shuhao Ye; Rong Xiong; Yue Wang

arXiv:2604.10951 · cs.RO · April 14, 2026


TL;DR

Fast-SegSim is a real-time, open-vocabulary 3D segmentation framework built on 2D Gaussian Splatting, optimized for robotics, achieving over 40 FPS and improving perception tasks in simulation-to-real transfer.

Contribution

The paper introduces a highly optimized rendering pipeline that targets the cost of accumulating high-channel segmentation features, enabling real-time, high-fidelity 3D segmentation for robotics applications.

Findings

01 Achieves rendering rates exceeding 40 FPS.

02 Provides multi-view ground truth labels for perception fine-tuning.

03 Doubles success rate in object goal navigation after fine-tuning.

Abstract

Open-vocabulary panoptic reconstruction is crucial for advanced robotics and simulation. However, existing 3D reconstruction methods, such as NeRF or Gaussian Splatting variants, often struggle to achieve the real-time inference frequency required by robotic control loops. Existing methods incur prohibitive latency when processing the high-dimensional features required for robust open-vocabulary segmentation. We propose Fast-SegSim, a novel, simple, and end-to-end framework built upon 2D Gaussian Splatting, designed to realize real-time, high-fidelity, and 3D-consistent open-vocabulary segmentation reconstruction. Our core contribution is a highly optimized rendering pipeline that specifically addresses the computational bottleneck of high-channel segmentation feature accumulation. We introduce two key optimizations: Precise Tile Intersection to reduce rasterization redundancy, and a…
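The abstract names Precise Tile Intersection only as a way to reduce rasterization redundancy and does not describe how it works. As a rough illustration of the general idea, and not the authors' implementation, the sketch below compares two ways of assigning a splat's elliptical screen footprint to rasterizer tiles: a bounding-box test and an exact ellipse-versus-tile test. The function names, the 16-pixel tile size, and the conic parameterization `(a, b, c)` with squared cutoff `r2` are all assumptions made for this example.

```python
import math


def bbox_tiles(cx, cy, a, b, c, r2, width, height, tile=16):
    """Tiles touched by the axis-aligned bounding box of the ellipse
    a*dx^2 + 2*b*dx*dy + c*dy^2 <= r2 centred at (cx, cy).
    (Hypothetical helper; assumes a*c - b*b > 0.)"""
    det = a * c - b * b
    hx = math.sqrt(r2 * c / det)  # half-extent along x
    hy = math.sqrt(r2 * a / det)  # half-extent along y
    nx, ny = math.ceil(width / tile), math.ceil(height / tile)
    tx0 = max(0, math.floor((cx - hx) / tile))
    tx1 = min(nx - 1, math.floor((cx + hx) / tile))
    ty0 = max(0, math.floor((cy - hy) / tile))
    ty1 = min(ny - 1, math.floor((cy + hy) / tile))
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def _min_on_segment(p, q, a, b, c):
    """Minimum of the quadratic form over the segment p -> q,
    where p and q are offsets from the ellipse centre."""
    px, py = p
    ux, uy = q[0] - px, q[1] - py
    p_q_u = a * px * ux + b * (px * uy + py * ux) + c * py * uy
    u_q_u = a * ux * ux + 2 * b * ux * uy + c * uy * uy
    # Minimise the 1D quadratic in t, clamped to the segment [0, 1].
    t = 0.0 if u_q_u == 0 else min(1.0, max(0.0, -p_q_u / u_q_u))
    x, y = px + t * ux, py + t * uy
    return a * x * x + 2 * b * x * y + c * y * y


def tile_hits_ellipse(tx, ty, cx, cy, a, b, c, r2, tile=16):
    """Exact test: does the tile rectangle overlap the ellipse?"""
    x0, y0 = tx * tile - cx, ty * tile - cy
    x1, y1 = x0 + tile, y0 + tile
    if x0 <= 0.0 <= x1 and y0 <= 0.0 <= y1:
        return True  # the centre lies inside the tile
    # The form is convex, so its minimum over the tile lies on an edge.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    edges = zip(corners, corners[1:] + corners[:1])
    return any(_min_on_segment(p, q, a, b, c) <= r2 for p, q in edges)


def precise_tiles(cx, cy, a, b, c, r2, width, height, tile=16):
    """Bounding-box candidates filtered by the exact overlap test."""
    return [
        t for t in bbox_tiles(cx, cy, a, b, c, r2, width, height, tile)
        if tile_hits_ellipse(t[0], t[1], cx, cy, a, b, c, r2, tile)
    ]
```

Comparing `len(bbox_tiles(...))` with `len(precise_tiles(...))` for a thin, diagonally oriented footprint shows how many tile assignments a bounding-box test wastes. Every wasted assignment would otherwise cost a pass over all feature channels, which is why this kind of redundancy matters more for high-channel segmentation features than for three-channel color.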
