Online Segment Any 3D Thing as Instance Tracking

Hanshi Wang; Zijian Cai; Jin Gao; Yiwei Zhang; Weiming Hu; Ke Wang; Zhipeng Zhang

arXiv:2512.07599 · cs.CV · December 9, 2025

TL;DR

This paper presents AutoSeg3D, a novel online 3D segmentation method that incorporates instance tracking and temporal information propagation to improve spatial understanding and robustness in embodied agents.

Contribution

The paper recasts online 3D segmentation as an instance tracking problem, using object queries for temporal information propagation and spatial consistency learning.
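To make the idea of treating segmentation as instance tracking more concrete, here is a minimal sketch of how per-frame query embeddings might be associated with instance identities over time. It is not the AutoSeg3D implementation: the `QueryTracker` class, the cosine-similarity greedy matching, the `sim_threshold` and `momentum` values, and the 2-D toy embeddings are all assumptions made purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

class QueryTracker:
    """Toy tracker keeping one memory embedding per instance ID (hypothetical design)."""

    def __init__(self, sim_threshold=0.7, momentum=0.5):
        self.sim_threshold = sim_threshold  # assumed minimum similarity for a match
        self.momentum = momentum            # assumed EMA weight on the old memory
        self.memory = {}                    # instance ID -> embedding
        self.next_id = 0

    def update(self, frame_queries):
        """Return one instance ID per query embedding of the current frame."""
        # Score every (query, track) pair, then match greedily, best first.
        pairs = sorted(
            ((cosine(q, m), qi, tid)
             for qi, q in enumerate(frame_queries)
             for tid, m in self.memory.items()),
            reverse=True,
        )
        assigned, used = {}, set()
        for sim, qi, tid in pairs:
            if sim < self.sim_threshold:
                break
            if qi in assigned or tid in used:
                continue
            assigned[qi] = tid
            used.add(tid)

        ids = []
        for qi, q in enumerate(frame_queries):
            tid = assigned.get(qi)
            if tid is None:
                # An unmatched query starts a new instance track.
                tid = self.next_id
                self.next_id += 1
                self.memory[tid] = list(q)
            else:
                # A matched query refreshes the track memory with an EMA.
                m = self.momentum
                self.memory[tid] = [m * old + (1 - m) * new
                                    for old, new in zip(self.memory[tid], q)]
            ids.append(tid)
        return ids

tracker = QueryTracker()
ids_t0 = tracker.update([[1.0, 0.0], [0.0, 1.0]])
ids_t1 = tracker.update([[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]])
print(ids_t0, ids_t1)
```

In this toy run the first frame creates tracks 0 and 1. In the second frame the two similar queries inherit those IDs even though they arrive in swapped order, and the dissimilar third query opens track 2, so the script prints `[0, 1] [1, 0, 2]`. A real system would operate on learned query features and 3D masks rather than hand-written vectors.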

Findings

1. Surpasses ESAM by 2.8 AP on ScanNet200

2. Achieves consistent improvements on ScanNet, SceneNN, and 3RScan datasets

3. Enhances 3D segmentation with efficient temporal information exchange

Abstract

Online, real-time, and fine-grained 3D segmentation constitutes a fundamental capability for embodied intelligent agents to perceive and comprehend their operational environments. Recent advancements employ predefined object queries to aggregate semantic information from Vision Foundation Models (VFMs) outputs that are lifted into 3D point clouds, facilitating spatial information propagation through inter-query interactions. Nevertheless, perception is an inherently dynamic process, rendering temporal understanding a critical yet overlooked dimension within these prevailing query-based pipelines. Therefore, to further unlock the temporal environmental perception capabilities of embodied agents, our work reconceptualizes online 3D segmentation as an instance tracking problem (AutoSeg3D). Our core strategy involves utilizing object queries for temporal information propagation, where…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Generative Adversarial Networks and Image Synthesis