CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios

Jialei Xu; Zizhuang Wei; Weikang You; Linyun Li; Weijian Sun

arXiv:2508.09470·cs.CV·August 14, 2025

TL;DR

CitySeg is a foundation model for city-scale 3D point cloud semantic segmentation that incorporates a text modality to enable open-vocabulary and zero-shot inference, and reports state-of-the-art performance across multiple benchmarks.

Contribution

The paper introduces CitySeg, a city-scale point cloud segmentation model that combines hierarchical classification, cross-attention, and zero-shot capabilities to address non-uniform data distribution across domains and label discrepancies.

Findings

01 Achieves state-of-the-art (SOTA) performance on nine benchmarks.
02 Enables zero-shot generalization in city-scale scenarios.
03 Outperforms existing approaches significantly.

Abstract

Semantic segmentation of city-scale point clouds is a critical technology for Unmanned Aerial Vehicle (UAV) perception systems, enabling the classification of 3D points without relying on any visual information to achieve comprehensive 3D understanding. However, existing models are frequently constrained by the limited scale of 3D data and the domain gap between datasets, which lead to reduced generalization capability. To address these challenges, we propose CitySeg, a foundation model for city-scale point cloud semantic segmentation that incorporates text modality to achieve open vocabulary segmentation and zero-shot inference. Specifically, in order to mitigate the issue of non-uniform data distribution across multiple domains, we customize the data preprocessing rules, and propose a local-global cross-attention network to enhance the perception capabilities of point networks in UAV…
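The abstract does not spell out how the text modality is used at inference time. A common pattern for open-vocabulary segmentation, shown here only as a minimal sketch and not as CitySeg's actual method, is to label each point with the class whose text embedding is most similar to the point's feature vector. The function names, the toy three-dimensional embeddings, and the class list below are all hypothetical; a real system would take point features from a 3D backbone and text embeddings from a language encoder.

```python
import math


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def open_vocab_labels(point_feats, text_embeds, class_names):
    """Assign each point the class with the most similar text embedding.

    point_feats: per-point feature vectors (hypothetical encoder output)
    text_embeds: one text embedding per class, same dimensionality
    class_names: class names aligned with text_embeds
    """
    texts = [normalize(t) for t in text_embeds]
    labels = []
    for feat in point_feats:
        f = normalize(feat)
        # Cosine similarity between this point and every class prompt.
        sims = [sum(a * b for a, b in zip(f, t)) for t in texts]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(class_names[best])
    return labels


# Toy data (assumed for illustration only).
class_names = ["building", "vegetation", "road"]
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_feats = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.3, 2.0]]

labels = open_vocab_labels(point_feats, text_embeds, class_names)
print(labels)  # ['building', 'vegetation', 'road']
```

Because the class set enters only through the list of text embeddings, new categories can be queried in this sketch by appending an embedding and a name, without changing the point features. That is the general idea behind zero-shot inference with a text modality.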
