ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings
Xueyang Kang, Zijian Yu, Kourosh Khoshelham, and Liangliang Nan

TL;DR
ClickSeg3D is a 3D interactive segmentation method that works directly on sparse point clouds and processes the clicks for multiple objects in a single forward pass. The authors report gains in both accuracy and efficiency, with real-time applications in mind.
Contribution
The paper proposes a point Transformer-based framework that reasons jointly over all clicks and integrates semantic embeddings. This addresses two limitations of prior work: sequential, one-object-at-a-time prediction and dependence on 2D foundation models.
Findings
Improves mIoU by over 20% compared to strong baselines.
Achieves 8-10% gains under cross-dataset evaluation with one click per object.
Requires only a single click per object for effective segmentation.
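The findings above are stated in terms of mIoU (mean intersection over union). As a reminder of how the metric is defined for a multi-object label map, here is a minimal sketch. The function name `mean_iou`, the per-point label-list layout, and the toy data are assumptions made for illustration; this is not the paper's evaluation code, and the summary does not say whether its gains are absolute points or relative improvements.

```python
def mean_iou(pred, target, object_ids):
    """Average per-object IoU between two per-point label lists.

    pred, target: sequences of integer labels, one per point (0 = background here).
    object_ids:   the object labels to score (hypothetical convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for obj in object_ids:
        inter = sum(1 for p, t in zip(pred, target) if p == obj and t == obj)
        union = sum(1 for p, t in zip(pred, target) if p == obj or t == obj)
        if union > 0:  # skip objects absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 8 points, two objects (labels 1 and 2), 0 is background.
target = [1, 1, 1, 0, 2, 2, 2, 2]
pred = [1, 1, 0, 0, 2, 2, 2, 1]
score = mean_iou(pred, target, [1, 2])
print(score)  # object 1: 2/4, object 2: 3/4, mean: 0.625
```

Scoring every object from one label map, rather than from separate binary masks, matches the multi-object setting the paper targets.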
Abstract
Interactive segmentation allows efficient label generation by leveraging user-provided clicks to progressively refine predictions, which is critical when fully supervised labels are costly or generalization to unseen classes is needed. Existing 3D interactive methods are limited: most operate sequentially, predicting only one object per iteration with binary masks, while several recent approaches depend on 2D foundation models and camera alignment to bridge the 2D-3D gap. To address these limitations, we propose a novel interactive segmentation framework that operates directly on sparse, randomly downsampled 3D points and processes multiple object clicks in a single forward pass. Our framework consists of a point Transformer-based encoder and a hierarchical mask decoder, which integrates multi-level crop-and-merge operations conditioned on learnable semantic embeddings. Unlike prior…
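To make the input setting in the abstract concrete, the sketch below shows random downsampling of a point cloud and a single pass that assigns every point to one of several clicked objects. The nearest-click rule is a deliberately simple stand-in for the learned encoder and decoder, which the abstract does not specify in detail; the names `random_downsample` and `assign_by_nearest_click` and the toy coordinates are hypothetical.

```python
import random


def random_downsample(points, k, seed=0):
    """Return sorted indices of k randomly chosen points (all indices if k >= len)."""
    if k >= len(points):
        return list(range(len(points)))
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(points)), k))


def assign_by_nearest_click(points, clicks):
    """Label every point with the id of its nearest click, in one pass.

    clicks: list of (object_id, (x, y, z)) pairs, one or more per object.
    This distance rule is only a placeholder for a learned model.
    """
    labels = []
    for p in points:
        best_id, best_d = None, float("inf")
        for obj_id, c in clicks:
            d = sum((a - b) ** 2 for a, b in zip(p, c))  # squared Euclidean distance
            if d < best_d:
                best_id, best_d = obj_id, d
        labels.append(best_id)
    return labels


# Toy scene: two well-separated clusters, one click per object.
points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
clicks = [(1, (0, 0, 0)), (2, (5, 5, 5))]

keep = random_downsample(points, 4)
sparse = [points[i] for i in keep]
labels = assign_by_nearest_click(points, clicks)
print(labels)  # [1, 1, 1, 2, 2, 2]
```

The point of the sketch is the output shape: one multi-object label per point from a single call, in contrast to the sequential setting in which each iteration yields one binary mask.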
