ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings

Xueyang Kang, Zijian Yu, Kourosh Khoshelham, and Liangliang Nan

arXiv:2605.08925 · cs.CV · May 19, 2026

TL;DR

ClickSeg3D is a 3D interactive segmentation method that operates directly on sparse point clouds and processes clicks for multiple objects in a single forward pass, improving accuracy and efficiency for real-time applications.

Contribution

The paper proposes a point Transformer-based framework that jointly reasons over all clicks and integrates learnable semantic embeddings, addressing the limitations of prior methods that segment one object per iteration or depend on 2D foundation models and camera alignment.

Findings

01  Improves mIoU by over 20% compared to strong baselines.

02  Achieves 8-10% gains under cross-dataset evaluation with one click per object.

03  Requires only a single click per object for effective segmentation.
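
The findings above are stated in terms of mIoU (mean intersection over union). As a minimal sketch of how that metric is commonly computed for multi-object point labels, the snippet below averages per-object IoU over integer object ids. The function name `mean_iou`, the convention that id 0 means background, and the toy arrays are assumptions for illustration; the paper's exact evaluation protocol is not given in this summary.

```python
import numpy as np

def mean_iou(pred, gt, num_objects):
    """Average IoU over object ids 1..num_objects (id 0 is treated as background)."""
    ious = []
    for obj in range(1, num_objects + 1):
        p = pred == obj          # points predicted as this object
        g = gt == obj            # points labeled as this object
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue             # object absent from both arrays; skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy example: six points, two objects, one point mislabeled.
gt = np.array([1, 1, 1, 2, 2, 0])
pred = np.array([1, 1, 2, 2, 2, 0])
score = mean_iou(pred, gt, num_objects=2)
print(f"mIoU = {score:.4f}")
```

In this toy case each object has an intersection of 2 points and a union of 3, so both per-object IoUs are 2/3 and the mean is the same.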

Abstract

Interactive segmentation allows efficient label generation by leveraging user-provided clicks to progressively refine predictions, which is critical when fully supervised labels are costly or generalization to unseen classes is needed. Existing 3D interactive methods are limited: most operate sequentially, predicting only one object per iteration with binary masks, while several recent approaches depend on 2D foundation models and camera alignment to bridge the 2D-3D gap. To address these limitations, we propose a novel interactive segmentation framework that operates directly on sparse, randomly downsampled 3D points and processes multiple object clicks in a single forward pass. Our framework consists of a point Transformer-based encoder and a hierarchical mask decoder, which integrates multi-level crop-and-merge operations conditioned on learnable semantic embeddings. Unlike prior…
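
To make the input setting in the abstract concrete, the sketch below randomly downsamples a point cloud and then labels every remaining point for all clicked objects in one pass. The labeling rule used here is a naive nearest-click assignment, which is only a stand-in and not the paper's Transformer encoder or hierarchical mask decoder. The helper names (`random_downsample`, `nearest_click_labels`), the synthetic two-cluster cloud, and the click positions are all assumptions for illustration.

```python
import numpy as np

def random_downsample(points, num_samples, rng):
    """Keep a uniformly random subset of points; also return the chosen indices."""
    idx = rng.choice(points.shape[0], size=num_samples, replace=False)
    return points[idx], idx

def nearest_click_labels(points, click_xyz, click_obj_ids):
    """Naive stand-in: give each point the object id of its nearest click.

    All clicks are handled together, so one call yields a multi-object
    label map instead of one binary mask per iteration.
    """
    # Pairwise distances between N points and K clicks, shape (N, K).
    dists = np.linalg.norm(points[:, None, :] - click_xyz[None, :, :], axis=-1)
    return click_obj_ids[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
# Synthetic cloud: two well-separated clusters of 500 points each.
cloud = np.concatenate([
    rng.normal(0.0, 0.1, size=(500, 3)),
    rng.normal(5.0, 0.1, size=(500, 3)),
])
sparse, idx = random_downsample(cloud, 200, rng)

# One click per object, placed at the cluster centers.
clicks = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
obj_ids = np.array([1, 2])
labels = nearest_click_labels(sparse, clicks, obj_ids)
print(sparse.shape, np.unique(labels))
```

A learned model would replace `nearest_click_labels`; the point of the sketch is only the data flow, in which a sparse subset of points and a set of per-object clicks go in and a single multi-object label array comes out.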
