FocalClick-XL: Towards Unified and High-quality Interactive Segmentation

Xi Chen; Hengshuang Zhao

arXiv:2506.14686 · cs.CV · June 18, 2025

Open Access

TL;DR

FocalClick-XL is a multi-stage interactive segmentation pipeline built on large-scale pretraining. It supports diverse interaction types, predicts fine-grained masks, and reports state-of-the-art results on click-based benchmarks.

Contribution

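FocalClick-XL extends the classical coarse-to-fine FocalClick design with a pipeline that decomposes interactive segmentation into context, object, and detail meta-tasks. Each meta-task gets a dedicated subnet that is pretrained independently, which improves flexibility and performance.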

Findings

01 State-of-the-art on click-based benchmarks

02 Supports diverse interaction formats including boxes and scribbles

03 Capable of predicting detailed alpha mattes
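
To make the difference between a binary mask and an alpha matte concrete, here is a minimal sketch in plain Python. It uses the standard compositing relation (alpha * foreground + (1 - alpha) * background); the function names `composite` and `binarize` and the toy pixel values are illustrative assumptions and do not come from the paper.

```python
# Illustrative sketch only: helper names and values are assumptions,
# not code from FocalClick-XL.

def composite(alpha, fg, bg):
    """Blend foreground over background per pixel: a*fg + (1 - a)*bg."""
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(row_a, row_f, row_b)]
        for row_a, row_f, row_b in zip(alpha, fg, bg)
    ]

def binarize(alpha, threshold=0.5):
    """Collapse a soft alpha matte into a hard 0/1 mask."""
    return [[1.0 if a >= threshold else 0.0 for a in row] for row in alpha]

# One row of three pixels: background, a soft edge (e.g. hair), foreground.
alpha = [[0.0, 0.25, 1.0]]
fg = [[200.0, 200.0, 200.0]]
bg = [[0.0, 0.0, 0.0]]

soft = composite(alpha, fg, bg)            # keeps the partial edge pixel
hard = composite(binarize(alpha), fg, bg)  # the edge pixel snaps to background
```

In this toy row the middle pixel keeps a quarter of the foreground intensity under the matte but drops to pure background once the matte is thresholded, which is the kind of fine detail a binary mask cannot represent.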

Abstract

Interactive segmentation enables users to extract binary masks of target objects through simple interactions such as clicks, scribbles, and boxes. However, existing methods often support only limited interaction forms and struggle to capture fine details. In this paper, we revisit the classical coarse-to-fine design of FocalClick and introduce significant extensions. Inspired by its multi-stage strategy, we propose a novel pipeline, FocalClick-XL, to address these challenges simultaneously. Following the emerging trend of large-scale pretraining, we decompose interactive segmentation into meta-tasks that capture different levels of information -- context, object, and detail -- assigning a dedicated subnet to each level. This decomposition allows each subnet to undergo scaled pretraining with independent data and supervision, maximizing its effectiveness. To enhance flexibility, we share…
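
As a rough illustration of how the interaction types named in the abstract (clicks, scribbles, and boxes) can be brought into one common form, the sketch below rasterizes each of them into a binary prompt map of the image size. This is one common convention in interactive segmentation and is an assumption here, not necessarily how FocalClick-XL encodes its prompts; all function names are hypothetical.

```python
# Illustrative sketch only: a generic way to rasterize user prompts.
# Function names and the encoding are assumptions, not the paper's method.

def blank_map(height, width):
    """Create an all-zero prompt map."""
    return [[0] * width for _ in range(height)]

def draw_click(prompt, y, x, radius=1):
    """Mark a small disk around a click at (y, x)."""
    height, width = len(prompt), len(prompt[0])
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            if (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2:
                prompt[yy][xx] = 1
    return prompt

def draw_box(prompt, y0, x0, y1, x1):
    """Mark a filled rectangle with inclusive corners (y0, x0) and (y1, x1)."""
    for yy in range(y0, y1 + 1):
        for xx in range(x0, x1 + 1):
            prompt[yy][xx] = 1
    return prompt

def draw_scribble(prompt, points, radius=0):
    """Mark a scribble given as a sequence of (y, x) points along the stroke."""
    for y, x in points:
        draw_click(prompt, y, x, radius)
    return prompt

def count_marked(prompt):
    """Count the pixels set to 1 in a prompt map."""
    return sum(sum(row) for row in prompt)

click_map = draw_click(blank_map(5, 5), 2, 2, radius=1)
box_map = draw_box(blank_map(5, 5), 0, 0, 1, 2)
scribble_map = draw_scribble(blank_map(5, 5), [(0, 0), (1, 1), (2, 2)])
```

Once every interaction is a map of the same shape, a single network input can carry any of them, which is one plausible route to supporting several interaction forms with shared components.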

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques