Learning from Exemplars for Interactive Image Segmentation

Kun Li, Hao Cheng, George Vosselman, and Michael Ying Yang

arXiv:2406.11472 · cs.CV · June 18, 2024

TL;DR

This paper introduces novel transformer-based interactive image segmentation models that leverage exemplar information to efficiently segment multiple objects of the same category, reducing user effort and improving accuracy.

Contribution

The paper proposes new frameworks for interactive segmentation that utilize exemplar-informed modules and cross-attention to enhance multi-object segmentation performance.
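The contribution mentions cross-attention with exemplar information. The single-head sketch below illustrates that general idea. It is not the authors' module: the function name `exemplar_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are assumptions made for illustration. Tokens from the current image act as queries. Tokens from an exemplar, meaning an object of the same category that has already been segmented, act as keys and values.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def exemplar_cross_attention(query_tokens, exemplar_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention (illustrative, not the paper's code).

    query_tokens:    (N, d) features of the image being segmented.
    exemplar_tokens: (M, d) features of an already-segmented exemplar.
    w_q, w_k:        (d, d_k) projections; w_v: (d, d) so the residual add works.
    Returns the updated (N, d) features and the (N, M) attention weights.
    """
    q = query_tokens @ w_q                      # (N, d_k)
    k = exemplar_tokens @ w_k                   # (M, d_k)
    v = exemplar_tokens @ w_v                   # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # scaled dot-product scores, (N, M)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    # Assumed residual connection: keep image features, add exemplar context.
    return query_tokens + weights @ v, weights
```

Each row of `weights` sums to 1, so every image token receives a convex combination of exemplar values on top of its own features.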

Findings

01. Achieves superior performance on benchmark datasets.

02. Reduces user clicks by around 15%.

03. Requires two fewer clicks to reach high IoU thresholds.
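The click-based findings refer to the usual interactive-segmentation metric: the number of clicks (NoC) needed to reach a target IoU. This metric is commonly reported at thresholds such as 85% and 90%, with a cap of 20 clicks. The sketch below computes it from per-click IoU traces. The function names and the default cap are illustrative assumptions and are not taken from the paper's code.

```python
from typing import Sequence

def clicks_to_reach(ious: Sequence[float], threshold: float, max_clicks: int = 20) -> int:
    """Return the 1-based click at which IoU first reaches `threshold`.

    If the threshold is never reached within `max_clicks`, return `max_clicks`
    (the usual convention when averaging NoC over a dataset).
    """
    for click, iou in enumerate(ious[:max_clicks], start=1):
        if iou >= threshold:
            return click
    return max_clicks

def mean_noc(per_object_ious: Sequence[Sequence[float]], threshold: float,
             max_clicks: int = 20) -> float:
    """Average NoC over objects, e.g. NoC@85 with threshold=0.85."""
    counts = [clicks_to_reach(ious, threshold, max_clicks) for ious in per_object_ious]
    return sum(counts) / len(counts)
```

For example, an object with the IoU trace 0.70, 0.86, 0.91 needs 2 clicks at the 85% threshold and 3 clicks at the 90% threshold.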

Abstract

Interactive image segmentation enables users to interact minimally with a machine while gradually refining the segmentation mask for a target of interest. Previous studies have demonstrated impressive performance in extracting a single target mask through interactive segmentation. However, existing methods overlook the information cues from previously interacted objects, and these cues could be exploited to speed up interactive segmentation of multiple targets in the same category. To this end, we introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category. Specifically, our model leverages transformer backbones to extract interaction-focused visual features from the image and the interactions, producing a satisfactory mask of a target that then serves as an exemplar. For multiple objects, we propose an…
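The abstract does not say how clicks are fed to the transformer backbone. One common convention in click-based interactive segmentation is to rasterize positive and negative clicks into binary disk maps and stack them with the RGB channels. This sketch assumes that convention; it is not taken from the paper. The helper names `encode_clicks` and `build_model_input` and the default radius are hypothetical.

```python
import numpy as np

def encode_clicks(shape, clicks, radius=5):
    """Rasterize clicks into two binary disk maps (assumed encoding).

    shape:   (H, W) of the image.
    clicks:  iterable of (row, col, is_positive) tuples.
    Returns: float32 array of shape (2, H, W); channel 0 = positive, 1 = negative.
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]  # pixel coordinate grids
    maps = np.zeros((2, h, w), dtype=np.float32)
    for r, c, is_positive in clicks:
        # Pixels within `radius` (Euclidean) of the click centre.
        disk = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
        maps[0 if is_positive else 1][disk] = 1.0
    return maps

def build_model_input(image_chw, clicks, radius=5):
    """Concatenate a (3, H, W) image with its click maps into a (5, H, W) array."""
    click_maps = encode_clicks(image_chw.shape[1:], clicks, radius)
    return np.concatenate([image_chw.astype(np.float32), click_maps], axis=0)
```

Positive clicks mark the target and negative clicks mark background. Where disks of the same polarity overlap, the map value stays at 1.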
