Structured Click Control in Transformer-based Interactive Segmentation

Long Xu; Yongquan Chen; Rui Huang; Feng Wu; Shiwu Lai

arXiv:2405.04009 · cs.CV · May 8, 2024

TL;DR

This paper introduces a structured click control method using graph neural networks and dual cross-attention to improve the robustness and precision of Transformer-based interactive segmentation after multiple user clicks.

Contribution

The paper proposes a structured click intent model that adaptively captures user click patterns and strengthens the control that clicks exert over segmentation results in Transformer architectures.
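A minimal sketch of how a structured click intent model might pick and aggregate graph nodes, following the abstract's description (nodes obtained from the global similarity of user-clicked Transformer tokens, then aggregated into structured interaction features). This is not the authors' code from hahamyt/scc: cosine similarity, the top-k selection rule, the single softmax-weighted aggregation step, and the names `select_nodes` and `aggregate` are hypothetical choices for illustration, and tokens are plain Python lists standing in for learned embeddings.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two token embeddings (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_nodes(tokens: List[Vector], clicked: List[int], k: int) -> List[int]:
    """Hypothetical node selection: for each clicked token, keep the k tokens
    most similar to it across the whole image (global similarity).
    The top-k rule is an assumption, not taken from the paper."""
    nodes = set(clicked)
    for c in clicked:
        ranked = sorted(
            range(len(tokens)),
            key=lambda i: cosine(tokens[c], tokens[i]),
            reverse=True,
        )
        nodes.update(ranked[:k])
    return sorted(nodes)


def aggregate(tokens: List[Vector], nodes: List[int]) -> List[Vector]:
    """Hypothetical aggregation: one message-passing step in which every node
    becomes a softmax(cosine)-weighted mean of all selected nodes,
    i.e. a dense, similarity-defined adjacency over the graph."""
    dim = len(tokens[0])
    features = []
    for i in nodes:
        scores = [cosine(tokens[i], tokens[j]) for j in nodes]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        features.append(
            [sum(w * tokens[j][d] for w, j in zip(weights, nodes)) / total for d in range(dim)]
        )
    return features


if __name__ == "__main__":
    # Toy 2-D "tokens"; token 0 is the user click.
    tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    nodes = select_nodes(tokens, clicked=[0], k=2)
    print(nodes)  # [0, 1]
    print(aggregate(tokens, nodes))
```

Selecting nodes by similarity over all tokens, rather than only the clicked positions, is what lets a single click pull in distant regions that look alike; the aggregated node features then summarize that group as a whole.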

Findings

01 Improved segmentation robustness after multiple clicks
02 Enhanced control over segmentation results
03 Generalizable structure for Transformer-based interactive segmentation

Abstract

Click-point-based interactive segmentation has received widespread attention due to its efficiency. However, it is hard for existing algorithms to obtain precise and robust responses after multiple clicks. In such cases, the segmentation results tend to change little or even become worse than before. To improve the robustness of the response, we propose a structured click intent model based on graph neural networks, which adaptively obtains graph nodes via the global similarity of user-clicked Transformer tokens. The graph nodes are then aggregated to obtain structured interaction features. Finally, dual cross-attention is used to inject the structured interaction features into vision Transformer features, thereby enhancing the control of clicks over segmentation results. Extensive experiments demonstrated that the proposed algorithm can serve as a general structure in improving…
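The injection step in the abstract can be sketched as follows, assuming "dual" means cross-attention in both directions: the structured interaction features first attend to the vision Transformer tokens, then the tokens attend to the updated features, each with a residual connection. That reading, the single head without learned projections, and the names `cross_attention` and `dual_cross_attention` are assumptions for illustration, not the paper's confirmed design.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention: each query row attends
    to the rows of `keys_values`. Learned Q/K/V projections are omitted."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dual_cross_attention(vit_tokens: Matrix, click_feats: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical two-direction reading of "dual cross-attention":
    1) structured click features attend to the ViT tokens (gather image context);
    2) ViT tokens attend to the updated click features (inject click intent).
    Both steps add their output back residually."""
    click_feats = add(click_feats, cross_attention(click_feats, vit_tokens))
    vit_tokens = add(vit_tokens, cross_attention(vit_tokens, click_feats))
    return vit_tokens, click_feats


if __name__ == "__main__":
    vit = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy image tokens
    clicks = [[1.0, 0.0]]                       # one structured interaction feature
    new_vit, new_clicks = dual_cross_attention(vit, clicks)
    print(len(new_vit), len(new_clicks))  # 3 1
```

With a single structured feature, the second step simply adds that feature to every token; with several graph nodes, each token would instead mix them according to its own similarity, which under this reading is where per-token control over the segmentation comes from.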

Code & Models

Repositories

hahamyt/scc (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Advanced Vision and Imaging

Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Vision Transformer · Linear Layer · Byte Pair Encoding