SPT: Sequence Prompt Transformer for Interactive Image Segmentation
Senlin Cheng, Haopeng Sun

TL;DR
The paper introduces Sequence Prompt Transformer (SPT), a novel model that leverages sequential image information and user prompts to improve interactive image segmentation, outperforming existing methods.
Contribution
The paper presents SPT as the first method to use sequential image data in interactive segmentation, and it introduces the ADE20K-Seq benchmark for evaluation.
Findings
SPT surpasses state-of-the-art methods on multiple datasets.
Top-k Prompt Selection (TPS) improves segmentation accuracy.
Sequential information improves segmentation consistency.
Abstract
Interactive segmentation aims to extract objects of interest from an image based on user-provided clicks. In real-world applications, there is often a need to segment a series of images featuring the same target object. However, existing methods typically process one image at a time and fail to consider the sequential nature of the images. To overcome this limitation, we propose a novel method called Sequence Prompt Transformer (SPT), the first to utilize sequential image information for interactive segmentation. Our model comprises two key components: (1) Sequence Prompt Transformer (SPT), which acquires information from a sequence of images, clicks, and masks to improve accuracy; and (2) Top-k Prompt Selection (TPS), which selects precise prompts for SPT to further enhance the segmentation effect. Additionally, we create the ADE20K-Seq benchmark to better evaluate model performance. We evaluate our…
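The abstract does not say how Top-k Prompt Selection scores or ranks candidate prompts, so the following is only a minimal sketch of the general top-k idea, not the authors' method. It assumes each earlier prompt in the sequence is summarized as a feature vector and that candidates are ranked by cosine similarity to a query vector for the current image; the names `cosine` and `select_top_k`, the vectors, and the similarity criterion are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_k(query: Sequence[float],
                 prompts: Sequence[Sequence[float]],
                 k: int) -> List[int]:
    """Return indices of the k prompts most similar to the query, best first.

    Assumption: similarity to the current image's features is the selection
    criterion. The paper's actual TPS scoring is not given in the abstract.
    """
    if k <= 0:
        return []
    scores = [cosine(query, p) for p in prompts]
    # Sort candidate indices by descending score; ties keep their original order.
    order = sorted(range(len(prompts)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical feature vectors for the current image and four earlier prompts.
query = [1.0, 0.0]
prompts = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
top = select_top_k(query, prompts, k=2)
print(top)  # indices of the two prompts closest to the query
```

In this sketch only the selected prompts would be passed on to the sequence model, which is one plausible reading of "selects precise prompts for SPT"; any real implementation would depend on details the abstract leaves out.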
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Byte Pair Encoding · Residual Connection
