DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer
Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, and Bastian Leibe

TL;DR
DynaMITe introduces a Transformer-based interactive segmentation method that efficiently segments multiple objects simultaneously with fewer interactions, reducing computational costs and achieving state-of-the-art results.
Contribution
It proposes a novel spatio-temporal query approach in a Transformer decoder for multi-object interactive segmentation, enabling single-iteration multi-instance processing.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Reduces the number of interactions needed to segment multiple instances in a single image.
Supports multi-instance segmentation in a single image.
Abstract
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods.…
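To make the idea of clicks as spatio-temporal queries more concrete, here is a toy sketch in Python with NumPy. It is not the authors' implementation, and it leaves out the Transformer decoder entirely: each click is turned into a query by sampling a cached feature vector at the click location and adding sinusoidal encodings of its position and time step, and pixels are then scored against the queries with a plain dot product. All names (`sinusoidal`, `clicks_to_queries`, `segment`), the `(y, x, t, object_id)` click layout, and the random feature map standing in for a backbone output are assumptions made for illustration only.

```python
import numpy as np

def sinusoidal(value, dim):
    """Standard sinusoidal encoding of a scalar into a vector of even size `dim`."""
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    enc = np.zeros(dim)
    enc[0::2] = np.sin(value * freqs)
    enc[1::2] = np.cos(value * freqs)
    return enc

def clicks_to_queries(features, clicks):
    """Build one query per click (hypothetical layout: y, x, time step, object id).

    features: (H, W, C) array, assumed to be computed once per image.
    Returns (N, C) queries and an (N,) array of owning object ids.
    """
    C = features.shape[-1]
    queries = np.stack([
        features[y, x] + sinusoidal(y, C) + sinusoidal(x, C) + sinusoidal(t, C)
        for (y, x, t, _obj) in clicks
    ])
    owners = np.array([obj for (_y, _x, _t, obj) in clicks])
    return queries, owners

def segment(features, queries, owners, num_objects):
    """Label every pixel with the object whose best-matching query scores highest.

    This dot-product scoring is a stand-in for a learned decoder, not the
    method described in the paper.
    """
    scores = features @ queries.T                      # (H, W, N)
    per_object = np.full(features.shape[:2] + (num_objects,), -np.inf)
    for obj in range(num_objects):
        sel = owners == obj
        if sel.any():
            per_object[..., obj] = scores[..., sel].max(axis=-1)
    return per_object.argmax(axis=-1)                  # (H, W) label map

# Random features stand in for a backbone output; they are computed once.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 8))

# First round: one click on each of two objects, handled in a single pass.
clicks = [(2, 3, 0, 0), (12, 10, 1, 1)]
queries, owners = clicks_to_queries(features, clicks)
labels = segment(features, queries, owners, num_objects=2)

# Refinement: a new click only adds a query; `features` is reused unchanged.
clicks.append((5, 5, 2, 0))
queries, owners = clicks_to_queries(features, clicks)
labels = segment(features, queries, owners, num_objects=2)
```

The point of the sketch is the data flow: a refinement click changes only the set of queries, so the per-image feature computation is not repeated, and queries belonging to different objects are processed together.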
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections
