Mixed-Query Transformer: A Unified Image Segmentation Architecture
Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto

TL;DR
The paper introduces MQ-Former, a unified image segmentation model that uses a mixed query strategy to handle multiple datasets and tasks with a single set of weights, improving generalization and performance.
Contribution
It proposes a novel mixed query strategy and a unified architecture for multi-task, multi-dataset image segmentation with a single set of weights.
Findings
Outperforms specialized models on multiple datasets.
Improves open-vocabulary segmentation performance by over 7 points.
Effectively handles multiple tasks with a unified model.
Abstract
Existing unified image segmentation models either employ a unified architecture across multiple tasks but use separate weights tailored to each dataset, or apply a single set of weights to multiple datasets but are limited to a single task. In this paper, we introduce the Mixed-Query Transformer (MQ-Former), a unified architecture for multi-task and multi-dataset image segmentation using a single set of weights. To enable this, we propose a mixed query strategy, which can effectively and dynamically accommodate different types of objects without heuristic designs. In addition, the unified architecture allows us to use data augmentation with synthetic masks and captions to further improve model generalization. Experiments demonstrate that MQ-Former can not only effectively handle multiple segmentation datasets and tasks compared to specialized state-of-the-art models with competitive…
Taxonomy
Topics: Medical Image Segmentation Techniques · Neural Networks and Applications · Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Sparse Evolutionary Training · Softmax · Linear Layer · Layer Normalization · Dense Connections · Label Smoothing · Residual Connection · Dropout · Multi-Head Attention
