Mask Matching Transformer for Few-Shot Segmentation
Siyu Jiao, Gengwei Zhang, Shant Navasardyan, Ling Chen, Yao Zhao, Yunchao Wei, Humphrey Shi

TL;DR
This paper introduces MM-Former, a novel few-shot segmentation approach that decomposes query images into proposals and matches them with support images, improving flexibility and generalization over traditional prototypical methods.
Contribution
The paper proposes a new paradigm for few-shot segmentation using a class-agnostic segmenter and a simple matching mechanism, reducing complexity and enhancing adaptability.
Findings
Achieves competitive results on COCO-20i and Pascal-5i benchmarks.
Demonstrates improved generalization in complex scenarios.
Validates effectiveness of the proposal-based segmentation approach.
Abstract
In this paper, we tackle the challenging few-shot segmentation task from a new perspective. Typical methods first learn prototypical features from support images and then match query features at the pixel level to obtain segmentation results. However, to obtain satisfactory segments, this paradigm must couple the learning of the matching operations with heavy segmentation modules, which limits design flexibility and increases learning complexity. To alleviate this issue, we propose Mask Matching Transformer (MM-Former), a new paradigm for the few-shot segmentation task. Specifically, MM-Former first uses a class-agnostic segmenter to decompose the query image into multiple segment proposals. Then, guided by the support images, a simple matching mechanism merges the related segment proposals into the final mask. The advantages of our…
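The decompose-then-match idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the flattened list-based data layout, the use of masked average pooling with cosine similarity, and the fixed threshold are all assumptions chosen to keep the example short. The sketch takes segment proposals as given (in the paper they come from a class-agnostic segmenter) and stands in for the matching step with a simple, non-learned rule.

```python
import math

# Hypothetical data layout: an image is a flat list of pixels, each pixel
# carries a feature vector, and a mask is a flat list of 0/1 values.

def masked_average(features, mask):
    """Average the feature vectors of the pixels where mask is 1."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        return None  # empty mask: no prototype can be computed
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def match_and_merge(query_feats, proposals, support_feats, support_mask,
                    threshold=0.8):
    """Merge (union) the query proposals that resemble the support object.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    merged = [0] * len(query_feats)
    support_proto = masked_average(support_feats, support_mask)
    if support_proto is None:
        return merged
    for proposal in proposals:
        proto = masked_average(query_feats, proposal)
        if proto is not None and cosine(proto, support_proto) >= threshold:
            merged = [a | b for a, b in zip(merged, proposal)]
    return merged

# Toy example: a 4-pixel query image with two proposals.
query_feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
proposals = [[1, 1, 0, 0], [0, 0, 1, 1]]
support_feats = [[0.9, 0.0], [0.0, 1.0]]
support_mask = [1, 0]  # only the first support pixel belongs to the object

final_mask = match_and_merge(query_feats, proposals, support_feats, support_mask)
```

In this toy case only the first proposal points in the same feature direction as the masked support region, so it is the only one kept. The point of the sketch is the separation of concerns: proposal generation does not depend on the support set, and the matching step only has to decide which proposals to keep.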
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Dropout
