Target-aware Bi-Transformer for Few-shot Segmentation
Xianglin Wang, Xiaoliu Luo, Taiping Zhang

TL;DR
The paper introduces TBTNet, a lightweight and efficient few-shot segmentation model that uses a target-aware transformer to focus on foreground features, achieving fast convergence and strong performance on standard benchmarks.
Contribution
It proposes a novel Target-aware Bi-Transformer Network with a dedicated transformer layer, reducing complexity and training time for few-shot segmentation.
Findings
Achieves state-of-the-art results on PASCAL-5i and COCO-20i benchmarks.
With only 0.4M parameters, the model is reported as the lightest among the methods compared.
Converges in 10-25% of the training epochs of traditional methods.
Abstract
Traditional semantic segmentation requires a large number of labels and struggles to identify categories not seen during training. Few-shot semantic segmentation (FSS) aims to segment objects of new classes using a limited number of labeled support images, which is very practical in the real world. Previous research was primarily based on prototypes or correlations. Because colors, textures, and styles are similar within the same image, we argue that the query image can be regarded as its own support image. In this paper, we propose the Target-aware Bi-Transformer Network (TBTNet) to treat support images and the query image equivalently. A vigorous Target-aware Transformer Layer (TTL) is also designed to distill correlations and force the model to focus on foreground information. It treats the hypercorrelation as a feature, resulting in a significant reduction in the number of feature…
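To make the correlation idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a correlation is the pairwise cosine similarity between every query location and every support location, clamped at zero, which is a common choice in correlation-based FSS; the abstract itself does not define it. The function name `dense_correlation`, the feature shapes, and the random features are hypothetical. Passing the query features as their own support illustrates the claim that the query image can be regarded as its own support image.

```python
import numpy as np

def dense_correlation(query_feat, support_feat, support_mask=None, eps=1e-8):
    """Pairwise cosine similarity between query and support locations.

    Assumed shapes (hypothetical, for illustration only):
      query_feat:   (C, Hq, Wq)
      support_feat: (C, Hs, Ws)
      support_mask: optional (Hs, Ws) binary foreground mask
    Returns an array of shape (Hq*Wq, Hs*Ws) with negatives clamped to 0.
    """
    c = query_feat.shape[0]
    q = query_feat.reshape(c, -1)      # (C, Hq*Wq)
    s = support_feat.reshape(c, -1)    # (C, Hs*Ws)
    if support_mask is not None:
        # Zero out background support locations so that only
        # foreground features contribute to the correlation.
        s = s * support_mask.reshape(1, -1)
    # L2-normalise each location's feature vector.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    corr = q.T @ s                     # cosine similarities
    return np.clip(corr, 0.0, None)    # keep non-negative matches only

# Random stand-in features; a real model would take these from a backbone.
rng = np.random.default_rng(0)
query = rng.standard_normal((8, 4, 4))
support = rng.standard_normal((8, 4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                   # hypothetical foreground region

cross_corr = dense_correlation(query, support, mask)  # query vs. support
self_corr = dense_correlation(query, query)           # query as its own support
print(cross_corr.shape, self_corr.shape)
```

In this sketch, columns of `cross_corr` that correspond to background support pixels are all zero, and the diagonal of `self_corr` is approximately 1 because each query location matches itself.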
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Softmax · Dense Connections · Absolute Position Encodings · Focus · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Adam · Multi-Head Attention
