3rd Place Scheme on Instance Segmentation Track of ICCV 2021 VIPriors Challenges
Pengyu Chen, Wanhua Li

TL;DR
This paper presents a data-efficient instance segmentation approach based on a modified Swin Transformer, achieving competitive results in the ICCV 2021 VIPriors Challenge using only a single GPU.
Contribution
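The authors adapt Swin Transformer, implemented on top of mmdetection, for data-efficient instance segmentation. They train it with random flip and multiscale augmentation and apply multiscale fusion at inference, using a single GPU for both training and testing.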
Findings
Achieved AP@0.5:0.95 of 0.366 on the test set
Ranked second in AP@0.5:0.95 (medium) among contestants
Used only one GPU for training and testing
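For readers unfamiliar with the notation, AP@0.5:0.95 is sketched below under the assumption that the challenge follows the COCO evaluation convention (the summary itself does not spell this out). Here AP_t is a symbol introduced for illustration: the average precision computed with a mask IoU threshold of t.

```latex
\mathrm{AP}@0.5{:}0.95 \;=\; \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \mathrm{AP}_t
```

Under the same assumption, the "(medium)" variant restricts the evaluation to medium-sized objects, which COCO defines as those with an area between 32^2 and 96^2 pixels.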
Abstract
In this paper, we introduce the data-efficient instance segmentation method we used in the 2021 VIPriors Instance Segmentation Challenge. Our solution is a modified version of Swin Transformer, built on the mmdetection toolbox. To address the lack of training data, we train our model with data augmentation, including random flip and multiscale training. During inference, multiscale fusion is used to boost performance. We use only a single GPU throughout training and testing. Our team achieved an AP@0.5:0.95 of 0.366 on the test set, which is competitive with other top-ranking methods despite the single-GPU budget. Our method also achieved an AP@0.5:0.95 (medium) of 0.592, which ranks second among all contestants. Overall, our team ranked third among all contestants, as announced by the organizers.
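The augmentation and inference setup described in the abstract could be expressed roughly as follows in an mmdetection 2.x-style config. This is a minimal sketch, not the authors' configuration: the scale lists `train_scales` and `test_scales` are hypothetical, the normalization values are the ImageNet statistics commonly used in mmdetection configs, and the abstract does not say which scales were used or how the multiscale predictions were fused.

```python
# Sketch of an mmdetection 2.x-style data pipeline, written as plain Python dicts.
# All scale values are illustrative assumptions, not taken from the paper.

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)

# Hypothetical training scales; with multiscale_mode='value' one scale
# is sampled from this list for each training image.
train_scales = [(1333, 640), (1333, 704), (1333, 768), (1333, 800)]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    # Multiscale training: resize to a randomly chosen scale.
    dict(type='Resize', img_scale=train_scales,
         multiscale_mode='value', keep_ratio=True),
    # Random horizontal flip with probability 0.5.
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect',
         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]

# Hypothetical inference scales; the model is run once per scale and the
# detector merges the per-scale predictions.
test_scales = [(1333, 640), (1333, 800), (1333, 960)]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=test_scales,
        flip=False,  # the abstract mentions multiscale fusion only
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ],
    ),
]
```

In a full config these lists would be referenced from the `data` section (for example `data.train.pipeline` and `data.test.pipeline`). More test scales generally cost proportionally more inference time, which matters on a single-GPU budget.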
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
Methods: Attention Is All You Need · FLIP · Test · Linear Layer · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Byte Pair Encoding
