Semi-Supervised Vision Transformers

Zejia Weng; Xitong Yang; Ang Li; Zuxuan Wu; Yu-Gang Jiang

arXiv:2111.11067 · cs.CV · July 19, 2022 · 1 citation

TL;DR

This paper introduces Semiformer, a semi-supervised learning framework that combines a transformer stream with a convolutional stream. It significantly improves Vision Transformer performance when labeled data is limited and achieves state-of-the-art results on ImageNet.

Contribution

The paper proposes Semiformer, a novel semi-supervised framework that integrates transformer and convolutional streams with a fusion module, enhancing Vision Transformer training with limited labeled data.

Findings

01. Semiformer achieves 75.5% top-1 accuracy on ImageNet.

02. It outperforms the previous state of the art for semi-supervised image classification on ImageNet.

03. The framework is compatible with various transformer and CNN architectures.

Abstract

We study the training of Vision Transformers for semi-supervised image classification. Transformers have recently demonstrated impressive performance on a multitude of supervised learning tasks. Surprisingly, we show Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available. Inspired by this observation, we introduce a joint semi-supervised learning framework, Semiformer, which contains a transformer stream, a convolutional stream and a carefully designed fusion module for knowledge sharing between these streams. The convolutional stream is trained on limited labeled data and further used to generate pseudo labels to supervise the training of the transformer stream on unlabeled data. Extensive experiments on ImageNet demonstrate that Semiformer achieves 75.5% top-1 accuracy, outperforming the state-of-the-art by…
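The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the confidence threshold of 0.7, and the choice to average the loss over all unlabeled samples are assumptions made for illustration, and the fusion module is left out entirely. The sketch only shows the general idea that predictions from the convolutional stream become targets for the transformer stream on unlabeled images.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_labels(conv_logits, threshold=0.7):
    """Turn convolutional-stream logits into (label, keep) pairs.

    `threshold` is a hypothetical confidence cutoff (an assumption,
    not a value taken from the source page).
    """
    out = []
    for logits in conv_logits:
        probs = softmax(logits)
        conf = max(probs)
        out.append((probs.index(conf), conf >= threshold))
    return out


def transformer_unlabeled_loss(trans_logits, conv_logits, threshold=0.7):
    """Cross-entropy of transformer-stream predictions against confident
    pseudo labels, averaged over all unlabeled samples (assumed convention).
    """
    labels = pseudo_labels(conv_logits, threshold)
    total = 0.0
    for logits, (label, keep) in zip(trans_logits, labels):
        if keep:  # low-confidence samples contribute no loss
            total += -math.log(softmax(logits)[label])
    return total / max(len(trans_logits), 1)


# Two unlabeled samples, three classes: the first pseudo label is
# confident, the second is not and is masked out.
conv_logits = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
trans_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = transformer_unlabeled_loss(trans_logits, conv_logits)
```

In a real training loop this unlabeled loss would be combined with a supervised loss on the labeled subset; how the two are weighted is not stated on this page.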

Code & Models

Repositories

wengzejia1/semiformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Residual Connection · Dense Connections