mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Ling-Yu Duan

TL;DR
mc-BEiT introduces a multi-choice discretization approach for image BERT pre-training, improving the quality of masked image modeling by using soft supervision and inter-patch perception, leading to superior performance across vision tasks.
Contribution
It proposes a novel multi-choice training objective for masked image modeling that refines discretization using soft probabilities and inter-patch relations, enhancing image pre-training.
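The idea of refining discretization with soft probabilities and inter-patch relations can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a tokenizer yields per-patch logits over the visual vocabulary and that some network yields per-patch features. It also assumes the refined target is a convex combination of each patch's own temperature-softened probabilities and an affinity-weighted mix of its neighbours' probabilities. The names `multi_choice_targets`, `tau`, and `omega`, and their default values, are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def multi_choice_targets(
    token_logits: List[Vector],    # per-patch logits over the vocabulary (assumed tokenizer output)
    patch_features: List[Vector],  # per-patch features for inter-patch similarity (assumed source)
    tau: float = 4.0,              # hypothetical softening temperature
    omega: float = 0.8,            # hypothetical weight on a patch's own soft probabilities
) -> List[Vector]:
    # Step 1: soft probabilities over token ids instead of one hard id per patch.
    soft = [softmax(logits, tau) for logits in token_logits]
    n, k = len(soft), len(soft[0])
    targets = []
    for i in range(n):
        # Step 2: affinity of patch i to every patch, normalized to weights summing to 1.
        affinity = softmax([cosine(patch_features[i], patch_features[j]) for j in range(n)])
        # Similar patches should share their choices: average neighbours' distributions.
        neighbour = [sum(affinity[j] * soft[j][c] for j in range(n)) for c in range(k)]
        # Convex combination of two distributions is still a distribution.
        targets.append([omega * soft[i][c] + (1 - omega) * neighbour[c] for c in range(k)])
    return targets


# Usage: three patches, a 3-token vocabulary; patches 0 and 1 look alike.
logits = [[4.0, 1.0, 0.0], [3.5, 1.5, 0.0], [0.0, 0.5, 4.0]]
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
targets = multi_choice_targets(logits, feats)
for t in targets:
    print([round(p, 3) for p in t])
```

Each resulting target is a full probability distribution over token ids. The most likely id is preserved, while plausible alternatives keep some mass.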
Findings
Achieves 84.1% top-1 accuracy on ImageNet-1K.
Outperforms counterparts in object detection and segmentation on COCO.
Excels in semantic segmentation on ADE20K.
Abstract
Image BERT pre-training with masked image modeling (MIM) has become a popular practice for self-supervised representation learning. A seminal work, BEiT, casts MIM as a classification task with a visual vocabulary, tokenizing the continuous visual signals into discrete vision tokens using a pre-learned dVAE. Despite being a feasible solution, the improper discretization hinders further improvements of image pre-training. Since image discretization has no ground-truth answers, we believe that the masked patch should not be assigned a unique token id even if a better tokenizer can be obtained. In this work, we introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives. Specifically, the multi-choice supervision for the masked image patches is formed by the soft probability…
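The contrast between assigning each masked patch a unique token id and using multi-choice supervision can be illustrated with two losses. The first is a cross-entropy against one hard id, as in BEiT. The second is a cross-entropy against a soft distribution over ids. The abstract is truncated before the full objective, so this is only a generic soft-label sketch under that assumption. The function names and example numbers are hypothetical, and only masked positions contribute to either loss.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def hard_token_loss(pred_logits, token_ids, mask) -> float:
    """BEiT-style: cross-entropy against a single token id per masked patch."""
    losses = [-log_softmax(p)[t] for p, t, m in zip(pred_logits, token_ids, mask) if m]
    return sum(losses) / len(losses)


def multi_choice_loss(pred_logits, soft_targets, mask) -> float:
    """Multi-choice: cross-entropy against a distribution over token ids."""
    losses = []
    for p, q, m in zip(pred_logits, soft_targets, mask):
        if m:
            logp = log_softmax(p)
            losses.append(-sum(qc * lc for qc, lc in zip(q, logp)))
    return sum(losses) / len(losses)


# Usage: patch 0 is masked and ambiguous between tokens 0 and 1; patch 1 is visible.
pred_logits = [[0.0, 3.0, 0.0], [2.0, 0.0, 0.0]]
soft_targets = [[0.5, 0.45, 0.05], [0.9, 0.05, 0.05]]
mask = [True, False]
# The hard id is the most probable token, i.e. a "unique token id" per patch.
token_ids = [max(range(len(q)), key=q.__getitem__) for q in soft_targets]

hard_l = hard_token_loss(pred_logits, token_ids, mask)
soft_l = multi_choice_loss(pred_logits, soft_targets, mask)
print(f"hard: {hard_l:.3f}  multi-choice: {soft_l:.3f}")
```

Predicting the second-most-likely token is heavily penalized under the hard objective but only mildly under the multi-choice one. With a one-hot target, the two losses coincide.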
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Dropout · Weight Decay · Dense Connections · Attention Dropout · Multi-Head Attention
