SMMix: Self-Motivated Image Mixing for Vision Transformers
Mengzhao Chen, Mingbao Lin, ZhiHang Lin, Yuxin Zhang, Fei Chao, Rongrong Ji

TL;DR
SMMix is an image augmentation technique for vision transformers in which the model under training itself guides both image mixing and label assignment, improving accuracy with less training overhead than existing CutMix variants.
Contribution
It introduces a self-motivated image mixing approach with attention region mixing, fine-grained label assignment, and feature consistency constraints, reducing training overhead and enhancing performance.
Findings
Improves the accuracy of DeiT, CaiT, and PVT models by more than 1% on ImageNet-1k.
Demonstrates better performance than other CutMix variants.
Shows strong generalization on downstream and out-of-distribution datasets.
Abstract
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed images and the corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but inevitably introduce heavy training overhead or require extra information, undermining ease of use. To this end, we propose a novel and effective Self-Motivated image Mixing method (SMMix), which motivates both image and label enhancement by the model under training itself. Specifically, we propose a max-min attention region mixing approach that enriches the attention-focused objects in the mixed images. Then, we introduce a fine-grained label assignment technique that co-trains the output tokens of mixed images with fine-grained…
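To make the idea of attention-guided region mixing more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `max_min_region_mix`, the square window of `side` x `side` patches, and the per-patch attention maps passed in as arrays are all assumptions made for illustration. The sketch pastes the highest-attention window of one image onto the lowest-attention window of another, and returns the plain CutMix-style area ratio as a label weight; the fine-grained label assignment described in the abstract is not modeled here.

```python
import numpy as np

def max_min_region_mix(img_a, img_b, attn_a, attn_b, patch, side):
    """Hypothetical sketch of attention-guided mixing (not the paper's code).

    img_a, img_b: arrays of shape (H, W, C).
    attn_a, attn_b: per-patch attention scores, shape (H // patch, W // patch).
    patch: patch size in pixels; side: window size in patches (assumed square).
    """
    def window_sums(attn):
        # Total attention inside every side x side window of the patch grid.
        rows = attn.shape[0] - side + 1
        cols = attn.shape[1] - side + 1
        sums = np.empty((rows, cols))
        for i in range(rows):
            for j in range(cols):
                sums[i, j] = attn[i:i + side, j:j + side].sum()
        return sums

    sums_a = window_sums(attn_a)
    sums_b = window_sums(attn_b)
    # Source: window of img_a with the most attention.
    ia, ja = np.unravel_index(np.argmax(sums_a), sums_a.shape)
    # Target: window of img_b with the least attention.
    ib, jb = np.unravel_index(np.argmin(sums_b), sums_b.shape)

    size = side * patch
    mixed = img_b.copy()
    mixed[ib * patch:ib * patch + size, jb * patch:jb * patch + size] = \
        img_a[ia * patch:ia * patch + size, ja * patch:ja * patch + size]

    # Area-based label weight for img_a, as in plain CutMix (an assumption here).
    lam = (side * side) / attn_a.size
    return mixed, lam, (int(ia), int(ja)), (int(ib), int(jb))


# Toy usage: 8x8 images, 2x2-pixel patches, so a 4x4 patch grid.
img_a = np.zeros((8, 8, 3))
img_a[4:8, 4:8] = 7.0               # the "object" in image A
img_b = np.ones((8, 8, 3))
attn_a = np.zeros((4, 4))
attn_a[2:4, 2:4] = 1.0              # attention concentrated on the object
attn_b = np.ones((4, 4))
attn_b[0:2, 0:2] = 0.0              # top-left of image B is least attended
mixed, lam, src, dst = max_min_region_mix(img_a, img_b, attn_a, attn_b, patch=2, side=2)
```

In this toy setup the object from image A lands on the least-attended corner of image B, so the salient content of both images stays visible in the mixed result.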
Taxonomy
Topics: Visual Attention and Saliency Detection · Image Enhancement Techniques · Advanced Neural Network Applications
Methods: ALIGN · CutMix
