MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers
Jongseong Bae, Susang Kim, Minsu Cho, Ha Young Kim

TL;DR
MVFormer introduces multi-view normalization and multi-view token mixing to diversify feature learning, improving the efficiency and accuracy of vision transformers across multiple vision tasks.
Contribution
The paper proposes two components, multi-view normalization (MVN) and a multi-view token mixer (MVTM), and integrates them into a new ViT model, MVFormer, to enhance feature diversity and multi-scale token interaction.
Findings
Outperforms state-of-the-art convolution-based ViTs on multiple vision tasks.
Achieves high accuracy on ImageNet-1K with fewer parameters and MACs.
Demonstrates the effectiveness of diversified normalization and token mixing strategies.
Abstract
Active research is currently underway to enhance the efficiency of vision transformers (ViTs). Most studies have focused solely on effective token mixers, overlooking the potential relationship with normalization. To boost diverse feature learning, we propose two components: a normalization module called multi-view normalization (MVN) and a token mixer called multi-view token mixer (MVTM). The MVN integrates three differently normalized features via batch, layer, and instance normalization using a learnable weighted sum. Each normalization method outputs a different distribution, generating distinct features. Thus, the MVN is expected to offer diverse pattern information to the token mixer, resulting in beneficial synergy. The MVTM is a convolution-based multiscale token mixer with local, intermediate, and global filters, and it incorporates stage specificity by configuring various…
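The MVN idea described in the abstract (a learnable weighted sum of batch-, layer-, and instance-normalized features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the (N, C, H, W) layout, the use of three scalar weights, the choice to compute layer normalization over channels and spatial positions together, and the omission of affine parameters and running statistics are all assumptions made for illustration.

```python
import numpy as np


def _normalize(x, axes, eps=1e-5):
    """Standardize x to zero mean and unit variance over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_view_norm(x, weights, eps=1e-5):
    """Hypothetical MVN sketch: weighted sum of three normalized views.

    x:       array of shape (N, C, H, W)
    weights: three scalars (w_bn, w_ln, w_in); in a real model these would be
             learnable parameters. Scalar weights are an assumption here.
    """
    # Batch-norm view: statistics per channel, over batch and spatial axes.
    bn = _normalize(x, (0, 2, 3), eps)
    # Layer-norm view: statistics per sample, over channel and spatial axes
    # (assumed variant; some models normalize over channels only).
    ln = _normalize(x, (1, 2, 3), eps)
    # Instance-norm view: statistics per sample and channel, over spatial axes.
    inn = _normalize(x, (2, 3), eps)

    w_bn, w_ln, w_in = weights
    return w_bn * bn + w_ln * ln + w_in * inn


# Usage: mix the three views equally on a random feature map.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 5, 5))
mixed = multi_view_norm(features, (1 / 3, 1 / 3, 1 / 3))
```

Because each view computes its statistics over a different set of axes, the three outputs have different distributions, which is the property the abstract relies on to feed varied pattern information to the token mixer.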
Taxonomy
Topics: CCD and CMOS Imaging Sensors
Methods: MetaFormer · Instance Normalization
