Experts Weights Averaging: A New General Training Scheme for Vision Transformers

Yongqi Huang; Peng Ye; Xiaoshui Huang; Sheng Li; Tao Chen; Tong He; Wanli Ouyang

arXiv:2308.06093 · cs.CV · August 28, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a novel training scheme for Vision Transformers that uses Experts Weights Averaging with MoEs during training, improving performance without increasing inference costs and enabling effective fine-tuning.

Contribution

The paper proposes a new training strategy for ViTs that decouples training and inference, utilizing MoEs and Experts Weights Averaging to enhance performance without inference overhead.
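The decoupling described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each expert is a plain weight matrix of the same shape, that the averaging step pulls every expert toward the mean of the others, and that the trained MoE is collapsed into one layer by a uniform average. The names `mean_of_experts`, `ewa_step`, and the mixing coefficient `beta` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mean_of_experts(experts: List[Matrix]) -> Matrix:
    """Element-wise mean of same-shaped expert weight matrices.

    Assumed here to be how a trained MoE is collapsed into one layer.
    """
    n = len(experts)
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(e[r][c] for e in experts) / n for c in range(cols)]
        for r in range(rows)
    ]


def ewa_step(experts: List[Matrix], beta: float) -> List[Matrix]:
    """Pull each expert toward the mean of the other experts.

    Assumed update rule (hypothetical `beta`):
        new_i = beta * W_i + (1 - beta) * mean(W_j for j != i)
    """
    n = len(experts)
    if n < 2:
        raise ValueError("averaging needs at least two experts")
    rows, cols = len(experts[0]), len(experts[0][0])
    # Sum over all experts, so the sum of "the others" is total - own weight.
    total = [
        [sum(e[r][c] for e in experts) for c in range(cols)]
        for r in range(rows)
    ]
    return [
        [
            [
                beta * e[r][c] + (1.0 - beta) * (total[r][c] - e[r][c]) / (n - 1)
                for c in range(cols)
            ]
            for r in range(rows)
        ]
        for e in experts
    ]


if __name__ == "__main__":
    experts = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[3.0, 0.0], [1.0, 2.0]],
        [[2.0, 4.0], [5.0, 0.0]],
    ]
    experts = ewa_step(experts, beta=0.9)  # would run once per training iteration
    print(mean_of_experts(experts))        # single layer used at inference
```

Under this assumed rule, the averaging step leaves the mean of the experts unchanged while reducing their spread, so the collapsed layer has the size and cost of one ordinary layer.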

Findings

1. Effective performance improvement across multiple visual tasks and datasets.

2. EWA significantly boosts naive MoE performance on small datasets.

3. Theoretical analysis explains the underlying mechanism of the training scheme.

Abstract

Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs) that improves performance without increasing inference cost. As Vision Transformers (ViTs) gradually surpass CNNs in various visual tasks, one may ask: does a training scheme specifically for ViTs exist that can also improve performance without increasing inference cost? Recently, Mixture-of-Experts (MoE) has attracted increasing attention, as it can efficiently scale up the capacity of Transformers at a fixed cost through sparsely activated experts. Considering that MoE can also be viewed as a multi-branch structure, can we use MoE to implement a ViT training scheme similar to structural re-parameterization? In this paper, we answer these questions in the affirmative with a new general training strategy for ViTs. Specifically, we decouple the…
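The "fixed cost through sparsely activated experts" idea can be sketched as follows. This is a generic top-1 routing illustration, not the routing used in the paper: the gate (`gate_rows`), the helper names `top1_route` and `moe_forward`, and the toy experts are all assumptions. The point it shows is that each token runs through exactly one expert, so per-token compute does not grow with the number of experts.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top1_route(token: Vector, gate_rows: Sequence[Vector]) -> int:
    """Index of the expert whose gate row scores highest for this token."""
    scores = [sum(g * t for g, t in zip(row, token)) for row in gate_rows]
    return max(range(len(scores)), key=scores.__getitem__)


def moe_forward(
    tokens: Sequence[Vector],
    experts: Sequence[Expert],
    gate_rows: Sequence[Vector],
) -> List[Vector]:
    """Send each token through exactly one expert (sparse activation)."""
    outputs = []
    for token in tokens:
        idx = top1_route(token, gate_rows)
        outputs.append(experts[idx](token))  # only one branch is evaluated
    return outputs


if __name__ == "__main__":
    # Two toy experts: one doubles its input, the other negates it.
    experts = [lambda v: [2 * x for x in v], lambda v: [-x for x in v]]
    gate_rows = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical gate weights
    tokens = [[3.0, 1.0], [0.5, 2.0]]
    print(moe_forward(tokens, experts, gate_rows))
```

Seen this way, the experts are parallel branches of one layer, which is the multi-branch view the abstract compares to structural re-parameterization.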

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques