Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning

Hanwen Zhong; Jiaxin Chen; Yutong Zhang; Di Huang; Yunhong Wang

arXiv:2501.06884 · cs.CV · January 14, 2025

TL;DR

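This paper introduces EMTAL, a multi-task learning method for Vision Transformers that improves both efficiency and accuracy. It transforms a pre-trained model into a multi-task learner during training, reparameterizes the learned structure for fast inference, and optimizes for the asynchronous nature of learning across tasks.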

Contribution

The paper proposes a new framework called EMTAL that transforms pre-trained Vision Transformers into efficient multi-task learners with reparameterization and asynchronous optimization.
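Reparameterization of LoRA generally means folding the trained low-rank branch back into the frozen weight, so that inference costs no more than the original layer. The sketch below shows the standard LoRA merge for a single linear layer, assuming a row-vector convention (y = xW), a scaling s = alpha / r, and toy random matrices (B is random rather than zero-initialized so the check is non-trivial). The helper names (`lora_forward`, `merge_lora`, `matmul`, and so on) are hypothetical; this is not the code from the yewen1486/emtal repository.

```python
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def lora_forward(x, w, a, b, s):
    """Training-time path: frozen W plus a separate low-rank branch, y = xW + s*(xA)B."""
    return matadd(matmul(x, w), scale(matmul(matmul(x, a), b), s))

def merge_lora(w, a, b, s):
    """Inference-time reparameterization: W' = W + s*AB, same shape as W."""
    return matadd(w, scale(matmul(a, b), s))

rng = random.Random(0)
d_in, d_out, rank, alpha = 8, 5, 2, 4.0
s = alpha / rank                        # common LoRA scaling alpha / r
w = rand_matrix(d_in, d_out, rng)       # stands in for a frozen pre-trained weight
a = rand_matrix(d_in, rank, rng)        # low-rank down-projection
b = rand_matrix(rank, d_out, rng)       # low-rank up-projection
x = rand_matrix(3, d_in, rng)           # a toy batch of 3 token features

w_merged = merge_lora(w, a, b, s)
y_train = lora_forward(x, w, a, b, s)   # two matmul paths
y_infer = matmul(x, w_merged)           # one matmul, no extra branch
max_err = max(abs(p - q) for r1, r2 in zip(y_train, y_infer) for p, q in zip(r1, r2))
print(f"max |y_train - y_infer| = {max_err:.1e}")
```

The merged matrix has exactly the shape of the original weight, which is why a merged LoRA adds no inference latency.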

Findings

1. Outperforms state-of-the-art multi-task learning methods on benchmarks.
2. Achieves higher inference speed without sacrificing accuracy.
3. Maintains the performance of each task during asynchronous training.

Abstract

Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously. Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning. However, their rigid combination hampers both the optimization of MoE and the effectiveness of reparameterization of LoRA, leading to sub-optimal performance and low inference speed. In this work, we propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner during training, and reparameterizing the learned structure for efficient inference. Specifically, we first develop the MoEfied LoRA structure, which decomposes the pre-trained Transformer into a low-rank MoE structure and…
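A rough illustration of why combining MoE routing with LoRA can undermine reparameterization, assuming a generic mixture-of-LoRA-experts layer whose softmax router gates g(x) depend on each token. This setup, the router matrix, and all helper names (`router_gates`, `moe_lora_forward`, `merged_weight`) are hypothetical and are not EMTAL's MoEfied LoRA. Because the gates change from token to token, the experts fold cleanly into the frozen weight only when the gates are fixed.

```python
import math
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    z = sum(e)
    return [t / z for t in e]

def router_gates(x_row, router):
    """Token-dependent gates g(x) = softmax(x R) (hypothetical router)."""
    return softmax(matmul([x_row], router)[0])

def moe_lora_forward(x_row, w, experts, gates, s):
    """y = xW + sum_i g_i * s * (x A_i) B_i, computed branch by branch."""
    y = matmul([x_row], w)
    for (a, b), g in zip(experts, gates):
        y = matadd(y, scale(matmul(matmul([x_row], a), b), g * s))
    return y[0]

def merged_weight(w, experts, gates, s):
    """Fold all experts into one matrix; valid only for a FIXED gate vector."""
    out = w
    for (a, b), g in zip(experts, gates):
        out = matadd(out, scale(matmul(a, b), g * s))
    return out

rng = random.Random(0)
d_in, d_out, rank, n_experts, s = 6, 4, 2, 3, 0.5
w = rand_matrix(d_in, d_out, rng)
experts = [(rand_matrix(d_in, rank, rng), rand_matrix(rank, d_out, rng))
           for _ in range(n_experts)]
router = rand_matrix(d_in, n_experts, rng)
tok1, tok2 = rand_matrix(2, d_in, rng)

# Each token gets its own gates, hence its own implied weight matrix,
# so one static W' cannot reproduce every token's output in general.
g1, g2 = router_gates(tok1, router), router_gates(tok2, router)
w1, w2 = merged_weight(w, experts, g1, s), merged_weight(w, experts, g2, s)
weight_gap = max(abs(p - q) for r1, r2 in zip(w1, w2) for p, q in zip(r1, r2))

# With fixed (token-independent) gates, folding is exact again.
fixed = [1.0 / n_experts] * n_experts
w_fixed = merged_weight(w, experts, fixed, s)
fold_err = max(abs(p - q) for p, q in zip(moe_lora_forward(tok1, w, experts, fixed, s),
                                          matmul([tok1], w_fixed)[0]))
print(f"weight gap between tokens: {weight_gap:.3f}, fold error (fixed gates): {fold_err:.1e}")
```

The point of the sketch is that input-dependent routing keeps the expert branches alive at inference time, whereas any scheme that makes the gates constant lets the layer collapse back to a single dense matrix of the original shape.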

Code & Models

Repositories

yewen1486/emtal
PyTorch · Official

Taxonomy

Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Vision Transformer · Multi-Head Attention