Vision Transformer Adapters for Generalizable Multitask Learning

Deblina Bhattacharjee; Sabine Süsstrunk; Mathieu Salzmann

arXiv:2308.12372 · cs.CV · August 25, 2023

TL;DR

This paper presents a multitasking vision transformer adapter framework that efficiently learns generalizable task affinities, enabling zero-shot transfer, domain adaptation, and multi-task learning without retraining for new tasks or domains.

Contribution

It introduces a novel task-adapted attention mechanism within vision transformer adapters that combines gradient-based task similarities with attention-based ones; the resulting task affinities generalize to unseen tasks and domains without retraining.

Findings

1. Outperforms existing CNN and transformer-based multitasking methods.
2. Enables zero-shot task transfer and domain adaptation.
3. Parameter-efficient and does not require retraining for new tasks.
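To give a rough sense of why adapters are considered parameter-efficient, the sketch below counts the trainable parameters of a generic bottleneck adapter (a down-projection, a parameter-free nonlinearity, and an up-projection) and compares them with one standard ViT encoder layer. The hidden size 768, MLP width 3072, and bottleneck width 64 are assumed ViT-Base-like values, not figures from the paper, and the helper names are hypothetical; the paper's adapters may be structured differently, so this is only an illustration of the general idea.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer: weight matrix plus optional bias."""
    return d_in * d_out + (d_out if bias else 0)


def bottleneck_adapter_params(d_model: int, d_bottleneck: int) -> int:
    """Hypothetical generic adapter: down-projection, parameter-free nonlinearity, up-projection."""
    return linear_params(d_model, d_bottleneck) + linear_params(d_bottleneck, d_model)


def encoder_layer_params(d_model: int, d_mlp: int) -> int:
    """One standard ViT encoder layer: attention projections, two-layer MLP, two LayerNorms."""
    attention = 4 * linear_params(d_model, d_model)  # query, key, value, output projections
    mlp = linear_params(d_model, d_mlp) + linear_params(d_mlp, d_model)
    layer_norms = 2 * (2 * d_model)  # each LayerNorm has a scale and a shift vector
    return attention + mlp + layer_norms


# Assumed ViT-Base-like sizes (not taken from the paper).
adapter = bottleneck_adapter_params(768, 64)
layer = encoder_layer_params(768, 3072)
print(adapter, layer, f"{100 * adapter / layer:.1f}%")  # prints: 99136 7087872 1.4%
```

Under these assumptions, one adapter adds about 1.4% of the parameters of the layer it is attached to, while the backbone weights can stay frozen.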

Abstract

We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added. We introduce a task-adapted attention mechanism within our adapter framework that combines gradient-based task similarities with attention-based ones. The learned task affinities generalize to the following settings: zero-shot task transfer, unsupervised domain adaptation, and generalization without fine-tuning to novel domains. We demonstrate that our approach outperforms not only the…
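The abstract states that the task-adapted attention combines gradient-based task similarities with attention-based ones, but this page does not give the formulation. The sketch below is a hypothetical illustration of one way such a combination could be computed, not the authors' method. It makes three assumptions: gradient-based affinity is taken as a row-wise softmax over pairwise cosine similarities of per-task gradient vectors; attention-based affinity is a row-wise softmax over scaled dot products of per-task feature vectors; and the two are mixed with a weight `alpha`. All names (`task_affinity`, `alpha`, the toy inputs) are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_affinity(grads: List[Vector]) -> Matrix:
    """Assumed: row-softmax of pairwise cosine similarities between per-task gradients."""
    return [softmax([cosine(gi, gj) for gj in grads]) for gi in grads]


def attention_affinity(feats: List[Vector]) -> Matrix:
    """Row-softmax of scaled dot products between per-task feature vectors."""
    scale = 1.0 / math.sqrt(len(feats[0]))
    return [softmax([dot(fi, fj) * scale for fj in feats]) for fi in feats]


def task_affinity(grads: List[Vector], feats: List[Vector], alpha: float = 0.5) -> Matrix:
    """Hypothetical convex mix; each row still sums to 1 because both inputs do."""
    g, a = gradient_affinity(grads), attention_affinity(feats)
    return [
        [alpha * gij + (1.0 - alpha) * aij for gij, aij in zip(g_row, a_row)]
        for g_row, a_row in zip(g, a)
    ]


# Toy example with three tasks; tasks 0 and 1 are constructed to be similar.
grads = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = task_affinity(grads, feats)
print([round(v, 3) for v in A[0]])
```

Normalizing both similarity matrices row-wise before mixing keeps them on the same scale, so the combined rows can be read directly as attention weights over tasks; other normalizations or mixing rules would be equally plausible.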

Videos

Vision Transformer Adapters for Generalizable Multitask Learning · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Residual Connection · Softmax · Dense Connections · Vision Transformer · Adapter