An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training

Zitian Chen; Mingyu Ding; Yikang Shen; Wei Zhan; Masayoshi Tomizuka; Erik Learned-Miller; Chuang Gan

arXiv:2306.17165 · cs.CV · June 30, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a scalable, multi-task vision transformer model trained on heterogeneous datasets, achieving high performance across diverse tasks with modularity for efficient adaptation and continual learning.

Contribution

It proposes a modified mixture-of-experts vision transformer capable of multi-task learning on diverse datasets, addressing heterogeneity challenges and enabling efficient downstream task adaptation.
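For readers unfamiliar with the mixture-of-experts idea mentioned above, the following is a minimal sketch of generic top-k expert routing for a single token. It is not the paper's modified design: the class name `TopKMoE`, the linear experts, the gate, and all sizes are hypothetical choices made only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TopKMoE:
    """Hypothetical sparse layer: each token is processed by only k experts."""

    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.k = k
        self.gate = rand_matrix(num_experts, dim)  # one score row per expert
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]

    def route(self, token):
        """Return (expert_index, weight) pairs for the k highest-scoring experts."""
        scores = matvec(self.gate, token)
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        weights = softmax([scores[i] for i in top])  # renormalise over chosen experts
        return list(zip(top, weights))

    def forward(self, token):
        """Weighted sum of the selected experts' outputs; other experts are skipped."""
        out = [0.0] * len(token)
        for idx, weight in self.route(token):
            expert_out = matvec(self.experts[idx], token)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out


moe = TopKMoE(dim=4, num_experts=8, k=2)
token = [0.5, -1.0, 0.25, 2.0]
routing = moe.route(token)
output = moe.forward(token)
```

Because only `k` of the experts run per token, compute grows with `k` rather than with the total number of experts, which is the general property that makes such layers attractive for modular adaptation.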

Findings

01. Achieves comparable results to single-task models on multiple vision tasks.

02. Demonstrates strong generalization and modularity for downstream applications.

03. Enables efficient fine-tuning with fewer parameters and less computation.

Abstract

We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label datasets are rare, small, and expensive. We use the term heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few have explored training on such heterogeneous datasets. General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies.…
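The distinction the abstract draws between multi-label and heterogeneous data can be sketched as data structures. In the example below, the dataset contents, the task names, and the helper `task_batches` are all hypothetical. Picking a task uniformly at each step is an assumption made for simplicity, not the sampling strategy used in the paper.

```python
import random

# Multi-label data: one image set, and every image has labels for every task.
multi_label = [
    {"image": "img_0", "labels": {"classification": 3, "detection": [(0, 0, 10, 10)]}},
]

# Heterogeneous data: a combination of single-task datasets. Each dataset has
# its own images, its own label format, and its own size.
heterogeneous = {
    "classification": [("cls_img_%d" % i, i % 5) for i in range(6)],
    "detection": [("det_img_%d" % i, [(0, 0, i + 1, i + 1)]) for i in range(3)],
}


def task_batches(datasets, batch_size, steps, seed=0):
    """Yield (task, batch) pairs, one task per training step.

    Assumption: tasks are chosen uniformly at random, regardless of dataset size.
    """
    rng = random.Random(seed)
    tasks = sorted(datasets)
    for _ in range(steps):
        task = rng.choice(tasks)
        examples = datasets[task]
        batch = rng.sample(examples, min(batch_size, len(examples)))
        yield task, batch


schedule = list(task_batches(heterogeneous, batch_size=2, steps=4))
```

Each step sees labels for only one task, so a shared model trained this way has to cope with the differences in dataset scale and label format that the abstract lists as challenges.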

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications

Methods: Focus