GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

Benyuan Sun; Jin Dai; Zihao Liang; Congying Liu; Yi Yang; Bo Bai

arXiv:2208.02148 · cs.CV · August 5, 2022 · 1 citation

Open Access

TL;DR

GPPF introduces a dynamic, multi-task pre-training framework that leverages sparse activation and knowledge sharing to improve performance across diverse vision tasks and models.

Contribution

The paper proposes GPPF, a novel multi-task pre-training framework with a task-level dynamic network and a plug-and-play multi-task training algorithm supporting concurrent training.

Findings

01 GPPF-R50 outperforms the baseline by 2.5-5.8 on the 8 pre-training tasks.

02 Achieves state-of-the-art results on 22 downstream vision tasks.

03 Generalizes effectively to vision transformers with consistent improvements.

Abstract

Pre-training over mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, that pre-trains a task-level dynamic network, composed of knowledge "legos" in each layer, on labeled multi-task and multi-domain datasets. By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch; (2) partitioned knowledge storage in separate lego units driven by knowledge sharing; (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Notably, the joint training of disparate vision tasks is non-trivial due to their differences in input shapes, loss functions,…
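
The three elements listed in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `LegoUnit`, `SparseLayer`, and `run` names are hypothetical, each unit is a trivial elementwise affine map rather than a real network block, and the per-task routing table is set by hand because the abstract does not say how units are selected. The sketch only shows a mixed-task batch, knowledge stored in separate units, and each task activating a subset of units per layer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LegoUnit:
    """Hypothetical stand-in for one knowledge unit: y = scale * x + shift."""
    scale: float
    shift: float

    def __call__(self, x: List[float]) -> List[float]:
        return [self.scale * v + self.shift for v in x]


class SparseLayer:
    """A layer holding several lego units; each task activates only a subset."""

    def __init__(self, units: List[LegoUnit], routing: Dict[str, List[int]]):
        self.units = units
        # task name -> indices of active units (hand-set here; an assumption)
        self.routing = routing

    def forward(self, x: List[float], task: str) -> List[float]:
        # Only the units routed to this task are evaluated (sparse activation).
        outputs = [self.units[i](x) for i in self.routing[task]]
        # Combine the active units by averaging their outputs elementwise.
        return [sum(col) / len(outputs) for col in zip(*outputs)]


def run(layers: List[SparseLayer], x: List[float], task: str) -> List[float]:
    """Pass one sample through every layer using its task's routing."""
    for layer in layers:
        x = layer.forward(x, task)
    return x


layers = [
    SparseLayer(
        units=[LegoUnit(1.0, 0.0), LegoUnit(2.0, 0.0), LegoUnit(0.0, 1.0)],
        # Unit 1 is shared by both tasks (knowledge sharing).
        routing={"classification": [0, 1], "detection": [1, 2]},
    ),
    SparseLayer(
        units=[LegoUnit(1.0, 1.0), LegoUnit(-1.0, 0.0)],
        routing={"classification": [0], "detection": [0, 1]},
    ),
]

# A mixed batch: samples from different tasks are handled in the same step.
batch = [("classification", [1.0, 2.0]), ("detection", [1.0, 2.0])]
results = [(task, run(layers, x, task)) for task, x in batch]
```

In this toy setup the same input yields different outputs per task because each task traverses a different path through the shared pool of units.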

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Visual Attention and Saliency Detection