GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning
Benyuan Sun, Jin Dai, Zihao Liang, Congying Liu, Yi Yang, Bo Bai

TL;DR
GPPF introduces a dynamic, multi-task pre-training framework that leverages sparse activation and knowledge sharing to improve performance across diverse vision tasks and models.
Contribution
The paper proposes GPPF, a novel multi-task pre-training framework with a task-level dynamic network and a plug-and-play multi-task training algorithm supporting concurrent training.
Findings
GPPF-R50 outperforms the baseline by 2.5-5.8 on 8 pre-training tasks.
Achieves state-of-the-art results on 22 downstream vision tasks.
Generalizes effectively to vision transformers with consistent improvements.
Abstract
Pre-training over mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, that pre-trains a task-level dynamic network, composed of knowledge "legos" in each layer, on labeled multi-task and multi-domain datasets. By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch; (2) partitioned knowledge storage in separate lego units driven by knowledge sharing; (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Notably, the joint training of disparate vision tasks is non-trivial due to their differences in input shapes, loss functions,…
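The idea of a task-level dynamic network built from lego units can be illustrated with a toy sketch. The code below is not the authors' implementation: the class names (LegoLayer, TaskLevelDynamicNet), the averaging of unit outputs, and the hand-written routing table are all assumptions made for illustration, and in the paper the choice of units would be learned rather than fixed. It only shows the two properties named in the abstract: each task activates a subset of units per layer, and some units are shared across tasks.

```python
import random


class LegoLayer:
    """One layer holding several small 'lego' units (toy linear maps, an assumption)."""

    def __init__(self, num_units, dim, seed):
        rng = random.Random(seed)
        # Each unit is a dim x dim weight matrix standing in for a real network block.
        self.units = [
            [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_units)
        ]

    def forward(self, x, active):
        """Average the outputs of the active units; inactive units are never evaluated."""
        outputs = []
        for idx in active:
            weights = self.units[idx]
            outputs.append([sum(w * v for w, v in zip(row, x)) for row in weights])
        return [sum(column) / len(outputs) for column in zip(*outputs)]


class TaskLevelDynamicNet:
    """Stack of lego layers with one fixed route (active unit indices per layer) per task."""

    def __init__(self, num_layers, num_units, dim, routes):
        self.layers = [LegoLayer(num_units, dim, seed=i) for i in range(num_layers)]
        self.routes = routes  # hypothetical hand-written routing table, not learned

    def forward(self, x, task):
        for layer, active in zip(self.layers, self.routes[task]):
            x = layer.forward(x, active)
        return x

    def shared_units(self, task_a, task_b):
        """Per layer, the unit indices that both tasks activate."""
        return [
            sorted(set(a) & set(b))
            for a, b in zip(self.routes[task_a], self.routes[task_b])
        ]


# Two tasks, each activating 2 of the 4 units in every layer.
routes = {
    "classification": [[0, 1], [1, 2]],
    "detection": [[1, 3], [2, 3]],
}
net = TaskLevelDynamicNet(num_layers=2, num_units=4, dim=3, routes=routes)
cls_out = net.forward([1.0, 0.5, -0.5], "classification")
det_out = net.forward([1.0, 0.5, -0.5], "detection")
shared = net.shared_units("classification", "detection")
```

Here the routing is per task rather than per input, so every sample of a task follows the same path, and the units reported by shared_units are the ones through which knowledge could be shared between the two tasks.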
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Visual Attention and Saliency Detection
