ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization

Hanwei Xu; Yujun Chen; Yulun Du; Nan Shao; Yanggang Wang; Haiyu Li; Zhilin Yang

arXiv:2201.06910·cs.LG·November 1, 2022

Open Access 3 Models

TL;DR

ZeroPrompt shows that scaling multitask pretraining to 1,000 tasks substantially improves zero-shot generalization and training efficiency, and that model size has little impact on performance at this task scale. It pairs this with a genetic-algorithm search for prompts on unseen tasks.

Contribution

The paper introduces ZeroPrompt, a multitask pretraining method on 1,000 tasks, showing task scaling as an efficient alternative to model scaling and incorporating a genetic algorithm for prompt optimization.

Findings

01. Task scaling improves training efficiency by 30 times in FLOPs.

02. ZeroPrompt enhances zero-shot performance across diverse datasets.

03. Model size has little impact on performance when the number of tasks is extremely large.

Abstract

We propose a multitask pretraining approach ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting. While previous models are trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery that task scaling can be an efficient alternative to model scaling; i.e., the model size has little impact on performance with an extremely large number of tasks. Our results show that task scaling can substantially improve training efficiency by 30 times in FLOPs. Moreover, we present a prompting method that incorporates a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements. Empirically, ZeroPrompt substantially improves both the efficiency and the performance of zero-shot learning across a variety of academic and production…
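
The abstract mentions a genetic algorithm that searches for the best prompt on unseen tasks but gives no details here. The following is only a generic sketch of genetic search over token sequences, not the authors' procedure. The function name `genetic_prompt_search`, the `score_fn` callback (a hypothetical stand-in for whatever prompt-quality signal is available), and the mutation and crossover operators are all assumptions made for illustration.

```python
import random

def genetic_prompt_search(seed_prompts, vocabulary, score_fn,
                          generations=20, population_size=8, seed=0):
    """Generic genetic search over prompts represented as token tuples.

    Illustrative only: the operators and hyperparameters are assumptions,
    not taken from the ZeroPrompt paper. Prompts must be non-empty.
    """
    rng = random.Random(seed)
    population = [tuple(p) for p in seed_prompts]

    def mutate(tokens):
        # Replace one randomly chosen token with a random vocabulary token.
        i = rng.randrange(len(tokens))
        return tokens[:i] + (rng.choice(vocabulary),) + tokens[i + 1:]

    def crossover(a, b):
        # Single-point crossover: prefix of a, suffix of b.
        cut = rng.randint(0, min(len(a), len(b)))
        return a[:cut] + b[cut:]

    for _ in range(generations):
        # Deduplicate while preserving order, then rank by score.
        unique = list(dict.fromkeys(population))
        ranked = sorted(unique, key=score_fn, reverse=True)
        # Elitism: the top candidates survive unchanged.
        parents = ranked[:max(2, population_size // 2)]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b)))
        population = parents + children

    return max(population, key=score_fn)


# Toy usage: the score counts positions that match a made-up target prompt.
target = ("is", "this", "review", "positive", "?")

def toy_score(prompt):
    return sum(a == b for a, b in zip(prompt, target))

seeds = [("is", "it", "good", "or", "bad"),
         ("the", "review", "is", "positive", "?")]
vocab = ["is", "this", "review", "positive", "negative", "?", "the", "it"]
best = genetic_prompt_search(seeds, vocab, toy_score)
```

Because the top-ranked candidates are carried over unchanged each generation, the best score in the population never decreases. In a real setting `score_fn` would have to be replaced by an actual measure of prompt quality, which this page does not specify.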

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Multimodal Machine Learning Applications