A Universal Discriminator for Zero-Shot Generalization
Haike Xu, Zongyu Lin, Jing Zhou, Yanan Zheng, Zhilin Yang

TL;DR
This paper demonstrates that discriminative models, trained as universal discriminators, outperform generative models in zero-shot NLP tasks, achieving state-of-the-art results with fewer parameters and minimal prompting.
Contribution
It introduces a simple discriminative approach trained as a universal discriminator that surpasses generative models in zero-shot and fine-tuning NLP tasks, with improved robustness and efficiency.
Findings
State-of-the-art zero-shot results on the T0 benchmark
Outperforms T0 by 16.0%, 7.8%, and 11.5%, respectively, at different model scales
Achieves new SOTA with only 1/4 of the parameters of previous methods
Abstract
Generative modeling has been the dominant approach for large-scale pretraining and zero-shot generalization. In this work, we challenge this convention by showing that discriminative approaches perform substantially better than generative ones on a large number of NLP tasks. Technically, we train a single discriminator to predict whether a text sample comes from the true data distribution, similar to GANs. Since many NLP tasks can be formulated as selecting from a few options, we use this discriminator to predict which concatenation of the input and an option has the highest probability of coming from the true data distribution. This simple formulation achieves state-of-the-art zero-shot results on the T0 benchmark, outperforming T0 by 16.0%, 7.8%, and 11.5% respectively on different scales. In the finetuning setting, our approach also achieves new state-of-the-art results on a wide…
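The selection step described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: each option is concatenated with the input, a discriminator scores each concatenation, and the highest-scoring option is chosen. The names `pick_option` and `toy_score` are hypothetical, and `toy_score` is a hand-written stand-in for a trained discriminator, not the authors' model. Joining input and option with a single space is also an assumption.

```python
from typing import Callable, Sequence


def pick_option(text: str, options: Sequence[str],
                score: Callable[[str], float]) -> int:
    """Return the index of the option whose concatenation with the
    input receives the highest discriminator score."""
    # Build one candidate sample per option (assumed: space-joined).
    candidates = [f"{text} {opt}" for opt in options]
    scores = [score(c) for c in candidates]
    # Ties resolve to the earliest option.
    return max(range(len(options)), key=scores.__getitem__)


def toy_score(sample: str) -> float:
    """Hypothetical stand-in for a trained discriminator: returns 1.0
    for samples in a tiny set of 'true' sentences, else 0.0."""
    true_samples = {"the sky is blue", "water is wet"}
    return 1.0 if sample.lower() in true_samples else 0.0


if __name__ == "__main__":
    choice = pick_option("The sky is", ["green", "blue"], toy_score)
    print(choice)
```

In practice the scoring function would be a single trained model that outputs the probability that a sample comes from the true data distribution. Because the selection logic takes the scorer as a parameter, the same function applies to any task that can be cast as choosing among a few options.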
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
