Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models
Jingjing Xu, Qingxiu Dong, Hongyi Liu, Lei Li

TL;DR
Go-tuning enhances small language models' zero-shot learning capabilities through geometry-guided self-supervised training, achieving results comparable to much larger models without external data or extensive prompt engineering.
Contribution
This work introduces Go-tuning, a novel self-supervised method that significantly improves zero-shot performance of small language models without external supervised data.
Findings
T5-small (80M) with Go-tuning achieves zero-shot results competitive with T5-XL (3B).
The multi-task model mgo-T5 (250M) reaches the average performance of OPT-175B on 9 datasets.
Go-tuning reduces reliance on prompt engineering and large models.
Abstract
With increasing scale, large language models such as GPT-3 demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners. However, these results rely heavily on delicate prompt design and large computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with large language models such as T5-XL (3B). We also apply Go-tuning to multi-task settings and develop a multi-task model, mgo-T5 (250M). It can reach the average…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · OPT · Weight Decay · Cosine Annealing · Dropout · Linear Layer · Layer Normalization
