Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models

Jingjing Xu; Qingxiu Dong; Hongyi Liu; Lei Li

arXiv:2212.10461·cs.CL·December 21, 2022·1 cites

TL;DR

Go-tuning improves the zero-shot learning ability of small language models through geometry-guided self-supervised training, achieving results comparable to much larger models without external supervised data or extensive prompt engineering.

Contribution

This work introduces Go-tuning, a novel self-supervised method that significantly improves zero-shot performance of small language models without external supervised data.

Findings

01

T5-small (80M) with Go-tuning achieves zero-shot results competitive with T5-XL (3B).

02

The multi-task model mgo-T5 (250M) reaches the average performance of OPT-175B on 9 datasets.

03

Go-tuning reduces reliance on prompt engineering and large models.

Abstract

With increasing scale, large language models such as GPT-3 demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners. However, these results rely heavily on delicate prompt design and large amounts of computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with large language models such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M). It can reach the average…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · OPT · Weight Decay · Cosine Annealing · Dropout · Linear Layer · Layer Normalization