It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze

TL;DR
This paper shows that small language models can reach few-shot performance similar to GPT-3 by reformulating inputs as cloze questions that contain a task description, training with gradient-based optimization, and exploiting unlabeled data, offering a far less compute-intensive alternative.
Contribution
It introduces methods for small language models to perform few-shot learning effectively, reducing reliance on massive models and computational resources.
Findings
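Language models with several orders of magnitude fewer parameters than GPT-3 can reach similar few-shot performance.
This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization.
Exploiting unlabeled data gives further improvements.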
Abstract
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much "greener" in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.
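The conversion of a textual input into a cloze question can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the abstract: the sentiment task, the `pattern` wording, the `VERBALIZER` words, the `[MASK]` placeholder, and the made-up scores that stand in for a masked language model's output at the mask position.

```python
import math
from typing import Dict

MASK = "[MASK]"  # placeholder; the actual mask token depends on the model

def pattern(review: str) -> str:
    """Turn a raw input into a cloze question with a short task description."""
    return f"{review} All in all, it was {MASK}."

# Hypothetical mapping from each label to one word that can fill the mask.
VERBALIZER = {"positive": "great", "negative": "terrible"}

def label_probabilities(mask_scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax restricted to the scores of the verbalizer words."""
    logits = {label: mask_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Made-up scores standing in for a model's predictions at the mask position.
example_scores = {"great": 2.0, "terrible": 0.0, "okay": 1.0}

cloze = pattern("Best pizza ever!")
probs = label_probabilities(example_scores)
print(cloze)
print(probs)
```

In a setup like this, the label probabilities are the quantity a gradient-based training step would compare against the few labeled examples, for instance through a cross-entropy loss. That training step is not shown here.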
Taxonomy
Methods: Linear Layer · Cosine Annealing · Dropout · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Multi-Head Attention · Attention Is All You Need
