A Few More Examples May Be Worth Billions of Parameters

Yuval Kirstain; Patrick Lewis; Sebastian Riedel; Omer Levy

arXiv:2110.04374·cs.CL·October 12, 2021

TL;DR

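This paper compares the effect of adding model parameters with the effect of adding labeled examples across a range of NLP tasks. Scaling parameters helps consistently, but the benefit of more examples depends on the task's format: classification, extractive question answering, and multiple-choice tasks gain substantially, while open question answering does not.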

Contribution

It shows that the value of additional labeled examples depends on task format: for classification, extractive question answering, and multiple-choice tasks, collecting a few hundred more examples is often "worth" billions of parameters.

Findings

01

Scaling parameters consistently improves performance across tasks.

02

Additional examples benefit classification, extractive QA, and multiple-choice tasks.

03

In open question answering, enlarging the training set does not improve performance.

Abstract

We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does not improve performance. In contrast, classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters. We hypothesize that unlike open question answering, which involves recalling specific information, solving strategies for tasks with a more restricted output space transfer across examples, and can therefore be learned with small amounts of labeled data.
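The "worth billions of parameters" comparison can be illustrated with a minimal sketch. It assumes a grid of scores indexed by parameter count and training-set size, and asks how many examples a small model needs to match a larger model trained on fewer examples. The numbers in `SCORES` and the helper name `examples_to_match` are hypothetical placeholders for illustration; they are not results or code from the paper.

```python
from typing import Dict, Optional

# Hypothetical placeholder scores, NOT results from the paper.
# SCORES[parameter_count][num_training_examples] -> task metric.
SCORES: Dict[int, Dict[int, float]] = {
    60_000_000: {32: 40.0, 128: 52.0, 512: 63.0, 2048: 70.0},
    3_000_000_000: {32: 55.0, 128: 64.0, 512: 72.0, 2048: 78.0},
}


def examples_to_match(
    scores: Dict[int, Dict[int, float]],
    small: int,
    large: int,
    n_examples: int,
) -> Optional[int]:
    """Return the smallest training-set size at which the small model
    reaches the score the large model gets with n_examples, or None if
    no measured size is enough."""
    target = scores[large][n_examples]
    for n in sorted(scores[small]):
        if scores[small][n] >= target:
            return n
    return None


if __name__ == "__main__":
    n = examples_to_match(SCORES, 60_000_000, 3_000_000_000, 32)
    print(f"Small model needs {n} examples to match the large model at 32")
```

Under this framing, a task where the function returns a small number benefits strongly from extra examples, while a task where it returns `None` at every measured size behaves like the open question answering case described above.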

Code & Models

Repositories

yuvalkirstain/lm-evaluation-harness (official)

Videos

[ML News] Microsoft trains 530B model | ConvMixer model fits into single tweet | DeepMind profitable · YouTube

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems