Auxiliary task demands mask the capabilities of smaller language models

Jennifer Hu; Michael C. Frank

arXiv:2404.02418 · cs.CL · July 31, 2024 · 6 cites

TL;DR

This paper shows that the measured capabilities of smaller language models depend heavily on the task demands of the evaluation method: high-demand evaluations can obscure abilities that lower-demand evaluations reveal.

Contribution

The paper shows how the task demands of an evaluation method influence measured language model performance, and argues that these demands must be considered when assessing model capabilities.

Findings

01. Evaluation methods with higher task demands yield lower measured performance, especially for smaller models.

02. The gap between high-demand and low-demand evaluations is most pronounced for models with fewer parameters and less training data.

03. Evaluation methods with reduced demands better reflect models' underlying knowledge.

Abstract

Developmental psychologists have argued about when cognitive capacities such as language understanding or theory of mind emerge. These debates often hinge on the concept of "task demands" -- the auxiliary challenges associated with performing a particular evaluation -- that may mask the child's underlying ability. The same issues arise when measuring the capacities of language models (LMs): performance on a task is a function of the model's underlying knowledge, combined with the model's ability to interpret and perform the task given its available resources. Here, we show that for analogical reasoning, reflective reasoning, word prediction, and grammaticality judgments, evaluation methods with greater task demands yield lower performance than evaluations with reduced demands. This "demand gap" is most pronounced for models with fewer parameters and less training data. Our results…
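To make the "demand gap" concrete, here is a minimal sketch that computes it as the accuracy under a low-demand evaluation minus the accuracy under a high-demand evaluation of the same items. The function names, model labels, and item-level outcomes are all hypothetical and made up for illustration; they are not the authors' code, their metric definition, or results from the paper.

```python
from statistics import mean


def accuracy(correct):
    """Fraction of items answered correctly; `correct` is a list of booleans."""
    return mean(1.0 if c else 0.0 for c in correct)


def demand_gap(low_demand_correct, high_demand_correct):
    """Low-demand accuracy minus high-demand accuracy (an assumed definition)."""
    return accuracy(low_demand_correct) - accuracy(high_demand_correct)


# Toy, invented item-level outcomes for two hypothetical models.
toy_outcomes = {
    "small-model": {"low": [True, True, True, False],
                    "high": [True, False, False, False]},
    "large-model": {"low": [True, True, True, True],
                    "high": [True, True, True, False]},
}

gaps = {name: demand_gap(o["low"], o["high"])
        for name, o in toy_outcomes.items()}

for name, gap in gaps.items():
    print(f"{name}: demand gap = {gap:.2f}")
```

In this toy setup a larger gap for the smaller model would correspond to the pattern the abstract describes, but the numbers carry no evidential weight.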

Code & Models

Repositories

jennhu/lm-task-demands (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling