Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

Kush Dubey

arXiv:2410.00179·cs.CL·October 3, 2024

TL;DR

This paper investigates whether pretraining on unlabeled test data biases few-shot NLP benchmarks. It finds no evidence of overoptimism and recommends that benchmarks include multiple training folds for reliable evaluation.

Contribution

It provides an empirical analysis of the potential bias from test data pretraining in few-shot NLP benchmarks and offers methodological recommendations.

Findings

01

No evidence of overoptimism from test data pretraining

02

Recommends multiple training folds for robust evaluation

03

Highlights importance of repeated subsampling in experiments
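To illustrate why repeated subsampling matters, here is a minimal sketch (not code from the kddubey/pretrain-on-test repository) of how paired accuracy differences could be summarized across training folds. The function name `summarize_paired_differences` and its inputs are hypothetical: each list holds one accuracy per subsample, one for a model pretrained on unlabeled test text and one for a model pretrained on independent text, both evaluated on the same test split.

```python
import math
from statistics import mean, stdev


def summarize_paired_differences(acc_test_pretrained, acc_extra_pretrained):
    """Return the mean and standard error of per-subsample accuracy differences.

    Hypothetical helper. Index i is one repeated subsample (training fold);
    both accuracies at index i come from the same test split, so the
    differences are paired.
    """
    if len(acc_test_pretrained) != len(acc_extra_pretrained):
        raise ValueError("need one accuracy pair per subsample")
    # Positive difference: pretraining on test text scored higher (possible overoptimism).
    diffs = [t - e for t, e in zip(acc_test_pretrained, acc_extra_pretrained)]
    if len(diffs) < 2:
        # A single fold gives no way to estimate variability.
        raise ValueError("need at least two subsamples to estimate variability")
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs), standard_error
```

With only one subsample the spread cannot be estimated at all, which is one reason a single training fold gives little basis for judging whether an observed difference is real.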

Abstract

Few-shot learning benchmarks are critical for evaluating modern NLP techniques. It is possible, however, that benchmarks favor methods which easily make use of unlabeled text, because researchers can use unlabeled text from the test set to pretrain their models. Given the dearth of research on this potential problem, we run experiments to quantify the bias caused by pretraining on unlabeled test set text instead of on unlabeled, independently drawn text. Controlled few-shot and zero-shot experiments on 25 classification tasks and 3 language models -- BERT, GPT-2, and Mistral 7B -- do not find evidence of overoptimism. Furthermore, we demonstrate the importance of repeated subsampling when studying few-shot text classification, and recommend that few-shot learning benchmarks include multiple training folds. Code and data are available at https://github.com/kddubey/pretrain-on-test/.
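The controlled comparison described in the abstract needs, on each repetition, a few-shot labeled training set plus two pools of unlabeled text: the test set itself (treatment) and an independently drawn set (control). The sketch below draws such disjoint index sets from a pool of examples. `draw_subsample` is a hypothetical helper, not the repository's code, and giving the two unlabeled pools equal sizes is an assumption made here so that both pretraining runs see the same amount of text.

```python
import random


def draw_subsample(pool_size, num_train, num_unlabeled, seed):
    """Draw disjoint index sets (train, extra, test) for one repetition.

    Hypothetical helper:
      train: few-shot labeled examples used to train the classifier
      extra: unlabeled text drawn independently of the test set (control)
      test:  evaluation examples whose unlabeled text is pretrained on (treatment)
    extra and test have the same size by assumption.
    """
    needed = num_train + 2 * num_unlabeled
    if needed > pool_size:
        raise ValueError("pool too small for disjoint train/extra/test sets")
    rng = random.Random(seed)  # seeded so each repetition is reproducible
    idx = rng.sample(range(pool_size), needed)  # sampling without replacement
    train = idx[:num_train]
    extra = idx[num_train:num_train + num_unlabeled]
    test = idx[num_train + num_unlabeled:]
    return train, extra, test


# One independent draw per seed, i.e., one training fold per repetition.
subsamples = [draw_subsample(1000, 50, 200, seed=s) for s in range(5)]
```

Each seed yields one repetition. Repeating the whole pretrain-then-classify pipeline over several seeds is what allows variability across training folds to be measured.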

Code & Models

Repositories

kddubey/pretrain-on-test
Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Sparse Evolutionary Training · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Residual Connection · Cosine Annealing · Byte Pair Encoding · BERT