State-of-the-art generalisation research in NLP: A taxonomy and review

Dieuwke Hupkes; Mario Giulianelli; Verna Dankers; Mikel Artetxe; Yanai Elazar; Tiago Pimentel; Christos Christodoulopoulos; Karim Lasri; Naomi Saphra; Arabella Sinclair; Dennis Ulmer; Florian Schottmann; Khuyagbaatar Batsuren; Kaiser Sun; Koustuv Sinha; Leila Khalatbari; Maria Ryskina; Rita Frieske; Ryan Cotterell; Zhijing Jin

arXiv:2210.03050 · cs.CL · January 15, 2024

TL;DR

This paper provides a comprehensive taxonomy and review of NLP generalisation research, classifying over 400 papers to understand current practices and guide future standards for evaluating model generalisation.

Contribution

It introduces a detailed taxonomy for characterising NLP generalisation studies and offers a large-scale classification of existing research to inform future directions.

Findings

1. Mapped out the current landscape of NLP generalisation research
2. Identified key axes along which studies differ
3. Provided recommendations for future research focus

Abstract

The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they investigate, the type of data shift they consider, the source of this data shift, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 papers that test generalisation, for a total of more than 600 individual experiments. Considering the…
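
As an illustration of how the five axes named in the abstract could be used to annotate and tally experiments, here is a minimal Python sketch. The class name `ExperimentAnnotation`, the helper `tally_by_axis`, and all label strings in the example are hypothetical placeholders chosen for this sketch; they are not the authors' annotation tooling or data.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExperimentAnnotation:
    """One generalisation experiment, labelled along the five taxonomy axes."""
    motivation: str           # main motivation of the study
    generalisation_type: str  # type of generalisation investigated
    shift_type: str           # type of data shift considered
    shift_source: str         # source of the data shift
    shift_locus: str          # locus of the shift within the modelling pipeline


def tally_by_axis(annotations):
    """Count how often each label occurs on each of the five axes."""
    return {
        f.name: Counter(getattr(a, f.name) for a in annotations)
        for f in fields(ExperimentAnnotation)
    }


# Placeholder labels for illustration only; not taken from the paper's data.
examples = [
    ExperimentAnnotation("practical", "cross-domain", "covariate", "natural", "train-test"),
    ExperimentAnnotation("practical", "compositional", "covariate", "generated", "train-test"),
]

counts = tally_by_axis(examples)
print(counts["motivation"])
```

Keeping one record per experiment rather than per paper mirrors the abstract's distinction between papers and individual experiments: a single paper may contribute several records.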
