A Two-Sided Discussion of Preregistration of NLP Research

Anders Søgaard; Daniel Hershcovich; Miryam de Lhoneux

arXiv:2302.10086·cs.CL·February 21, 2023

Open Access

TL;DR

This paper weighs the potential benefits and drawbacks of preregistration in NLP research, covering risks such as a bias toward confirmatory research, increased publication bias, p-hacking, and reduced risk tolerance, and presents both sides of the debate as a dialogue.

Contribution

It provides a nuanced, balanced discussion of the advantages and disadvantages of adopting preregistration in NLP, emphasizing the complexity of implementing it effectively.

Findings

01

Preregistration can prevent fishing expeditions and promote negative results.

02

It may bias NLP toward confirmatory research and increase publication bias.

03

It could increase flag-planting and p-hacking and make researchers less risk tolerant.

Abstract

Van Miltenburg et al. (2021) suggest NLP research should adopt preregistration to prevent fishing expeditions and to promote publication of negative results. At face value, this is a very reasonable suggestion, seemingly solving many methodological problems with NLP research. We discuss pros and cons -- some old, some new: a) Preregistration is challenged by the practice of retrieving hypotheses after the results are known; b) preregistration may bias NLP toward confirmatory research; c) preregistration must allow for reclassification of research as exploratory; d) preregistration may increase publication bias; e) preregistration may increase flag-planting; f) preregistration may increase p-hacking; and finally, g) preregistration may make us less risk tolerant. We cast our discussion as a dialogue, presenting both sides of the debate.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques