Drawing Causal Inferences About Performance Effects in NLP
Sandra Wankmüller

TL;DR
This paper advocates for a rigorous sampling and evaluation methodology to accurately infer the performance effects of different NLP methods across a broad population of processing systems, moving beyond limited model comparisons.
Contribution
It introduces a systematic procedure for drawing causal inferences about NLP methods' performance effects using population sampling and randomized application of methods.
Findings
Proposes a five-step procedure for causal inference in NLP performance evaluation.
Highlights the importance of sampling from a defined population of processing systems.
Emphasizes the need for randomized application of methods to ensure valid causal conclusions.
Abstract
This article emphasizes that NLP as a science seeks to make inferences about the performance effects that result from applying one method (compared to another method) in the processing of natural language. Yet NLP research in practice usually does not achieve this goal: In NLP research articles, typically only a few models are compared. Each model results from a specific procedural pipeline (here named processing system) that is composed of a specific collection of methods that are used in preprocessing, pretraining, hyperparameter tuning, and training on the target task. To make generalizing inferences about the performance effect that is caused by applying some method A vs. another method B, it is not sufficient to compare a few specific models that are produced by a few specific (probably incomparable) processing systems. Rather, the following procedure would allow drawing inferences…
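The core idea of sampling processing systems from a defined population and randomly assigning method A or B can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the paper's five-step procedure (which is truncated above): the pipeline options, the `simulated_score` function, and the `true_effect` value are all hypothetical stand-ins for actually training and evaluating models.

```python
import itertools
import random
import statistics

# Hypothetical population: each combination of pipeline choices is one
# "processing system". These option lists are illustrative assumptions.
PREPROCESSING = ["lowercase", "cased"]
PRETRAINING = ["none", "small_corpus", "large_corpus"]
LEARNING_RATES = [1e-5, 3e-5, 1e-4]
TRAINING_SEEDS = range(10)


def build_population():
    """Enumerate the full (hypothetical) population of processing systems."""
    return list(itertools.product(
        PREPROCESSING, PRETRAINING, LEARNING_RATES, TRAINING_SEEDS))


def simulated_score(system, method, rng, true_effect=0.02):
    """Stand-in for training and evaluating one model; NOT a real benchmark."""
    _, pretraining, _, _ = system
    base = {"none": 0.70, "small_corpus": 0.75, "large_corpus": 0.80}[pretraining]
    noise = rng.gauss(0.0, 0.01)
    return base + (true_effect if method == "A" else 0.0) + noise


def estimate_effect(n_systems=100, seed=0):
    """Sample systems, randomize method assignment, return difference in means."""
    rng = random.Random(seed)
    sample = rng.sample(build_population(), n_systems)
    # Randomized application: shuffle, then split into two equal-sized groups.
    rng.shuffle(sample)
    half = n_systems // 2
    group_a, group_b = sample[:half], sample[half:]
    scores_a = [simulated_score(s, "A", rng) for s in group_a]
    scores_b = [simulated_score(s, "B", rng) for s in group_b]
    effect = statistics.mean(scores_a) - statistics.mean(scores_b)
    return effect, len(group_a), len(group_b)


if __name__ == "__main__":
    effect, n_a, n_b = estimate_effect()
    print(f"Estimated effect of A vs. B: {effect:.4f} (n_A={n_a}, n_B={n_b})")
```

Because assignment is random, differences between the two groups in preprocessing, pretraining, or tuning choices average out in expectation, so the difference in mean scores estimates the effect of the method itself rather than of one particular pipeline.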
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Test
