Please, Don't Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status

Yves Bestgen

arXiv:2205.11134 · cs.CL · May 24, 2022

TL;DR

This paper advocates using bootstrap confidence intervals, rather than SOTA claims and statistical significance tests, to compare NLP systems. The intervals draw attention to the size of the performance difference and help assess how much better one system is than another. Two case studies and a Python module support the argument.

Contribution

It argues for the widespread use of bootstrap confidence intervals when comparing NLP systems, provides practical tools for computing them, and illustrates their advantages over SOTA claims and significance testing.

Findings

01

Bootstrap confidence intervals effectively highlight performance differences.

02

Confidence intervals quantify the degree of system superiority.

03

A Python module for computing these intervals is freely available on PyPI.
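
To make the first two findings concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the mean score difference between two systems evaluated on the same test items. It is not the author's PyPI module: the function name `paired_bootstrap_ci`, its parameters, and the toy scores are all assumptions chosen for illustration, and the percentile method is only one of several bootstrap interval variants.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean difference (A - B) on paired items.

    Hypothetical helper for illustration; returns (observed, lower, upper).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    rng = random.Random(seed)
    # Per-item differences keep each pair of scores together.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample test items with replacement and record the mean difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    observed = sum(diffs) / n
    return observed, lower, upper


# Toy per-item correctness (1 = correct, 0 = wrong) for two systems.
system_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1] * 10
system_b = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1] * 10
diff, low, high = paired_bootstrap_ci(system_a, system_b)
print(f"difference = {diff:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Reporting the interval alongside the difference shows both whether zero is a plausible value and how large the advantage could reasonably be, which a bare SOTA claim or a p-value alone does not convey.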

Abstract

This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performances instead of the state-of-the-art status (SOTA) and statistical significance testing. Their main benefits are to draw attention to the difference in performance between two systems and to help assess the degree of superiority of one system over another. Two case studies, one comparing several systems and the other based on a K-fold cross-validation procedure, illustrate these benefits. A Python module for obtaining these confidence intervals, as well as a second function implementing the Fisher-Pitman test for paired samples, is freely available on PyPI.
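
For readers unfamiliar with the Fisher-Pitman test for paired samples mentioned in the abstract, the sketch below shows the usual idea: under the null hypothesis the two systems are exchangeable on each item, so the sign of every per-item difference can be flipped at random. This is a Monte Carlo approximation written for illustration, not the function distributed by the author; the name `paired_permutation_test` and its parameters are assumptions.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided Monte Carlo permutation test on paired scores.

    Hypothetical helper for illustration; returns an approximate p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Randomly swap the two systems on each item, i.e. flip the sign.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance guards against floating-point rounding.
        if abs(total) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy per-item correctness (1 = correct, 0 = wrong) for two systems.
system_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1] * 10
system_b = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1] * 10
print("approximate p-value:", paired_permutation_test(system_a, system_b))
```

The p-value only indicates whether the observed difference is compatible with chance; the paper's argument is that it should be read together with the size of the difference and its confidence interval.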

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Text Analysis Techniques · Explainable Artificial Intelligence (XAI)