Targeting the Benchmark: On Methodology in Current Natural Language Processing Research

David Schlangen

arXiv:2007.04792·cs.CL·July 10, 2020

TL;DR

This paper critically examines the methodology behind creating and using benchmarks in NLP research, highlighting the need for clearer progress criteria and better evaluation practices.

Contribution

It analyzes current benchmarking practices in NLP, proposing a framework to better understand and evaluate progress in the field.

Findings

01

Benchmark results are often presented without an explicit argument for why they constitute progress, or progress towards what

02

Baseline models are frequently used without critical evaluation

03

The paper suggests improved methodologies for benchmarking

Abstract

It has become a common pattern in our field: One group introduces a language task, exemplified by a dataset, which they argue is challenging enough to serve as a benchmark. They also provide a baseline model for it, which then soon is improved upon by other groups. Often, research efforts then move on, and the pattern repeats itself. What is typically left implicit is the argumentation for why this constitutes progress, and progress towards what. In this paper, we try to step back for a moment from this pattern and work out possible argumentations and their parts.
