ECBD: Evidence-Centered Benchmark Design for NLP
Yu Lu Liu, Su Lin Blodgett, Jackie Chi Kit Cheung, Q. Vera Liao, Alexandra Olteanu, Ziang Xiao

TL;DR
This paper introduces ECBD, a framework that draws on evidence-centered design in educational assessment and formalizes the benchmark design process into five modules, so that NLP benchmark design decisions can be described, justified, and analyzed for their impact on validity.
Contribution
It proposes a novel, formalized framework for benchmark design in NLP, enabling explicit justification and validation of design choices.
Findings
Analyzing benchmarks through ECBD reveals common trends in benchmark design that may threaten the validity of benchmark measurements.
Case studies demonstrate ECBD's utility in analyzing existing benchmarks.
The framework promotes more transparent and valid benchmark creation.
Abstract
Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity of the benchmark's measurements. To address this gap, we draw on evidence-centered design in educational assessments and propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules. ECBD specifies the role each module plays in helping practitioners collect evidence about capabilities of interest. Specifically, each module requires benchmark designers to describe, justify, and support benchmark design choices -- e.g., clearly…
Taxonomy
Topics: Natural Language Processing Techniques · Software Engineering Research · Semantic Web and Ontologies
