The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP

Anastasia Shimorina; Anya Belz

arXiv:2103.09710 · cs.CL · March 18, 2021 · 1 citation

TL;DR

This paper presents the Human Evaluation Datasheet, a standardized template designed to systematically record details of human evaluation experiments in NLP, enhancing reproducibility and comparability.

Contribution

It introduces a structured datasheet template, inspired by Bender and Friedman (2018), Mitchell et al. (2019), and Gebru et al. (2020), for documenting human evaluations in NLP research in a standardized way.

Findings

01. Facilitates detailed recording of human evaluation experiments
02. Supports comparability and meta-evaluation in NLP studies
03. Aims to improve reproducibility of human evaluation results

Abstract

This paper introduces the Human Evaluation Datasheet, a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). Originally taking inspiration from seminal papers by Bender and Friedman (2018), Mitchell et al. (2019), and Gebru et al. (2020), the Human Evaluation Datasheet is intended to facilitate the recording of properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility tests.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques