Hierarchical Evaluation Framework: Best Practices for Human Evaluation

Iva Bojic; Jessica Chen; Si Yuan Chang; Qi Chwen Ong; Shafiq Joty; Josip Car

arXiv:2310.01917·cs.CL·October 13, 2023·5 cites

Open Access

TL;DR

This paper introduces a hierarchical evaluation framework for human assessment of NLP systems, aiming to standardize and improve the fairness and comprehensiveness of evaluations, demonstrated through application to a Machine Reading Comprehension system.

Contribution

The paper develops a novel hierarchical evaluation framework that addresses gaps in NLP human evaluation methodologies and enhances the assessment of both inputs and outputs.

Findings

01. The framework provides a more comprehensive assessment of system performance.

02. Evaluating a Machine Reading Comprehension system reveals associations between the quality of its inputs and the quality of its outputs.

03. Future work will examine potential time savings for evaluators.
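One way the input-output quality link in Finding 02 could be quantified is to compute Spearman's rank correlation between per-item ratings of input quality and output quality. The code below is a minimal sketch of that idea. This page does not say which statistic the authors actually used, so Spearman's rho, the function names `average_ranks` and `spearman_rho`, and the 1 to 5 rating scale in the usage example are all assumptions for illustration.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("ratings are constant; correlation is undefined")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for five items (not data from the paper).
    input_quality = [5, 4, 2, 3, 1]
    output_quality = [4, 5, 1, 3, 2]
    print(round(spearman_rho(input_quality, output_quality), 2))  # 0.8
```

A rank-based statistic suits Likert-style ratings because they are ordinal. Averaging tied ranks keeps the result well defined when several items receive the same score.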

Abstract

Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing literature on human evaluation metrics, we identified several gaps in NLP evaluation methodologies. These gaps served as motivation for developing our own hierarchical evaluation framework. The proposed framework offers notable advantages, particularly in providing a more comprehensive representation of the NLP system's performance. We applied this framework to evaluate the developed Machine Reading Comprehension system, which was utilized within a human-AI symbiosis model. The results…

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Explainable Artificial Intelligence (XAI)