Hierarchical Evaluation Framework: Best Practices for Human Evaluation
Iva Bojic, Jessica Chen, Si Yuan Chang, Qi Chwen Ong, Shafiq Joty, Josip Car

TL;DR
This paper introduces a hierarchical evaluation framework for human assessment of NLP systems, aiming to standardize and improve the fairness and comprehensiveness of evaluations, demonstrated through application to a Machine Reading Comprehension system.
Contribution
The paper develops a novel hierarchical evaluation framework that addresses gaps in NLP human evaluation methodologies and enhances the assessment of both inputs and outputs.
Findings
Framework provides a more comprehensive performance assessment
Evaluation of a Machine Reading Comprehension system demonstrates input-output quality links
Potential time savings for evaluators are left to future work
Abstract
Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing literature on human evaluation metrics, we identified several gaps in NLP evaluation methodologies. These gaps served as motivation for developing our own hierarchical evaluation framework. The proposed framework offers notable advantages, particularly in providing a more comprehensive representation of the NLP system's performance. We applied this framework to evaluate the developed Machine Reading Comprehension system, which was utilized within a human-AI symbiosis model. The results…
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Explainable Artificial Intelligence (XAI)
