Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation
Aleksandar Savkov, Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Anya Belz, Ehud Reiter

TL;DR
This paper introduces a standardized protocol for the human evaluation of medical note generation, grounded in Consultation Checklists. In a first study the protocol yielded good inter-annotator agreement, and using the checklists improved the correlation of automatic metrics with human judgments.
Contribution
The paper proposes a novel evaluation protocol grounded in Consultation Checklists to enhance objectivity and consistency in assessing medical note generation systems.
Findings
Good levels of inter-annotator agreement observed in a first evaluation study using the protocol
Consultation Checklists improve correlation of automatic metrics with human judgments
Protocol facilitates more reliable evaluation of medical note generation systems
Abstract
Evaluating automatically generated text is generally hard due to the inherently subjective nature of many aspects of the output quality. This difficulty is compounded in automatic consultation note generation by differing opinions between medical experts both about which patient statements should be included in generated notes and about their respective importance in arriving at a diagnosis. Previous real-world evaluations of note-generation systems saw substantial disagreement between expert evaluators. In this paper we propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists, which are created in a preliminary step and then used as a common point of reference during quality assessment. We observed good levels of inter-annotator agreement in a first evaluation study using the protocol; further, using Consultation Checklists produced in…
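To make the notion of inter-annotator agreement concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, computed over two annotators' per-item judgments. The abstract does not say which statistic the authors used, so the choice of kappa is an assumption, and the label set ("present", "missing", "incorrect") and the sample judgments are hypothetical illustrations rather than data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: each checklist item receives exactly one categorical
    label per annotator. This is an illustrative statistic, not
    necessarily the one used in the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments for six checklist items against one generated note.
annotator_a = ["present", "present", "missing", "incorrect", "present", "missing"]
annotator_b = ["present", "missing", "missing", "incorrect", "present", "present"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on four of six items, chance agreement is 14/36, and kappa comes out at 10/22 (about 0.455). With more than two annotators or missing judgments, a statistic such as Krippendorff's alpha would be a more natural fit.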
Taxonomy
Topics: Speech and dialogue systems · Biomedical Text Mining and Ontologies · Electronic Health Records Systems
