Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals
Kathrin Blagec, Jakob Kraiger, Wolfgang Frühwirt, Matthias Samwald

TL;DR
This paper presents a comprehensive catalog of 450 clinical NLP datasets and benchmarks, revealing a significant gap between AI benchmark coverage and the tasks clinicians prioritize for automation in healthcare.
Contribution
The paper systematically reviews and annotates clinical NLP datasets and compares benchmark tasks with the tasks clinicians want automated, highlighting where the two diverge.
Findings
AI benchmarks lack clinical relevance and coverage.
Clinicians prioritize tasks not represented in current benchmarks.
Existing benchmarks do not address routine clinical documentation.
Abstract
Publicly accessible benchmarks that allow for assessing and comparing model performances are important drivers of progress in artificial intelligence (AI). While recent advances in AI capabilities hold the potential to transform medical practice by assisting and augmenting the cognitive processes of healthcare professionals, the coverage of clinically relevant tasks by AI benchmarks is largely unclear. Furthermore, there is a lack of systematized meta-information that allows clinical AI researchers to quickly determine accessibility, scope, content and other characteristics of datasets and benchmark datasets relevant to the clinical domain. To address these issues, we curated and released a comprehensive catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP), based on a systematic review of literature and online…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging
