Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification
Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es

TL;DR
This study demonstrates effective automatic extraction of cardiac diagnosis labels from unstructured Dutch echocardiogram reports using span- and document-level classification models, improving efficiency and accuracy in clinical labeling.
Contribution
Introduces and evaluates novel span- and document-level classification models for extracting diagnoses from Dutch echocardiogram reports, with models outperforming existing methods.
Findings
SpanCategorizer achieved F1-scores up to 0.93 for span classification.
MedRoBERTa.nl achieved F1-scores up to 0.98 for document classification.
SetFit performs well with limited training data, reaching near-perfect results with fewer labels.
Abstract
Clinical machine learning research and AI-driven clinical decision support models rely on clinically accurate labels. Manually extracting these labels with the help of clinical specialists is often time-consuming and expensive. This study tests the feasibility of automatic span- and document-level diagnosis extraction from unstructured Dutch echocardiogram reports. We included 115,692 unstructured echocardiogram reports from the UMCU, a large university hospital in the Netherlands. A randomly selected subset was manually annotated for the occurrence and severity of eleven commonly described cardiac characteristics. We developed and tested several automatic labelling techniques at both span and document levels, using weighted and macro F1-score, precision, and recall for performance evaluation. We compared the performance of span labelling against document labelling methods, which…
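The abstract reports both weighted and macro F1-scores. As a minimal sketch of how the two averages differ, the Python below computes them from scratch. The severity labels ("normal", "mild", "severe") and the toy predictions are assumptions made for illustration only; they are not taken from the paper's data or annotation scheme.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true instances."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Hypothetical, imbalanced toy labels (not from the paper).
y_true = ["normal"] * 6 + ["mild"] * 3 + ["severe"] * 1
y_pred = ["normal"] * 6 + ["mild", "mild", "normal"] + ["mild"]

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

In this toy case the rare "severe" class is never predicted, which pulls the macro average well below the weighted one. Reporting both, as the abstract describes, makes that kind of gap on infrequent classes visible.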
Code & Models
- 🤗 UMCU/Echocardiogram_WMA_reducedmodel · 3 dl
- 🤗 UMCU/Echocardiogram_TricuspidRegurgitation_reducedmodel · 1 dl
- 🤗 UMCU/Echocardiogram_RV_SYST_FUNC_reducedmodel · 2 dl
- 🤗 UMCU/Echocardiogram_RV_DILATION_reducedmodel · 3 dl
- 🤗 UMCU/Echocardiogram_PericardialEffusion_reducedmodel
- 🤗 UMCU/Echocardiogram_MitralRegurgitation_reducedmodel · 1 dl
- 🤗 UMCU/Echocardiogram_LV_syst_func_reducedmodel
- 🤗 UMCU/Echocardiogram_LV_dilation_reducedmodel
- 🤗 UMCU/Echocardiogram_Diastolic_dysfunction_reducedmodel · 1 dl
- 🤗 UMCU/Echocardiogram_aortic_stenosis_reducedmodel
Taxonomy
Topics: Cardiovascular Function and Risk Factors · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging
Methods: Sparse Evolutionary Training
