MDACE: MIMIC Documents Annotated with Code Evidence
Hua Cheng, Rana Jafari, April Russell, Russell Klopfer, Edmond Lu, Benjamin Striner, Matthew R. Gormley

TL;DR
This paper introduces MDACE, a new annotated dataset of medical documents with evidence spans for coding, enabling improved evaluation of evidence extraction methods in clinical coding systems.
Contribution
The paper presents MDACE, the first publicly available dataset with expert-annotated evidence spans for medical coding, facilitating research in evidence extraction and model interpretability.
Findings
Baseline evidence extraction methods achieve moderate performance.
MDACE enables evaluation of deep learning models for evidence extraction.
The dataset supports research in explainability for clinical coding.
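As an illustration of what evaluating evidence extraction against annotated spans can look like, the sketch below scores predicted evidence spans against gold spans by character overlap. This summary does not state which metrics the paper uses, so the overlap-based precision/recall/F1 here is an assumption, and the names `span_chars` and `evidence_prf` and the example offsets are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A span is (start, end) in character offsets, end exclusive.
# This representation is an assumption, not the MDACE file format.
Span = Tuple[int, int]


def span_chars(spans: Iterable[Span]) -> Set[int]:
    """Return the set of character positions covered by the spans."""
    covered: Set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def evidence_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Character-overlap precision, recall, and F1 for one code in one document."""
    gold_chars = span_chars(gold)
    pred_chars = span_chars(pred)
    overlap = len(gold_chars & pred_chars)
    precision = overlap / len(pred_chars) if pred_chars else 0.0
    recall = overlap / len(gold_chars) if gold_chars else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the predicted span covers half of the gold span.
gold_spans = [(10, 20)]
pred_spans = [(15, 25)]
print(evidence_prf(gold_spans, pred_spans))
```

Scoring per (document, code) pair and then averaging is one reasonable design choice; exact-match span scoring would be stricter and would penalize predictions that are slightly longer or shorter than the annotation.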
Abstract
We introduce a dataset for evidence/rationale extraction on an extreme multi-label classification task over long medical documents. One such task is Computer-Assisted Coding (CAC), which has improved significantly in recent years thanks to advances in machine learning technologies. Yet simply predicting a set of final codes for a patient encounter is insufficient, as CAC systems are required to provide supporting textual evidence to justify the billing codes. A model able to produce accurate and reliable supporting evidence for each code would be a tremendous benefit. However, a human-annotated code evidence corpus is extremely difficult to create because it requires specialized knowledge. In this paper, we introduce MDACE, the first publicly available code evidence dataset, which is built on a subset of the MIMIC-III clinical records. The dataset -- annotated by professional medical…
Taxonomy
Topics: Medical Coding and Health Information · Machine Learning in Healthcare · Biomedical Text Mining and Ontologies
