Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset

Thomas Searle; Zina Ibrahim; Richard JB Dobson

arXiv:2006.07332·cs.LG·February 28, 2023

TL;DR

This paper introduces a reproducible method for evaluating the accuracy of the clinical codes assigned in the MIMIC-III dataset. It finds that many codes are under-represented (under-coded), which affects NLP-based clinical coding research that treats these codes as ground truth.

Contribution

It proposes a novel methodology for validating the clinical codes in MIMIC-III and highlights a potential under-coding problem in this widely used dataset.

Findings

01. The most frequent codes are under-coded by up to 35%.

02. Highlights the need for secondary validation of MIMIC-III codes.

03. Provides an open-source framework for code validation.
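
To make the "under-coded by up to 35%" figure concrete, here is a minimal sketch of how a per-code under-coding rate could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function name `under_coding_rates`, the input layout (one set of codes per admission), and the rate definition (missed assignments divided by assigned plus missed assignments) are all hypothetical choices made for this example.

```python
from collections import Counter


def under_coding_rates(assigned, text_derived):
    """Estimate a per-code under-coding rate.

    assigned:      dict mapping admission id -> set of codes on record
    text_derived:  dict mapping admission id -> set of codes that a validated
                   review of the discharge summary says should be present

    Assumed definition (hypothetical, for illustration only):
        rate(code) = missing / (assigned + missing)
    where "missing" counts admissions whose text supports the code
    but whose record does not carry it.
    """
    assigned_counts = Counter()
    missing_counts = Counter()
    for admission_id, codes in assigned.items():
        assigned_counts.update(codes)
        found_in_text = text_derived.get(admission_id, set())
        # Codes supported by the text but absent from the record.
        missing_counts.update(found_in_text - codes)

    rates = {}
    for code in set(assigned_counts) | set(missing_counts):
        total = assigned_counts[code] + missing_counts[code]
        rates[code] = missing_counts[code] / total
    return rates


# Toy data: four admissions, two ICD-9 codes.
assigned = {1: {"401.9", "428.0"}, 2: {"428.0"}, 3: {"401.9"}, 4: set()}
text_derived = {1: {"401.9"}, 2: {"401.9", "428.0"}, 3: {"401.9"}, 4: {"401.9"}}

rates = under_coding_rates(assigned, text_derived)
for code in sorted(rates):
    print(code, rates[code])
```

In the toy data, code 401.9 is recorded for two admissions and supported by the text but unrecorded for two more, so its rate under this assumed definition is 0.5; code 428.0 is never missed, so its rate is 0.0.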

Abstract

Clinical coding is currently a labour-intensive, error-prone, but critical administrative process whereby hospital patient episodes are manually assigned codes by qualified staff from large, standardised taxonomic hierarchies of codes. Automating clinical coding has a long history in NLP research and has recently seen novel developments setting new state-of-the-art results. A popular dataset used in this task is MIMIC-III, a large intensive care database that includes clinical free-text notes and associated codes. We argue for the reconsideration of the validity of MIMIC-III's assigned codes, which are often treated as gold-standard, especially since MIMIC-III has not undergone secondary validation. This work presents an open-source, reproducible experimental methodology for assessing the validity of codes derived from EHR discharge summaries. We exemplify the methodology with MIMIC-III…

Code & Models

Repositories

CogStack/MedCAT
pytorch
