Modeling Diagnostic Label Correlation for Automatic ICD Coding
Shang-Chi Tsai, Chao-Wei Huang, Yun-Nung Chen

TL;DR
This paper introduces a two-stage framework for automatic ICD coding from clinical notes: a base predictor proposes candidate label sets, and a label set distribution estimator rescores them so that correlations between labels inform the final prediction.
Contribution
The authors describe the paper as the first attempt to learn the label set distribution as a reranking module for medical code prediction, capturing label dependencies that independent per-label classifiers ignore.
Findings
Improved prediction accuracy on MIMIC datasets.
Effective modeling of label correlations enhances ICD coding.
First application of label set distribution learning in medical coding.
Abstract
Given the clinical notes written in electronic health records (EHRs), predicting the diagnostic codes is challenging; the task is formulated as multi-label classification. The large label set, the hierarchical dependency among labels, and the imbalanced data make this prediction task extremely hard. Most existing work builds a binary predictor for each label independently, ignoring the dependencies between labels. To address this problem, we propose a two-stage framework that improves automatic ICD coding by capturing label correlation. Specifically, we train a label set distribution estimator to rescore the probability of each label set candidate generated by a base predictor. This paper is the first attempt at learning the label set distribution as a reranking module for medical code prediction. In the experiments, our proposed framework is able to improve upon best-performing…
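The generate-then-rescore idea in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation: the candidate generator (flipping the labels closest to the 0.5 threshold), the pairwise `affinity` table standing in for a learned label set distribution estimator, the weight `alpha`, and the code strings, which are placeholders.

```python
import math
from itertools import combinations


def base_log_prob(label_set, probs):
    """Log-probability of a label set under independent per-label predictions."""
    return sum(
        math.log(p if code in label_set else 1.0 - p)
        for code, p in probs.items()
    )


def generate_candidates(probs, n_uncertain=3):
    """Hypothetical candidate generator: start from the 0.5-threshold set and
    flip every subset of the n_uncertain labels closest to the threshold."""
    base = {code for code, p in probs.items() if p >= 0.5}
    uncertain = sorted(probs, key=lambda c: abs(probs[c] - 0.5))[:n_uncertain]
    candidates = []
    for r in range(len(uncertain) + 1):
        for flips in combinations(uncertain, r):
            candidates.append(frozenset(base.symmetric_difference(flips)))
    return candidates


def set_score(label_set, affinity):
    """Toy stand-in for a label set distribution estimator: sums pairwise
    log-affinities of the labels that appear together in the set."""
    return sum(
        affinity.get(frozenset(pair), 0.0)
        for pair in combinations(sorted(label_set), 2)
    )


def rerank(probs, affinity, alpha=1.0):
    """Return the candidate that maximizes base score + alpha * set score."""
    return max(
        generate_candidates(probs),
        key=lambda s: base_log_prob(s, probs) + alpha * set_score(s, affinity),
    )


# Illustrative inputs only: placeholder codes and made-up numbers.
probs = {"I10": 0.9, "E11.9": 0.45, "N18.3": 0.55, "J45": 0.4}
affinity = {
    frozenset({"E11.9", "N18.3"}): 1.0,   # assumed to co-occur often
    frozenset({"I10", "J45"}): -1.0,      # assumed to co-occur rarely
}

independent = rerank(probs, affinity, alpha=0.0)  # ignores label correlation
reranked = rerank(probs, affinity, alpha=1.0)     # uses label correlation
print(sorted(independent), sorted(reranked))
```

With `alpha=0.0` the sketch reduces to independent thresholding, so the selected set is the one a per-label binary predictor would give. With `alpha=1.0` the assumed affinity between `E11.9` and `N18.3` pulls the borderline label `E11.9` into the final set, which is the kind of correction a reranker over label sets is meant to make.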
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning in Healthcare · Text and Document Classification Technologies
