From Extreme Multi-label to Multi-class: A Hierarchical Approach for Automated ICD-10 Coding Using Phrase-level Attention
Cansu Sen, Bingyang Ye, Javed Aslam, Amir Tahmasebi

TL;DR
This paper introduces a hierarchical approach to automate ICD-10 coding by converting the extreme multi-label problem into a multi-class task, leveraging phrase-level annotations for improved accuracy and interpretability.
Contribution
The authors propose a novel hierarchical model with supervised attention for ICD coding, significantly enhancing accuracy and interpretability over existing multi-label methods.
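Supervised attention can be illustrated with a minimal sketch. It assumes one common formulation, which this summary does not confirm. Token-level attention weights come from a softmax over scores, and an auxiliary loss pushes those weights toward the tokens covered by a gold phrase annotation, represented here by a hypothetical 0/1 `gold_mask`. All function names are illustrative. In a full model, this auxiliary term would typically be added to the classification loss with some weighting factor.

```python
import math
from typing import List, Tuple

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors: List[List[float]], scores: List[float]) -> Tuple[List[float], List[float]]:
    """Sum token vectors weighted by softmax(scores); return (pooled vector, weights)."""
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def supervised_attention_loss(weights: List[float], gold_mask: List[int], eps: float = 1e-12) -> float:
    """Cross-entropy between a uniform distribution over annotated tokens
    (gold_mask == 1) and the predicted attention weights."""
    n_gold = sum(gold_mask)
    if n_gold == 0:
        return 0.0  # assumption: unannotated inputs contribute no attention loss
    target = [m / n_gold for m in gold_mask]
    return -sum(t * math.log(w + eps) for t, w in zip(target, weights))

if __name__ == "__main__":
    # Toy input: 5 tokens with 2-dim vectors; tokens 2-4 form the annotated phrase.
    vectors = [[0.1, 0.0], [0.0, 0.1], [1.0, 0.2], [0.3, 0.9], [0.8, 0.5]]
    gold_mask = [0, 0, 1, 1, 1]
    for name, scores in [("aligned", [-2.0, -2.0, 2.0, 2.0, 2.0]),
                         ("misaligned", [2.0, 2.0, -2.0, -2.0, -2.0])]:
        _, w = attention_pool(vectors, scores)
        print(name, round(supervised_attention_loss(w, gold_mask), 3))
```

Scores that focus attention on the annotated phrase yield a much lower auxiliary loss than scores that focus on the surrounding tokens. The attention therefore learns to highlight evidence a human coder would recognize, which is the interpretability benefit the summary describes.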
Findings
23% improvement in subset accuracy
18% increase in micro-F1 score
15% boost in instance-based F1
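The three metrics above have standard multi-label definitions. The sketch below assumes the paper uses these standard forms. The summary does not define the metrics, and it does not say whether the reported gains are absolute or relative. Each document's gold and predicted ICD codes are treated as Python sets, and the example codes are illustrative only.

```python
from typing import List, Set

def subset_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of documents whose predicted code set exactly matches the gold set."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 from true positives, false positives and false negatives pooled over all documents."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def instance_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Per-document F1, averaged over documents."""
    scores = []
    for g, p in zip(gold, pred):
        if not g and not p:
            scores.append(1.0)  # assumption: two empty sets count as a perfect match
        else:
            scores.append(2 * len(g & p) / (len(g) + len(p)))
    return sum(scores) / len(scores)

gold = [{"E11.9", "I10"}, {"J45.909"}, {"E78.5", "I10"}]
pred = [{"E11.9", "I10"}, {"J45.909", "E78.5"}, {"E78.5"}]
print(subset_accuracy(gold, pred))  # 1 of 3 documents matches exactly
print(micro_f1(gold, pred))         # tp=4, fp=1, fn=1 -> 8/10
print(instance_f1(gold, pred))      # mean of 1, 2/3, 2/3
```

Subset accuracy is the strictest of the three, because a single missing or extra code makes the whole document count as wrong. Micro-F1 and instance-based F1 both give partial credit, but micro-F1 pools counts across documents while instance-based F1 averages per-document scores.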
Abstract
Clinical coding is the task of assigning a set of alphanumeric codes, referred to as ICD (International Classification of Diseases) codes, to a medical event based on the context captured in a clinical narrative. The latest version of ICD, ICD-10, includes more than 70,000 codes. Because this is a labor-intensive and error-prone task, automatic ICD coding of medical reports using machine learning has gained significant interest in the last decade. Existing literature has modeled this problem as a multi-label task. Nevertheless, such a multi-label approach is challenging due to the extremely large label set. Furthermore, the interpretability of the predictions is essential for end users (e.g., healthcare providers and insurance companies). In this paper, we propose a novel approach for automatic ICD coding by reformulating the extreme multi-label problem into a simpler multi-class problem…
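The reformulation can be sketched as follows. The sketch assumes that code-relevant phrases have already been identified in the note, for example from phrase-level annotations. A multi-class step assigns exactly one code to each phrase, and the document's code set is the union of those single-code predictions. `CODE_LEXICON` and the token-overlap scoring are hypothetical stand-ins for a trained classifier. They are not the authors' model.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: code -> indicative words. A trained multi-class
# model would replace this; it only stands in for "score every code, pick one".
CODE_LEXICON: Dict[str, Set[str]] = {
    "E11.9": {"type", "2", "diabetes", "mellitus"},
    "I10": {"essential", "hypertension", "htn"},
    "J45.909": {"asthma", "wheezing"},
}

def score_codes(phrase: str) -> Dict[str, int]:
    """Score each code by token overlap with the phrase (stand-in for model logits)."""
    tokens = set(phrase.lower().split())
    return {code: len(tokens & words) for code, words in CODE_LEXICON.items()}

def classify_phrase(phrase: str) -> str:
    """Multi-class step: exactly one code per phrase (argmax over code scores)."""
    scores = score_codes(phrase)
    return max(scores, key=scores.get)

def code_document(phrases: List[str]) -> Set[str]:
    """Document-level label set = union of the per-phrase single-code predictions."""
    return {classify_phrase(p) for p in phrases}

print(code_document(["type 2 diabetes mellitus", "essential hypertension"]))
```

The design point is that each prediction picks one code for one phrase. A flat multi-label model, by contrast, must make an independent yes/no decision for every code in the label set on every document. Each predicted code also traces back to the phrase that produced it, which supports interpretability.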
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
