Supervised Learning in the Presence of Noise: Application in ICD-10 Code Classification
Youngwoo Kim, Cheng Li, Bingyang Ye, Amir Tahmasebi, Javed Aslam

TL;DR
This paper addresses the challenge of training accurate ICD-10 classifiers despite systematic label noise caused by human coder errors, proposing a novel noise-aware training method that improves classification performance.
Contribution
The paper introduces a new approach to handling systematic label noise in ICD-10 coding: it identifies common code misuses and develops a noise-aware training strategy around them.
Findings
The proposed method outperforms baseline classifiers when evaluated on expert-validated labels.
Systematic noise in ICD coding is linked to code hierarchy and coder confusion.
Handling systematic noise improves classifier robustness and accuracy.
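To make the distinction between random and systematic label noise concrete, here is a minimal simulation sketch. It is not the paper's method. The short code list, the grouping of codes by the characters before the dot, the 30% flip rate, and all function names are assumptions chosen for illustration only.

```python
import random
from collections import Counter

# Illustrative code list (assumption); real ICD-10 vocabularies are far larger.
CODES = ["E11.9", "E11.65", "J45.901", "J45.909", "I10"]


def category(code):
    """Return the part before the dot, e.g. 'E11' for 'E11.9'."""
    return code.split(".")[0]


def uniform_noise(code, rate, rng):
    """Random noise: flip to any other code with equal probability."""
    if rng.random() < rate:
        return rng.choice([c for c in CODES if c != code])
    return code


def systematic_noise(code, rate, rng):
    """Hierarchy-linked noise: flip only to a sibling in the same category."""
    siblings = [c for c in CODES if c != code and category(c) == category(code)]
    if siblings and rng.random() < rate:
        return rng.choice(siblings)
    return code


def confusion_counts(noise_fn, rate=0.3, n=2000, seed=0):
    """Count (true, observed) pairs for labels that were flipped."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        true = rng.choice(CODES)
        observed = noise_fn(true, rate, rng)
        if observed != true:
            counts[(true, observed)] += 1
    return counts


def cross_category_fraction(counts):
    """Fraction of flipped labels that land in a different category."""
    total = sum(counts.values())
    cross = sum(v for (t, o), v in counts.items() if category(t) != category(o))
    return cross / total if total else 0.0
```

Under these assumptions, `cross_category_fraction(confusion_counts(systematic_noise))` is 0.0 by construction, because every flip stays inside one category, while the uniform variant sends about 80% of its flips across categories in expectation. A method that assumes uniform noise would therefore model a very different confusion pattern from the one described in the findings above.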
Abstract
ICD coding is the international standard for capturing and reporting health conditions and diagnoses for revenue cycle management in healthcare. Manually assigning ICD codes is prone to human error because of the large code vocabulary and the similarities between codes. Since machine learning based approaches require ground truth training data, inconsistency among human coders manifests as label noise, which makes both training and evaluating ICD classifiers difficult. This paper investigates the characteristics of such noise in manually assigned ICD-10 codes and proposes a method to train robust ICD-10 classifiers in its presence. Our research concluded that the nature of such noise is systematic. Most of the existing methods for handling label noise assume that the noise is completely random and independent of…
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms
