Supervised Learning in the Presence of Noise: Application in ICD-10 Code Classification

Youngwoo Kim; Cheng Li; Bingyang Ye; Amir Tahmasebi; Javed Aslam

arXiv:2103.07808·cs.LG·March 16, 2021

TL;DR

This paper addresses the challenge of training accurate ICD-10 classifiers despite systematic label noise caused by human coder errors, proposing a novel noise-aware training method that improves classification performance.

Contribution

The paper introduces an approach to handling systematic label noise in ICD-10 coding: it identifies commonly misused codes and develops a noise-aware training strategy.

Findings

01. Proposed method outperforms baseline classifiers on expert-validated labels.

02. Systematic noise in ICD coding is linked to code hierarchy and coder confusion.

03. Handling systematic noise improves classifier robustness and accuracy.
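
To make the difference between random and systematic label noise concrete, the following is a minimal sketch of how the two could be simulated. It is not the paper's method, which this summary does not describe. The code list, the function names (`siblings`, `uniform_noise`, `systematic_noise`, `confusion_counts`), the noise rate, and the use of a shared 3-character prefix as a stand-in for the code hierarchy are all assumptions made for illustration.

```python
import random
from collections import Counter

# Illustrative label set (assumption, not taken from the paper).
CODES = ["E11.9", "E11.65", "I10", "J45.909"]


def siblings(code, codes):
    """Other codes sharing the same 3-character prefix (a proxy for hierarchy)."""
    return [c for c in codes if c != code and c[:3] == code[:3]]


def uniform_noise(code, codes, rate, rng):
    """Random noise: with probability `rate`, flip to any other code uniformly."""
    if rng.random() < rate:
        return rng.choice([c for c in codes if c != code])
    return code


def systematic_noise(code, codes, rate, rng):
    """Systematic noise: with probability `rate`, flip only to a sibling code."""
    sibs = siblings(code, codes)
    if sibs and rng.random() < rate:
        return rng.choice(sibs)
    return code


def confusion_counts(noise_fn, codes, rate, n_per_code=1000, seed=0):
    """Count (true label, observed label) pairs under a given noise process."""
    rng = random.Random(seed)
    counts = Counter()
    for true in codes:
        for _ in range(n_per_code):
            counts[(true, noise_fn(true, codes, rate, rng))] += 1
    return counts


uniform = confusion_counts(uniform_noise, CODES, rate=0.3)
systematic = confusion_counts(systematic_noise, CODES, rate=0.3)

# Under the systematic process, errors stay inside the shared prefix group;
# under the uniform process, they spread across unrelated codes.
print("uniform    E11.9 -> I10   :", uniform[("E11.9", "I10")])
print("systematic E11.9 -> I10   :", systematic[("E11.9", "I10")])
print("systematic E11.9 -> E11.65:", systematic[("E11.9", "E11.65")])
```

In this toy setup the systematic process never confuses codes from different prefix groups, so its confusion counts are concentrated in a few cells. A method that assumes uniformly random flips would model that structure poorly.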

Abstract

ICD coding is the international standard for capturing and reporting health conditions and diagnoses for revenue cycle management in healthcare. Manually assigning ICD codes is prone to human error because of the large code vocabulary and the similarities between codes. Because machine-learning-based approaches require ground-truth training data, inconsistency among human coders manifests as label noise, which makes both training and evaluating ICD classifiers difficult. This paper investigates the characteristics of this noise in manually assigned ICD-10 codes and proposes a method for training robust ICD-10 classifiers in its presence. Our research concluded that the noise is systematic. Most existing methods for handling label noise assume that the noise is completely random and independent of…

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms