Secondary Use of Clinical Problem List Entries for Neural Network-Based Disease Code Assignment
Markus Kreuzthaler, Bastian Pfeifer, Diether Kramer, Stefan Schulz

TL;DR
This study evaluates neural network models for automating ICD-10 coding of clinical problem list entries, demonstrating that advanced models like RoBERTa outperform simpler baselines, though manual coding inconsistencies limit accuracy.
Contribution
It introduces a neural network-based approach for ICD-10 coding of clinical problem list entries and compares different architectures, highlighting the effectiveness of a downstreamed RoBERTa model.
Findings
The fastText baseline achieved a macro-averaged F1-score of 0.83
The character-level LSTM achieved a macro-averaged F1-score of 0.84
The downstreamed RoBERTa model achieved a macro-averaged F1-score of 0.88
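The abstract reports these scores as macro-averaged F1, meaning F1 is computed per ICD-10 code and then averaged with equal weight per code, so rare codes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1` and the sample labels are illustrative assumptions, as is the choice to average over the union of gold and predicted labels.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (single-label classification).

    Assumption: labels are taken from the union of gold and predicted codes;
    a label with no true positives, false positives, or false negatives
    cannot occur under that choice, but is scored 0.0 defensively.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted code p, but it was wrong
            fn[g] += 1  # gold code g was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted three-character codes.
gold = ["I10", "I10", "E11", "J45"]
pred = ["I10", "E11", "E11", "J45"]
score = macro_f1(gold, pred)
```

In this toy example one `I10` entry is misclassified as `E11`, which lowers the F1 of both codes while `J45` stays perfect.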
Abstract
Clinical information systems have become large repositories for semi-structured and partly annotated electronic health record data, which have reached a critical mass that makes them interesting for supervised data-driven neural network approaches. We explored automated coding of 50-character-long clinical problem list entries using the International Classification of Diseases (ICD-10) and evaluated three different types of network architectures on the top 100 ICD-10 three-digit codes. A fastText baseline reached a macro-averaged F1-score of 0.83, followed by a character-level LSTM with a macro-averaged F1-score of 0.84. The top-performing approach used a downstreamed RoBERTa model with a custom language model, yielding a macro-averaged F1-score of 0.88. A neural network activation analysis together with an investigation of the false positives and false negatives unveiled inconsistent…
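Restricting the label space to the top 100 three-digit codes can be sketched as follows. This is a hedged illustration only: the abstract does not describe how the codes were selected, so ranking by frequency is an assumption, and the helper names (`three_char_code`, `top_k_codes`, `filter_to_top_k`) and the sample entries are hypothetical.

```python
from collections import Counter


def three_char_code(code):
    """Truncate an ICD-10 code to its three-character category, e.g. 'E11.9' -> 'E11'."""
    return code.strip().upper()[:3]


def top_k_codes(codes, k=100):
    """Return the k most frequent three-character categories (assumed selection rule)."""
    counts = Counter(three_char_code(c) for c in codes)
    return [code for code, _ in counts.most_common(k)]


def filter_to_top_k(entries, k=100):
    """Keep only (text, code) pairs whose category is among the k most frequent.

    `entries` must be a list (it is traversed twice); codes are returned
    in truncated three-character form.
    """
    keep = set(top_k_codes((code for _, code in entries), k))
    return [
        (text, three_char_code(code))
        for text, code in entries
        if three_char_code(code) in keep
    ]


# Invented problem list entries, for illustration only.
entries = [
    ("Diabetes mellitus Typ 2", "E11.9"),
    ("art. Hypertonie", "I10"),
    ("Hypertonie", "I10.00"),
    ("Asthma bronchiale", "J45.9"),
]
filtered = filter_to_top_k(entries, k=1)
```

With `k=1` only the two hypertension entries survive, both mapped to the category `I10`.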
Taxonomy
Topics: Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Medical Coding and Health Information
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Adam · Attention Dropout · Dropout · Multi-Head Attention · Layer Normalization
