Semi-self-supervised Automated ICD Coding
Hlynur D. Hlynsson, Steindór Ellertsson, Jón F. Daðason, Emil L. Sigurdsson, Hrafn Loftsson

TL;DR
This paper introduces a semi-self-supervised approach to augmenting limited Icelandic clinical text data for ICD coding: a neural network trained on a small annotated set extracts diagnostic features from unannotated notes, which improves classification accuracy.
Contribution
It presents a novel semi-self-supervised method for augmenting scarce clinical datasets using neural network-based feature extraction from unannotated texts.
Findings
Data augmentation improves diagnosis classification accuracy.
Effectiveness diminishes when clinical examination features are available.
Method benefits datasets with limited annotated clinical notes.
Abstract
Clinical Text Notes (CTNs) record physicians' reasoning, in unstructured free text, as they examine and interview patients. In recent years, several studies have provided evidence for the utility of machine learning in predicting doctors' diagnoses from CTNs, a task known as ICD coding. Data annotation is time-consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method of augmenting a sparsely annotated dataset of Icelandic CTNs with a machine-learned imputation in a semi-self-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of unannotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might find answers to during a consultation of…
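The three-stage pipeline described in the abstract (train on a small annotated set, impute clinical features for unannotated notes, then train the diagnosis classifier on the enlarged set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps the neural network for multi-output logistic regression, generates hypothetical synthetic data with `make_notes` (dense vectors stand in for note text, and 20 binary "questions" stand in for the roughly one thousand real ones), and assumes diagnosis labels exist for the unannotated notes while the clinical-feature answers do not.

```python
import numpy as np


def sigmoid(z):
    z = np.clip(z, -30.0, 30.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, Y, lr=0.5, epochs=500, l2=1e-3):
    """Multi-output logistic regression via full-batch gradient descent.
    Stand-in for the paper's neural network (an assumption)."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        G = sigmoid(X @ W + b) - Y  # gradient of cross-entropy w.r.t. logits
        W -= lr * (X.T @ G / n + l2 * W)
        b -= lr * G.mean(axis=0)
    return W, b


def predict_proba(X, W, b):
    return sigmoid(X @ W + b)


def make_notes(n, A, w, rng):
    """Hypothetical synthetic notes: binary clinical features F (answers to
    k questions), a noisy 'text' vector X derived from F, a binary diagnosis y."""
    k = A.shape[0]
    F = (rng.random((n, k)) < 0.3).astype(float)
    X = F @ A / np.sqrt(k) + 0.3 * rng.standard_normal((n, A.shape[1]))
    y = (F @ w > 0.3 * w.sum()).astype(float)[:, None]
    return X, F, y


def run_pipeline(n_annot=40, n_unannot=2000, n_test=1000, k=20, d=50, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, d))
    w = rng.random(k)
    X_a, F_a, y_a = make_notes(n_annot, A, w, rng)    # small annotated set
    X_u, _, y_u = make_notes(n_unannot, A, w, rng)    # gold features hidden
    X_t, F_t, y_t = make_notes(n_test, A, w, rng)     # held-out test notes

    # 1. Feature extractor (text -> clinical features), annotated notes only.
    W_f, b_f = fit_logreg(X_a, F_a)
    # 2. Impute features for unannotated notes as hard 0/1 pseudo-labels.
    F_u_hat = (predict_proba(X_u, W_f, b_f) >= 0.5).astype(float)
    # 3a. Baseline diagnosis classifier: annotated features only.
    W_0, b_0 = fit_logreg(F_a, y_a)
    # 3b. Augmented diagnosis classifier: annotated + imputed features.
    W_1, b_1 = fit_logreg(np.vstack([F_a, F_u_hat]), np.vstack([y_a, y_u]))

    # Evaluate on imputed test features (no annotations at inference time).
    F_t_hat = (predict_proba(X_t, W_f, b_f) >= 0.5).astype(float)

    def accuracy(W, b):
        return float(((predict_proba(F_t_hat, W, b) >= 0.5) == y_t).mean())

    return {
        "imputed_shape": F_u_hat.shape,
        "feature_acc": float((F_t_hat == F_t).mean()),
        "baseline_acc": accuracy(W_0, b_0),
        "augmented_acc": accuracy(W_1, b_1),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

Because the data are synthetic, the printed accuracies say nothing about the paper's results. The sketch only shows the data flow: the feature extractor sees gold answers on the annotated notes alone, and every other note receives imputed answers before the diagnosis classifier is trained.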
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Topic Modeling
