Towards BERT-based Automatic ICD Coding: Limitations and Opportunities
Damian Pascual, Sandro Luck, Roger Wattenhofer

TL;DR
This paper investigates the use of BERT-based models for automatic ICD coding, highlighting the challenges posed by long medical notes and arguing that better text aggregation methods are needed to improve performance.
Contribution
It provides a detailed analysis of BERT-based ICD coding, identifies the main limitations on long texts, and suggests directions for future research.
Findings
BERT models struggle with long medical notes due to fine-tuning difficulties.
Pretrained transformers can perform competitively with small text segments.
Improved text aggregation methods are crucial for better ICD coding performance.
Abstract
Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate ICD coding in detail using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding. We find that the difficulty of fine-tuning the model on long pieces of text is the main limitation for BERT-based models on ICD coding. We run extensive experiments and show that despite the gap with current state-of-the-art, pretrained transformers can reach competitive performance…
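To make the long-text problem concrete, here is a minimal sketch of one common workaround: split a note into segments that fit a BERT-style model, score each segment, and aggregate the per-code scores. This is not presented as the authors' method. The 510-token segment length (512 minus the [CLS] and [SEP] tokens), the max-pooling aggregation, the 0.5 threshold, and the names `split_into_segments`, `aggregate_max`, `predict_codes` and `segment_classifier` are all illustrative assumptions. The classifier is a hypothetical stand-in for a fine-tuned model that returns one logit per ICD code.

```python
import math
from typing import Callable, List, Sequence

def split_into_segments(token_ids: Sequence[int], max_len: int = 510) -> List[List[int]]:
    """Split a token sequence into consecutive, non-overlapping chunks of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [list(token_ids[i:i + max_len]) for i in range(0, len(token_ids), max_len)]

def aggregate_max(segment_logits: List[List[float]]) -> List[float]:
    """Max-pool per-code logits across segments: a code scores high if any segment supports it."""
    if not segment_logits:
        raise ValueError("need at least one segment")
    return [max(column) for column in zip(*segment_logits)]

def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_codes(
    token_ids: Sequence[int],
    segment_classifier: Callable[[List[int]], List[float]],  # hypothetical: one logit per code
    codes: List[str],
    max_len: int = 510,
    threshold: float = 0.5,
) -> List[str]:
    """Multi-label prediction: each code is kept independently if its pooled probability passes the threshold."""
    segments = split_into_segments(token_ids, max_len)
    pooled = aggregate_max([segment_classifier(seg) for seg in segments])
    return [code for code, logit in zip(codes, pooled) if sigmoid(logit) >= threshold]

if __name__ == "__main__":
    # Toy stand-in classifier: "code_a" fires if token 7 appears, "code_b" if token 9 appears.
    def toy_classifier(seg: List[int]) -> List[float]:
        return [2.0 if 7 in seg else -2.0, 2.0 if 9 in seg else -2.0]

    note = [1] * 600 + [7] + [1] * 10  # 611 tokens: too long for a single 512-token window
    print(predict_codes(note, toy_classifier, ["code_a", "code_b"]))
```

Max-pooling is only one possible choice. Mean-pooling or a learned attention layer over segment representations are alternatives. The findings above suggest that this aggregation step is exactly where BERT-based ICD coders have room to improve.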
