Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study
Shaoxiong Ji, Matti Hölttä, Pekka Marttinen

TL;DR
This study evaluates whether pretrained language models improve medical code assignment from clinical notes, finding that classical CNNs can outperform attention-based models with proper training, challenging current trends.
Contribution
It introduces a hierarchical fine-tuning architecture and demonstrates that classical CNNs can outperform attention models in medical code prediction tasks.
Findings
Classical CNNs outperform attention-based models with proper training.
Pretrained models do not always enhance performance in medical code assignment.
Hierarchical fine-tuning improves model effectiveness.
Abstract
Unsupervised pretraining is an integral part of many natural language processing systems, and transfer learning with language models has achieved remarkable results in many downstream tasks. In the clinical application of medical code assignment, diagnosis and procedure codes are inferred from lengthy clinical notes such as hospital discharge summaries. However, it is not clear if pretrained models are useful for medical code prediction without further architecture engineering. This paper conducts a comprehensive quantitative analysis of the performance of various contextualized language models, pretrained in different domains, for medical code assignment from clinical notes. We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information. Contrary to current trends, we demonstrate that a carefully…
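The two ideas named in the abstract can be illustrated with a small NumPy sketch. It assumes that "hierarchical" means splitting a long note into fixed-length segments, encoding each segment separately, and concatenating the token representations. It also assumes that label-wise attention follows the common per-code softmax-over-tokens pattern, where each code has its own query vector, its own document vector, and an independent sigmoid output. The paper's exact second-level layer and parameterization may differ. The names `split_into_chunks`, `encode_chunk`, and `label_wise_attention` are hypothetical, and `encode_chunk` is a random stand-in for a pretrained encoder such as BERT.

```python
import numpy as np

def split_into_chunks(token_ids, chunk_len):
    """Split a long token sequence into consecutive segments of at most chunk_len tokens."""
    return [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), chunk_len)]

def encode_chunk(chunk, dim, rng):
    """Hypothetical stand-in for a pretrained encoder (e.g. BERT).

    Returns one dim-sized vector per token. The values are random and only
    illustrate the shapes involved.
    """
    return rng.standard_normal((len(chunk), dim))

def label_wise_attention(H, U, W, b):
    """Per-label attention over token representations.

    H: (n_tokens, d) token representations of the whole note.
    U: (n_labels, d) one attention query vector per code.
    W: (n_labels, d) one output weight vector per code; b: (n_labels,) biases.
    Returns independent per-code probabilities of shape (n_labels,).
    """
    scores = U @ H.T                                  # (n_labels, n_tokens)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over tokens, per code
    V = alpha @ H                                     # (n_labels, d) code-specific note vectors
    logits = (W * V).sum(axis=1) + b                  # one score per code
    return 1.0 / (1.0 + np.exp(-logits))              # sigmoid: multi-label output

# Usage: a 1200-token note, 512-token segments, 50 candidate codes.
rng = np.random.default_rng(0)
tokens = list(range(1200))
chunks = split_into_chunks(tokens, 512)
H = np.concatenate([encode_chunk(c, 64, rng) for c in chunks], axis=0)
n_labels, d = 50, 64
U = rng.standard_normal((n_labels, d))
W = rng.standard_normal((n_labels, d))
b = np.zeros(n_labels)
probs = label_wise_attention(H, U, W, b)
print(len(chunks), H.shape, probs.shape)  # 3 (1200, 64) (50,)
```

Segmenting the input lets an encoder with a fixed maximum input length process notes longer than that limit. Per-code attention lets each code select the tokens most relevant to it rather than sharing a single pooled document vector.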
