Coding historical causes of death data with Large Language Models
Bjørn Pedersen, Maisha Islam, Doris Tove Kristoffersen, Lars Ailo Bongo, Eilidh Garrett, Alice Reid, Hilde Sommerseth

TL;DR
This study assesses the potential of large language models like GPT-3.5, GPT-4, and Llama 2 to automate assigning ICD-10 codes to historical death causes, finding current models insufficient without further tuning.
Contribution
It evaluates the performance of state-of-the-art LLMs on historical ICD-10 coding and highlights their limitations, suggesting avenues for improvement.
Findings
GPT-4 achieves 83% accuracy in code assignment.
Models perform better on causes with modern terms and shorter descriptions.
Standard machine learning techniques reach a maximum accuracy of 89%, above all three LLMs.
Abstract
This paper investigates the feasibility of using pre-trained generative Large Language Models (LLMs) to automate the assignment of ICD-10 codes to historical causes of death. Due to the complex narratives often found in historical causes of death, this task has traditionally been performed manually by coding experts. We evaluate the ability of GPT-3.5, GPT-4, and Llama 2 LLMs to accurately assign ICD-10 codes on the HiCaD dataset, which contains causes of death recorded in the civil death register entries of 19,361 individuals from Ipswich, Kilmarnock, and the Isle of Skye in the UK between 1861 and 1901. Our findings show that GPT-3.5, GPT-4, and Llama 2 assign the correct code for 69%, 83%, and 40% of causes, respectively. However, we achieve a maximum accuracy of 89% with standard machine learning techniques. All LLMs performed better for causes of death that contained terms still in use…
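The accuracy figures quoted in the abstract are the share of causes for which a model assigns the correct code. A minimal sketch of that metric is shown here, assuming exact string match after trimming whitespace and ignoring case; the matching rule, the function name `accuracy`, and the toy codes are illustrative assumptions and are not taken from the paper or from HiCaD.

```python
def accuracy(predicted, gold):
    """Fraction of records whose predicted code matches the gold code.

    Assumption: a prediction counts as correct only on an exact match
    after trimming whitespace and upper-casing. The paper may use a
    different matching rule.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")

    def normalise(code):
        return code.strip().upper()

    hits = sum(normalise(p) == normalise(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Toy values for illustration only (hypothetical, not HiCaD records).
gold = ["A16.9", "J18.9", "A01.0", "R54"]
model_output = ["A16.9", "j18.9 ", "A09", "R54"]

print(f"{accuracy(model_output, gold):.0%}")  # 3 of 4 match -> 75%
```

Running the same function once per model on a shared gold standard would give directly comparable percentages of the kind reported above.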
Taxonomy
Topics: Machine Learning in Healthcare
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Cosine Annealing · Dropout · Linear Warmup With Cosine Annealing · Label Smoothing · Residual Connection · Absolute Position Encodings
