Zero-Shot ATC Coding with Large Language Models for Clinical Assessments

Zijian Chen; John-Michael Gamble; Micaela Jantzi; John P. Hirdes; Jimmy Lin

arXiv:2412.07743 · cs.CL · December 11, 2024 · 2 cites

TL;DR

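This paper presents a hierarchical, LLM-based approach to automating ATC code assignment for prescription records. It reports 78% exact match accuracy with GPT-4o and 60% accuracy with Llama 3.1 70B, preserves data privacy by targeting locally deployable open-source models, and shows that fine-tuned smaller models can match larger zero-shot models.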

Contribution

The paper introduces a hierarchical information extraction method for ATC coding with open-source LLMs, which enables privacy-preserving automation at competitive accuracy.

Findings

1. Achieves 78% exact match accuracy with GPT-4o
2. Achieves 60% accuracy with Llama 3.1 70B
3. Fine-tuned smaller models match larger zero-shot models
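The findings are stated as exact match accuracy. As a minimal sketch of how such a score could be computed, assuming it means the predicted full ATC code equals the reference code, the Python below also scores coarser levels of the hierarchy by comparing code prefixes. The function name `level_accuracy` and the toy predictions are hypothetical and are not results from the paper; only the prefix lengths (1, 3, 4, 5, 7 characters for levels 1 to 5) follow the standard ATC code layout.

```python
from typing import Sequence

# Character lengths of ATC codes at levels 1 to 5,
# e.g. A, A10, A10B, A10BA, A10BA02.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def level_accuracy(preds: Sequence[str], golds: Sequence[str], level: int = 5) -> float:
    """Fraction of predictions that agree with the reference up to `level`.

    level=5 compares the full 7-character code, i.e. exact match
    (an assumed definition, not taken from the paper).
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    n = ATC_PREFIX_LENGTHS[level - 1]
    # A prediction shorter than n characters cannot match at this level.
    hits = sum(len(p) >= n and p[:n] == g[:n] for p, g in zip(preds, golds))
    return hits / len(golds)


# Toy data for illustration only (hypothetical predictions, not paper results).
toy_preds = ["A10BA02", "A10BB01", "C07AB02"]
toy_golds = ["A10BA02", "A10BA02", "C07AB02"]

exact = level_accuracy(toy_preds, toy_golds, level=5)    # full-code agreement
level_3 = level_accuracy(toy_preds, toy_golds, level=3)  # 4-character prefix agreement
```

Reporting the coarser levels alongside exact match shows whether an error is a near miss within the right pharmacological subgroup or a wrong branch altogether.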

Abstract

Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves…
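The level-by-level traversal described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `ATC_CHILDREN` is a tiny hand-written fragment of the ATC hierarchy, `code_drug` and `toy_choose` are hypothetical names, and `toy_choose` is a hard-coded stand-in for the step where an LLM would be prompted to pick one option at each level.

```python
from typing import Callable, Dict, List, Tuple

Option = Tuple[str, str]                     # (ATC code, label)
Chooser = Callable[[str, List[Option]], str]

# Toy fragment of the ATC hierarchy; only one branch is filled in.
# Keys are parent codes ("" is the root), values are the child options.
ATC_CHILDREN: Dict[str, List[Option]] = {
    "": [("A", "Alimentary tract and metabolism"),
         ("C", "Cardiovascular system")],
    "A": [("A02", "Drugs for acid related disorders"),
          ("A10", "Drugs used in diabetes")],
    "A10": [("A10A", "Insulins and analogues"),
            ("A10B", "Blood glucose lowering drugs, excl. insulins")],
    "A10B": [("A10BA", "Biguanides"),
             ("A10BB", "Sulfonylureas")],
    "A10BA": [("A10BA02", "metformin")],
}


def code_drug(drug_text: str, choose: Chooser) -> List[str]:
    """Walk the hierarchy from the root, asking `choose` to pick one child per level."""
    path: List[str] = []
    current = ""
    while current in ATC_CHILDREN:
        options = ATC_CHILDREN[current]
        picked = choose(drug_text, options)
        if picked not in {code for code, _ in options}:
            break  # invalid or empty answer: stop and keep the partial path
        path.append(picked)
        current = picked
    return path


def toy_choose(drug_text: str, options: List[Option]) -> str:
    """Hypothetical stand-in for an LLM call; it only knows one drug."""
    known = {"metformin": "A10BA02"}
    words = drug_text.strip().lower().split()
    target = known.get(words[0], "") if words else ""
    for code, _ in options:
        if target.startswith(code):
            return code
    return ""


path = code_drug("Metformin 500 mg tablet", toy_choose)
```

Restricting each step to the children of the previously chosen node keeps every prompt short and rules out codes that do not exist in the ontology. A real chooser would replace `toy_choose` with a call to a locally hosted model.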

Taxonomy

Topics: Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Lung Cancer Treatments and Mutations

Methods: LLaMA · Ontology · Focus