Zero-Shot ATC Coding with Large Language Models for Clinical Assessments
Zijian Chen, John-Michael Gamble, Micaela Jantzi, John P. Hirdes, Jimmy Lin

TL;DR
This paper presents a hierarchical, LLM-based approach for automating ATC code assignment in healthcare. It reaches 78% exact-match accuracy with GPT-4o and 60% with Llama 3.1 70B, keeps data private by running on locally deployable models, and shows that fine-tuned smaller models can match larger zero-shot ones.
Contribution
It introduces a hierarchical information extraction method for ATC coding using open-source LLMs, enabling privacy-preserving automation with competitive accuracy.
Findings
Achieves 78% exact-match accuracy with GPT-4o
60% accuracy with Llama 3.1 70B
Fine-tuned smaller models match larger zero-shot models
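As a rough illustration of what "exact-match accuracy" can mean for ATC codes, the sketch below scores predictions at each of the five ATC levels by comparing code prefixes of length 1, 3, 4, 5, and 7 characters; level 5 is the exact-match case. The function name `level_accuracy` and the prefix-based scoring of the coarser levels are assumptions made for illustration, not the paper's documented evaluation protocol, and the sample predictions are invented.

```python
from typing import Dict, Sequence

# Character lengths of the five ATC levels (e.g. A, A10, A10B, A10BA, A10BA02).
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def level_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> Dict[int, float]:
    """Return accuracy per ATC level, where level 5 is full exact match.

    Hypothetical helper: a prediction counts as correct at a level when it is
    at least that long and agrees with the gold code on that prefix.
    """
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    scores = {}
    for level, length in enumerate(ATC_LEVEL_LENGTHS, start=1):
        hits = sum(
            1
            for p, g in zip(predicted, gold)
            if len(p) >= length and p[:length] == g[:length]
        )
        scores[level] = hits / len(gold)
    return scores


# Invented example: the second prediction picks the wrong statin (C10AA01
# instead of C10AA05), so it is right down to level 4 but wrong at level 5.
scores = level_accuracy(
    ["A10BA02", "C10AA01", "N02BE01"],
    ["A10BA02", "C10AA05", "N02BE01"],
)
print(scores)
```

Reporting accuracy per level in this way makes it visible whether a model fails early in the hierarchy or only at the final substance level.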
Abstract
Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves…
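The level-by-level idea described in the abstract can be sketched as a walk down the ATC tree, asking a model to choose among the children of the current node at each step. This is a minimal sketch under stated assumptions, not the authors' implementation: `ATC_CHILDREN` is a tiny hand-written fragment of the ontology, `code_drug` and `make_canned_chooser` are hypothetical names, and the chooser is a canned stand-in for what would be an LLM prompt in a real system.

```python
from typing import Callable, Dict, List

# Toy fragment of the ATC hierarchy: parent code -> {child code: label}.
# The empty string stands for the root. A real system would load the full ontology.
ATC_CHILDREN: Dict[str, Dict[str, str]] = {
    "": {"A": "Alimentary tract and metabolism", "C": "Cardiovascular system"},
    "A": {"A10": "Drugs used in diabetes"},
    "A10": {"A10B": "Blood glucose lowering drugs, excl. insulins"},
    "A10B": {"A10BA": "Biguanides"},
    "A10BA": {"A10BA02": "metformin"},
    "C": {"C10": "Lipid modifying agents"},
    "C10": {"C10A": "Lipid modifying agents, plain"},
    "C10A": {"C10AA": "HMG CoA reductase inhibitors"},
    "C10AA": {"C10AA05": "atorvastatin"},
}

# A chooser receives the drug text and the candidate children, and returns one code.
Chooser = Callable[[str, Dict[str, str]], str]


def code_drug(text: str, choose: Chooser,
              children: Dict[str, Dict[str, str]] = ATC_CHILDREN) -> List[str]:
    """Descend the hierarchy one level at a time and return the chosen path."""
    path: List[str] = []
    current = ""
    while current in children:
        options = children[current]
        picked = choose(text, options)
        if picked not in options:
            break  # invalid or empty answer: stop at the deepest valid level
        path.append(picked)
        current = picked
    return path


def make_canned_chooser(answers: Dict[str, str]) -> Chooser:
    """Stand-in for an LLM call (assumption): looks up a known full code
    and returns whichever candidate is a prefix of it."""
    def choose(text: str, options: Dict[str, str]) -> str:
        target = answers.get(text, "")
        for code in options:
            if target.startswith(code):
                return code
        return ""
    return choose


chooser = make_canned_chooser({"metformin 500 mg tablet": "A10BA02"})
path = code_drug("metformin 500 mg tablet", chooser)
print(path)
```

Restricting each step to the children of the current node keeps the candidate list short and guarantees that any returned path is a valid chain in the ontology, even when the walk stops before the final level.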
Taxonomy
Topics: Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Lung Cancer Treatments and Mutations
Methods: LLaMA · Ontology · Focus
