Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding
Nabeel Seedat, Caterina Tozzi, Andrea Hita Ardiaca, Mihaela van der Schaar, James Weatherall, Adam Taylor

TL;DR
ALIGN is a novel compositional large language model system that automates medical coding for clinical trial data, improving accuracy and reliability while reducing costs and facilitating data reuse across studies.
Contribution
The paper introduces ALIGN, a zero-shot, compositional LLM system with self-evaluation and uncertainty estimation for reliable medical coding, outperforming existing baselines on complex clinical trial data.
Findings
ALIGN achieves 87-90% accuracy on MedDRA coding at detailed levels.
ALIGN outperforms baselines by 7-22% on ATC coding, especially at lower hierarchy levels.
Uncertainty-based deferral raises accuracy to 90% at a 30% deferral rate.
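To make the deferral finding concrete, here is a minimal Python sketch of how accuracy under deferral is typically computed: the least-confident predictions are handed to a human reviewer and accuracy is measured on the remainder. The function name `accuracy_with_deferral` and the toy confidence scores are hypothetical illustrations, not taken from the paper, and the paper's own confidence measure may differ.

```python
from typing import List, Tuple


def accuracy_with_deferral(
    confidences: List[float],
    correct: List[bool],
    deferral_rate: float,
) -> Tuple[float, int]:
    """Defer the least-confident fraction of predictions and score the rest.

    Returns (accuracy on the retained predictions, number deferred).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    n_deferred = int(round(n * deferral_rate))
    # Indices sorted by confidence, least confident first.
    order = sorted(range(n), key=lambda i: confidences[i])
    kept = order[n_deferred:]
    if not kept:
        return float("nan"), n_deferred
    accuracy = sum(correct[i] for i in kept) / len(kept)
    return accuracy, n_deferred


# Toy, made-up data (assumption): one confidence score and one
# correctness flag per coded term.
conf = [0.95, 0.40, 0.88, 0.30, 0.99, 0.70, 0.55, 0.92, 0.81, 0.20]
ok = [True, False, True, False, True, True, False, True, True, True]

baseline, _ = accuracy_with_deferral(conf, ok, deferral_rate=0.0)
selective, n_deferred = accuracy_with_deferral(conf, ok, deferral_rate=0.3)
print(f"no deferral: {baseline:.2f}")
print(f"30% deferral: {selective:.2f} ({n_deferred} terms sent to a human)")
```

Deferral only helps when low confidence actually tracks errors; in the toy data, one of the three deferred predictions was correct, which is the cost of routing it to a reviewer.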
Abstract
The reuse of historical clinical trial data has significant potential to accelerate medical research and drug development. However, interoperability challenges, particularly missing medical codes, hinder effective data integration across studies. While Large Language Models (LLMs) offer a promising solution for automated coding without labeled data, current approaches struggle with complex coding tasks. We introduce ALIGN, a novel compositional LLM-based system for automated, zero-shot medical coding. ALIGN follows a three-step process: (1) diverse candidate code generation; (2) self-evaluation of codes; and (3) confidence scoring and uncertainty estimation, enabling human deferral to ensure reliability. We evaluate ALIGN on harmonizing medication terms into Anatomical Therapeutic Chemical (ATC) and medical history terms into Medical Dictionary for Regulatory Activities…
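The three-step process described in the abstract can be sketched in Python as follows. This is an illustrative skeleton under stated assumptions, not the authors' implementation: the names `code_term`, `CodingResult`, `fake_generate`, and `fake_evaluate` are hypothetical, the LLM calls are replaced by hard-coded stubs, and the vote-share confidence score and the 0.6 threshold are stand-ins for whatever scoring ALIGN actually uses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CodingResult:
    term: str
    code: Optional[str]  # None when the term is deferred to a human
    confidence: float
    deferred: bool


def code_term(
    term: str,
    generate: Callable[[str], List[str]],
    evaluate: Callable[[str, str], bool],
    threshold: float = 0.6,  # assumed cut-off, not from the paper
) -> CodingResult:
    # Step 1: generate several candidate codes for the term.
    candidates = generate(term)
    # Step 2: self-evaluation keeps only candidates judged plausible.
    accepted = [c for c in candidates if evaluate(term, c)]
    if not accepted:
        return CodingResult(term, None, 0.0, True)
    # Step 3: confidence as the winning code's share of all candidates
    # (an assumption made for illustration).
    best, votes = Counter(accepted).most_common(1)[0]
    confidence = votes / len(candidates)
    if confidence < threshold:
        return CodingResult(term, None, confidence, True)
    return CodingResult(term, best, confidence, False)


# Stubs standing in for LLM calls (hypothetical outputs).
def fake_generate(term: str) -> List[str]:
    samples = {
        "paracetamol 500mg": ["N02BE01", "N02BE01", "N02BE01", "M01AE01"],
        "pain tablet": ["N02BE01", "M01AE01", "N02BA01", "N02AA01"],
    }
    return samples.get(term, [])


def fake_evaluate(term: str, code: str) -> bool:
    # Pretend the self-check rejects the ibuprofen code for these terms.
    return code != "M01AE01"


clear = code_term("paracetamol 500mg", fake_generate, fake_evaluate)
vague = code_term("pain tablet", fake_generate, fake_evaluate)
print(clear)
print(vague)
```

Passing the generator and evaluator in as callables keeps the control flow testable without a model; in this sketch the unambiguous term is coded automatically, while the vague term yields no dominant candidate and is deferred.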
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
Methods: Attention Is All You Need · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Byte Pair Encoding · Linear Layer · Softmax · BERT
