TL;DR
RDMA introduces a cost-effective, agent-driven approach using small language models to mine unstructured clinical notes for rare disease documentation, outperforming baselines without task-specific training.
Contribution
The paper presents RDMA, a novel agentic framework that enhances small LLMs with tools for rare disease extraction, reducing costs and enabling private deployment.
Findings
RDMA outperforms fine-tuned and RAG baselines across benchmarks.
A small quantized model achieves maximal performance with significant cost reductions.
Uncertainty flagging reduces expert annotation effort while maintaining quality.
Abstract
Rare diseases affect 1 in 10 Americans yet remain systematically underdocumented in clinical records. ICD-based systems cannot capture their breadth: over 50% of Orphanet codes lack a direct ICD mapping and only 2.2% of HPO codes have matching ICD codes, leaving patient populations invisible and delaying diagnosis. Mining unstructured clinical notes offers a direct path forward, but real notes are long, noisy, and abbreviation-dense, and limited annotations make fine-tuning infeasible, demanding approaches that generalize without task-specific training. We present Rare Disease Mining Agents (RDMA), an agentic framework equipping smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO. RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with different data…
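To make the tool chain named in the abstract more concrete, here is a minimal Python sketch of how abbreviation resolution, ontology grounding, and uncertainty flagging could fit together. It is an illustration under stated assumptions, not the RDMA implementation: the lookup tables, the `DEMO:` codes, the `Mention` type, and the review heuristic are all hypothetical, and plain dictionary lookups stand in for the LLM agent and for the real Orphanet and HPO ontologies. It does not attempt implicit phenotype reasoning or negation handling.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins: a real system would resolve abbreviations in
# context and ground terms against Orphanet/HPO, not these toy tables.
ABBREVIATIONS = {"cf": "cystic fibrosis", "sz": "seizure"}
ONTOLOGY = {"cystic fibrosis": "DEMO:0001", "seizure": "DEMO:0002"}


@dataclass
class Mention:
    text: str           # canonical term found in the note
    code: str           # placeholder ontology identifier
    needs_review: bool  # True when the match should go to an expert


def expand_abbreviations(note: str) -> str:
    """Replace known abbreviations word by word, leaving other words alone."""
    def replace(match):
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, note)


def ground_terms(note: str) -> List[Mention]:
    """Find ontology terms in the note and flag uncertain matches.

    Assumed heuristic: a term that only appears after abbreviation
    expansion is less certain, so it is flagged for expert review.
    """
    original = note.lower()
    expanded = expand_abbreviations(note).lower()
    mentions = []
    for term, code in ONTOLOGY.items():
        if term in expanded:
            mentions.append(Mention(term, code, needs_review=term not in original))
    return mentions


if __name__ == "__main__":
    for mention in ground_terms("Known cystic fibrosis; possible sz overnight."):
        print(mention)
```

In this sketch only the abbreviation-derived match is routed to a reviewer, which mirrors the idea that flagging uncertain extractions can limit how much an expert has to check.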
