RDMA: Cost Effective Agent-Driven Rare Disease Mining from Electronic Health Records

John Wu; Adam Cross; Jimeng Sun

arXiv:2507.15867·cs.LG·May 14, 2026


TL;DR

RDMA introduces a cost-effective, agent-driven approach using small language models to mine unstructured clinical notes for rare disease documentation, outperforming baselines without task-specific training.

Contribution

The paper presents RDMA, a novel agentic framework that enhances small LLMs with tools for rare disease extraction, reducing costs and enabling private deployment.

Findings

01 RDMA outperforms fine-tuned and RAG baselines across benchmarks.

02 A small quantized model achieves maximal performance with significant cost reductions.

03 Uncertainty-flagging reduces expert annotation while maintaining quality.

Abstract

Rare diseases affect 1 in 10 Americans yet remain systematically underdocumented in clinical records. ICD-based systems cannot capture their breadth: over 50% of Orphanet codes lack a direct ICD mapping and only 2.2% of HPO codes have matching ICD codes, leaving patient populations invisible and delaying diagnosis. Mining unstructured clinical notes offers a direct path forward, but real notes are long, noisy, and abbreviation-dense, and limited annotations make fine-tuning infeasible, so approaches must generalize without task-specific training. We present Rare Disease Mining Agents (RDMA), an agentic framework equipping smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO. RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with different data…
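To make the tool chain named in the abstract concrete, here is a minimal Python sketch of two of its stages, abbreviation resolution followed by ontology grounding. It is not the authors' implementation: a hand-written dictionary and `difflib` fuzzy matching stand in for the LLM agent and its tools, and every name in it (`ABBREVIATIONS`, `ONTOLOGY`, `expand_abbreviations`, `ground`, `mine`) as well as the `TOY:` identifiers are hypothetical placeholders rather than real Orphanet or HPO codes.

```python
import difflib
import re

# Hypothetical toy resources. A real system would load a clinical
# abbreviation list and the Orphanet/HPO vocabularies instead.
ABBREVIATIONS = {
    "dmd": "duchenne muscular dystrophy",
    "cf": "cystic fibrosis",
}
ONTOLOGY = {
    "duchenne muscular dystrophy": "TOY:0001",
    "cystic fibrosis": "TOY:0002",
    "marfan syndrome": "TOY:0003",
}


def expand_abbreviations(text):
    """Replace each known abbreviation token with its long form."""
    def swap(match):
        token = match.group(0)
        return ABBREVIATIONS.get(token.lower(), token)
    return re.sub(r"[A-Za-z]+", swap, text)


def ground(mention, cutoff=0.85):
    """Map a mention to the closest ontology term, or None if nothing is close."""
    matches = difflib.get_close_matches(
        mention.lower(), list(ONTOLOGY), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    term = matches[0]
    return term, ONTOLOGY[term]


def mine(note):
    """Expand abbreviations, then ground every 2- and 3-word window of the note."""
    words = re.findall(r"[a-z]+", expand_abbreviations(note).lower())
    found = {}
    for size in (2, 3):
        for start in range(len(words) - size + 1):
            hit = ground(" ".join(words[start:start + size]))
            if hit is not None:
                found[hit[0]] = hit[1]
    return found


if __name__ == "__main__":
    print(mine("Pt with hx of CF, r/o Marfan syndrom."))
```

The sliding-window matcher only handles surface mentions, including misspellings such as "syndrom". The implicit phenotype reasoning described in the abstract has no counterpart in this sketch and would need a language model or a richer rule set.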

Code & Models

Repositories

jhnwu3/RDMA (GitHub)
