L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification
Rishik Kondadadi, John E. Ortega

TL;DR
L2D-Clinical is a framework that learns when a BERT classifier should defer a clinical text classification instance to an LLM, improving accuracy and cost-efficiency by using each model where it is stronger.
Contribution
The paper presents a learning-to-defer framework for clinical text classification that adaptively combines BERT and LLMs, deciding per instance whether to defer based on uncertainty signals and text characteristics.
Findings
L2D-Clinical improves F1 scores by 1.7 to 9.3 points over BERT.
Selective deferral to LLMs enhances accuracy while reducing API usage.
The approach effectively leverages LLM strengths in clinical text tasks.
Abstract
Clinical text classification requires choosing between specialized fine-tuned models (BERT variants) and general-purpose large language models (LLMs), yet neither dominates across all instances. We introduce Learning to Defer for clinical text (L2D-Clinical), a framework that learns when a BERT classifier should defer to an LLM based on uncertainty signals and text characteristics. Unlike prior L2D work that defers to human experts assumed universally superior, our approach enables adaptive deferral, improving accuracy when the LLM complements BERT. We evaluate on two English clinical tasks: (1) ADE detection (ADE Corpus V2), where BioBERT (F1=0.911) outperforms the LLM (F1=0.765), and (2) treatment outcome classification (MIMIC-IV with multi-LLM consensus ground truth), where GPT-5-nano (F1=0.967) outperforms ClinicalBERT (F1=0.887). On ADE, L2D-Clinical achieves F1=0.928 (+1.7 points…
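The deferral idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature set (one minus the top probability, entropy, scaled word count), the logistic scoring rule, the weights, the threshold, and every function name below are assumptions chosen for illustration. In a real setup the weights would be fit on held-out data rather than set by hand.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def deferral_features(probs, text):
    """Hypothetical features: two uncertainty signals and one text characteristic."""
    return [
        1.0 - max(probs),           # low top-class confidence
        entropy(probs),             # spread of the predictive distribution
        len(text.split()) / 100.0,  # scaled word count
    ]

def defer_score(features, weights, bias):
    """Logistic score in (0, 1); higher means deferral is more attractive."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def route(probs, text, llm_predict, weights, bias, threshold=0.5):
    """Return (label, source): defer to the LLM only when the score crosses the threshold."""
    score = defer_score(deferral_features(probs, text), weights, bias)
    if score >= threshold:
        return llm_predict(text), "llm"
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, "bert"

# Placeholder parameters, NOT learned values from the paper.
WEIGHTS = [4.0, 2.0, 0.5]
BIAS = -2.5

def fake_llm(text):
    """Stand-in for an LLM API call; always predicts class 1."""
    return 1

note = "Patient developed rash after amoxicillin."
confident = route([0.98, 0.02], note, fake_llm, WEIGHTS, BIAS)
uncertain = route([0.55, 0.45], note, fake_llm, WEIGHTS, BIAS)
```

With these placeholder parameters, the confident BERT prediction is kept and the near-tie is sent to the stand-in LLM. Raising `threshold` reduces the number of LLM calls, at the cost of keeping more uncertain BERT predictions.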
