SHREC: A Framework for Advancing Next-Generation Computational Phenotyping with Large Language Models

Sarah Pungitore; Shashank Yadav; Molly Douglas; Jarrod Mosier; and Vignesh Subbian

arXiv:2506.16359·q-bio.QM·August 18, 2025

TL;DR

This paper introduces SHREC, a framework that uses lightweight large language models (LLMs) to automate the manual review steps of computational phenotyping. On electronic health record (EHR) data, the tested models classified concepts accurately (best AUROC 0.896) and adapted to new tasks through prompt engineering.

Contribution

The paper presents a framework for integrating lightweight LLMs into end-to-end phenotyping pipelines and evaluates three such models on concept classification and patient phenotyping.

Findings

01. The Mistral model achieved an AUROC of 0.896 for concept classification, the best of the three models tested.

02. The models demonstrated near-perfect specificity on phenotyping tasks.

03. Lightweight LLMs can adapt to new tasks through prompt engineering and can incorporate raw EHR data.
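
The findings above are reported as AUROC and specificity. As a reminder of what those two metrics measure, here is a minimal sketch in plain Python. The function names and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(labels, predictions):
    """True negative rate: TN / (TN + FP)."""
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("no negative labels to score")
    return tn / (tn + fp)


# Toy example (made-up values, not results from the paper).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_auroc = auroc(toy_labels, toy_scores)
toy_specificity = specificity(toy_labels, toy_predictions)
```

AUROC is threshold-free and summarizes ranking quality, while specificity depends on the chosen decision threshold (0.5 in this toy example), so the two findings describe different aspects of model behavior.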

Abstract

Computational phenotyping is a central informatics activity whose resulting cohorts support a wide variety of applications. However, it is time-intensive because of manual data review and limited automation. Since large language models (LLMs) have demonstrated promising capabilities for text classification, comprehension, and generation, we posit they will perform well at repetitive manual review tasks traditionally performed by human experts. To support next-generation computational phenotyping, we developed SHREC, a framework for integrating LLMs into end-to-end phenotyping pipelines. We applied and tested three lightweight LLMs (Gemma2 27 billion, Mistral Small 24 billion, and Phi-4 14 billion) to classify concepts and phenotype patients using phenotypes for acute respiratory failure (ARF) respiratory support therapies. All models performed well on concept classification, with the best (Mistral) achieving an AUROC of 0.896. For…
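
The concept classification step described in the abstract can be pictured as a prompt, a model call, and a parser. The sketch below is only an illustration under stated assumptions: the prompt wording, the label set, and the names `build_prompt`, `parse_label`, `classify_concept`, and `fake_llm` are all hypothetical and are not taken from SHREC. The model is represented as any callable that maps a prompt string to a response string, so a locally hosted lightweight LLM could be substituted.

```python
from typing import Callable

# Hypothetical label set; the paper's actual categories are not given here.
LABELS = ("relevant", "not_relevant")


def build_prompt(concept: str, phenotype: str) -> str:
    """Assemble a simple classification prompt (illustrative wording only)."""
    return (
        "You are assisting with computational phenotyping.\n"
        f"Phenotype: {phenotype}\n"
        f"Concept: {concept}\n"
        "Answer with exactly one word: relevant or not_relevant."
    )


def parse_label(response: str) -> str:
    """Map free-text model output onto the label set, or 'unparsed'."""
    text = response.strip().lower()
    # Check negative forms first, since they contain the substring "relevant".
    if "not_relevant" in text or "not relevant" in text or "irrelevant" in text:
        return "not_relevant"
    if "relevant" in text:
        return "relevant"
    return "unparsed"


def classify_concept(concept: str, phenotype: str,
                     llm: Callable[[str], str]) -> str:
    """Run one concept through prompt construction, the model, and parsing."""
    return parse_label(llm(build_prompt(concept, phenotype)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: a keyword rule on the concept line only."""
    concept_line = next(
        line for line in prompt.splitlines() if line.startswith("Concept:")
    )
    return "relevant" if "ventilat" in concept_line.lower() else "not_relevant"


phenotype_name = "ARF respiratory support therapy"  # illustrative phenotype
result_a = classify_concept("Invasive mechanical ventilation",
                            phenotype_name, fake_llm)
result_b = classify_concept("Serum potassium", phenotype_name, fake_llm)
```

Keeping the model behind a plain callable means the prompt and parser can be tested without a model, and responses that do not parse are surfaced as `unparsed` for human review rather than silently assigned a label.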
