Temporally Phenotyping GLP-1RA Case Reports with Large Language Models: A Textual Time Series Corpus and Risk Modeling
Sayantan Kumar, Jeremy C. Weiss

TL;DR
This paper introduces a textual time-series corpus of diabetes case reports, evaluates large language models for timeline extraction, and demonstrates risk modeling for respiratory outcomes using the extracted data.
Contribution
It creates a novel corpus of clinical timelines from case reports, assesses LLM performance in extracting these timelines, and applies the data to risk modeling in diabetes.
Findings
GPT5, the best-performing LLM, achieved an event coverage of 0.871 and a temporal sequencing score of 0.843 against expert-annotated timelines.
Time-to-event analysis suggested a lower risk of respiratory sequelae among GLP-1 receptor agonist users (HR = 0.259).
Temporal annotations and code will be publicly released.
Abstract
Type 2 diabetes case reports describe complex clinical courses, but their timelines are often expressed in language that is difficult to reuse in longitudinal modeling. To address this gap, we developed a textual time-series corpus of 136 PubMed Open Access single-patient case reports involving glucagon-like peptide 1 receptor agonists, with clinical events associated with their most probable reference times. We evaluated automated LLM timeline extraction against gold-standard timelines annotated by clinical domain experts, assessing how well systems recovered clinical events and their timings. The best-performing LLM produced high event coverage (GPT5; 0.871) and reliable temporal sequencing across symptoms (GPT5; 0.843), diagnoses, treatments, laboratory tests, and outcomes. As a downstream demonstration, time-to-event analyses in diabetes suggested lower risk of respiratory sequelae…
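To make the two evaluation ideas in the abstract concrete, here is a minimal sketch of how event coverage and temporal sequencing might be scored for a single case report. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`event_coverage`, `temporal_concordance`), the exact-match rule after text normalization, the pairwise ordering score, and the convention of times in hours relative to admission are all hypothetical choices made for this example.

```python
from itertools import combinations

# A timeline is a list of (event text, time) pairs.
# ASSUMPTION: times are hours relative to admission (negative = before).
Timeline = list[tuple[str, float]]


def normalize(event: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(event.lower().split())


def event_coverage(gold: Timeline, pred: Timeline) -> float:
    """Fraction of gold events found in the predicted timeline.

    ASSUMPTION: an event counts as recovered only on exact match after
    normalization; a real evaluation may use a softer matching rule.
    """
    if not gold:
        return 0.0
    predicted = {normalize(event) for event, _ in pred}
    hits = sum(1 for event, _ in gold if normalize(event) in predicted)
    return hits / len(gold)


def temporal_concordance(gold: Timeline, pred: Timeline) -> float:
    """Fraction of matched event pairs whose predicted order agrees with gold.

    Pairs tied in the gold timeline are skipped; a predicted tie on a pair
    that is ordered in gold counts as discordant.
    """
    pred_time = {normalize(event): t for event, t in pred}
    matched = [
        (t, pred_time[normalize(event)])
        for event, t in gold
        if normalize(event) in pred_time
    ]
    concordant = comparable = 0
    for (g1, p1), (g2, p2) in combinations(matched, 2):
        if g1 == g2:
            continue  # no gold ordering to compare against
        comparable += 1
        if (g1 - g2) * (p1 - p2) > 0:
            concordant += 1
    return concordant / comparable if comparable else float("nan")


# Toy timelines, invented for illustration only.
gold = [
    ("nausea", -48.0),
    ("admitted", 0.0),
    ("semaglutide stopped", 24.0),
    ("discharged", 120.0),
]
pred = [
    ("Nausea", -24.0),
    ("admitted", 0.0),
    ("discharged", 96.0),
    ("acute pancreatitis", 0.0),
]

coverage = event_coverage(gold, pred)          # 3 of 4 gold events recovered
concordance = temporal_concordance(gold, pred)  # all 3 matched pairs ordered correctly
```

In this toy case the predicted times differ from the gold times, yet the ordering score is unaffected, because the sketch scores relative order rather than absolute timing error. A separate timing-error measure would be needed to capture how far off the predicted times are.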
