Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

Aditya Nagori; Ayush Gautam; Matthew O. Wiens; Vuong Nguyen; Nathan Kenya Mugisha; Jerome Kabakyenga; Niranjan Kissoon; John Mark Ansermino; Rishikesan Kamaleswaran

arXiv:2505.09805 · q-bio.QM · August 5, 2025

TL;DR

This study shows that clustering Large Language Model embeddings of serialized pediatric sepsis records yields higher Silhouette Scores than classical clustering methods. The embeddings capture richer contextual information, which can support personalized care in resource-limited settings.

Contribution

The paper introduces LLM-based clustering for healthcare data and shows improved performance over traditional methods in phenotyping pediatric sepsis patients.

Findings

01 LLM-based clustering achieved higher Silhouette Scores.

02 LLMs captured richer context and key features.

03 Identified distinct patient subgroups with clinical relevance.
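Since the findings rest on Silhouette Scores, a minimal sketch of how that metric is computed may help. It uses Euclidean distance and the standard library only. The function name `silhouette_score` and the toy points are illustrative assumptions, not the authors' code or data.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
```

Scores lie between -1 and 1. Values near 1 indicate compact, well-separated clusters, so `good` should come out well above `bad` on this toy data.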

Abstract

Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated cluster quality and…
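The serialization step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the template, so the field names, the `"name: value"` format, and the wording of the clustering objective below are all hypothetical assumptions, not the authors' actual prompt.

```python
def serialize_record(record, objective=None):
    """Turn one mixed-type patient record into a single line of text.

    record: dict mapping variable name to a numeric or categorical value.
    objective: optional instruction prepended to the text; the wording
               used below is a placeholder, not the paper's.
    """
    parts = []
    for name, value in record.items():
        if value is None:
            # Skip missing values rather than writing "None" into the text.
            continue
        if isinstance(value, float):
            value = f"{value:g}"  # compact float formatting, e.g. 38.6
        parts.append(f"{name}: {value}")
    text = "; ".join(parts) + "."
    if objective:
        text = f"{objective} {text}"
    return text


# Hypothetical record; these variable names are not taken from the dataset.
example = {
    "age_months": 14,
    "temperature_c": 38.6,
    "hiv_status": "negative",
    "muac_mm": None,
}

plain = serialize_record(example)
with_objective = serialize_record(
    example,
    objective="Group this pediatric sepsis patient with clinically similar patients.",
)
```

Each resulting string would then be passed to an embedding model, and the embeddings clustered with K-means, as the abstract describes.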

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Text Readability and Simplification

Methods: k-Means Clustering · LLaMA