What can the machine teach us?

Edward J. Schenck

PMC · DOI:10.1186/s40635-025-00775-3 · July 7, 2025


Open Access


Funding

National Heart, Lung, and Blood Institute (http://dx.doi.org/10.13039/100000050)

Keywords

Machine learning · Unsupervised clustering · Sepsis · Epidemiology


Full text

Consider a 74-year-old woman presenting to the hospital with a productive cough, fevers, and tachycardia. She has comorbid hypertension and diabetes, and has had slurred speech with some swallowing difficulty since a stroke several years ago. During her initial workup, her white blood cell count is elevated with increased immature granulocytes and her platelet count is 130. Her serum sodium is 133 mmol/L, her albumin is 25 g/L and her international normalized ratio is 1.5. The remainder of her routine chemistries is normal. Despite tachycardia and tachypnea, her mean arterial blood pressure is normal, and her peripheral oxygen saturation is 98% while breathing ambient air. Hazy bibasilar opacities are seen on a frontal chest radiograph, and after treatment with antibiotics, her tachycardia and tachypnea improve. She is admitted to the medical wards. Later that night she is found cool and pulseless.

What caused her death? She almost certainly had pneumonia, and this is communicated to her bereaved family. For epidemiologists, there is a fundamentally different question to be answered. Was her death due to sepsis, “life-threatening organ dysfunction caused by a dysregulated host response” [1]? She had an infection, her immune response was not optimal, and her life was not only threatened but ended by this same process. By many methods of accounting, however, her death would not be ascribed to sepsis [2]. Her experience and outcome would not be addressed by research and administrative efforts to improve the management of sepsis.

Sepsis is defined using arbitrary thresholds of organ function and lab abnormalities [1–3]. The true number of consequential infections is unknown, and some organ injury is not captured by current approaches. Are patients with lab abnormalities just below a threshold fundamentally different from those just above? Novel methods are needed to address this concern. Li and colleagues used a combination of machine learning and classical epidemiology to add clarity to this phenomenon [4]. They first extracted a comprehensive electronic health record (EHR) data set representing a large multicenter cohort of patients admitted to an ICU in Canada. They used this data set to evaluate whether there are groups of patients with potential sepsis that are not characterized by current frameworks, specifically the Centers for Disease Control and Prevention Adult Sepsis Event (ASE) framework [5]. To do so, they developed a comprehensive pipeline of feature selection followed by dimension reduction and unsupervised clustering. Concluding that the clustering derived from Robust and Sparse K-Means Clustering (RSKC) best represented the distribution of the data, they compared the distribution of patients meeting ASE criteria across clusters. Of 48 distinct RSKC-derived clusters, 11 were ASE-majority clusters, in which more than 50% of patients met ASE criteria. Within these 11 clusters, patients who were ASE-negative had an in-hospital mortality of 25.8%. Moreover, 34.9% of the ASE-negative patients in the ASE-majority clusters met a more liberal sepsis criterion. Put another way, there are groups of ICU patients who are clinically similar to patients labeled as having sepsis but are currently missed. This work has limitations. The authors restricted their study to an ICU cohort, and it is not clear that ASE-negative patients meeting the more liberal sepsis definition had a consequential infection. More granular patient-level epidemiology is needed. Lastly, the group relied on expert adjudication for feature selection, which may have constrained the novelty of the derived clusters.
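The cluster-level comparison described above can be illustrated with a small sketch. It is not the authors' pipeline and does not implement RSKC; it assumes cluster labels have already been assigned by some clustering method, and it uses made-up toy records. The field names (`cluster`, `ase`, `died`) and both function names are hypothetical. The sketch flags clusters in which more than half of the patients meet ASE criteria, then computes in-hospital mortality among the ASE-negative patients inside those clusters.

```python
from collections import defaultdict


def ase_majority_clusters(records, threshold=0.5):
    """Return ids of clusters whose ASE-positive share is strictly above threshold."""
    counts = defaultdict(lambda: [0, 0])  # cluster id -> [ASE-positive count, total count]
    for r in records:
        counts[r["cluster"]][1] += 1
        if r["ase"]:
            counts[r["cluster"]][0] += 1
    return {c for c, (pos, total) in counts.items() if pos / total > threshold}


def ase_negative_mortality(records, clusters):
    """In-hospital mortality among ASE-negative patients in the given clusters."""
    group = [r for r in records if r["cluster"] in clusters and not r["ase"]]
    if not group:
        return None  # no ASE-negative patients in these clusters
    return sum(1 for r in group if r["died"]) / len(group)


# Toy, fabricated-for-illustration records (not data from the study).
records = [
    # Cluster 0: 3 of 5 patients are ASE-positive (60%), so it is ASE-majority.
    {"cluster": 0, "ase": True, "died": False},
    {"cluster": 0, "ase": True, "died": True},
    {"cluster": 0, "ase": True, "died": False},
    {"cluster": 0, "ase": False, "died": True},
    {"cluster": 0, "ase": False, "died": False},
    # Cluster 1: 1 of 4 patients is ASE-positive (25%).
    {"cluster": 1, "ase": True, "died": False},
    {"cluster": 1, "ase": False, "died": False},
    {"cluster": 1, "ase": False, "died": False},
    {"cluster": 1, "ase": False, "died": True},
    # Cluster 2: exactly 50% ASE-positive, which does not exceed the threshold.
    {"cluster": 2, "ase": True, "died": False},
    {"cluster": 2, "ase": True, "died": True},
    {"cluster": 2, "ase": False, "died": False},
    {"cluster": 2, "ase": False, "died": False},
]

majority = ase_majority_clusters(records)
missed_mortality = ase_negative_mortality(records, majority)
print(sorted(majority), missed_mortality)
```

In this toy example only cluster 0 qualifies, and the ASE-negative patients inside it are the analogue of the "currently missed" group discussed above. The strict "greater than 50%" rule mirrors the wording of the text; a real analysis would also need confidence intervals and a check on cluster stability.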

The implications of this work are self-evident. Sepsis definitions are arbitrary, and it follows that the resultant epidemiology has significant gaps. We cannot improve what we do not measure. This work clearly shows that data can guide discovery. Unsupervised learning has the potential to teach us something new. Hopefully this innovative work will lead us closer to a more holistic understanding of infection and the human condition.