Multi-Label Classification with Generative AI Models in Healthcare: A Case Study of Suicidality and Risk Factors

Ming Huang; Zehan Li; Yan Hu; Wanjing Wang; Andrew Wen; Scott Lane; Salih Selek; Lokesh Shahani; Rodrigo Machado-Vieira; Jair Soares; Hua Xu; Hongfang Liu

arXiv:2507.17009·cs.CL·July 24, 2025

TL;DR

This study shows that generative large language models such as GPT-3.5 and GPT-4.5 can perform multi-label classification of suicidality-related factors in psychiatric electronic health records.

Contribution

The paper introduces an end-to-end generative multi-label classification pipeline, together with label-set-level metrics and a multi-label confusion matrix for evaluating LLMs on clinical text.

Findings

01

GPT-3.5 achieved a partial match accuracy of 0.94 and an F1 score of 0.91.

02

GPT-4.5 with guided prompting outperformed GPT-3.5, especially on rare labels.

03

Models tend to over-label and conflate related suicidality factors.
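
To make the metrics named in the findings concrete, here is a minimal Python sketch of label-set-level scoring. It is an illustration, not the authors' implementation: the definition of "partial match" used here (the predicted set equals the gold set or shares at least one label with it), the micro-averaged F1, and the toy notes are all assumptions; the four label names come from the abstract.

```python
from typing import Callable, List, Set

# Label names taken from the abstract: suicide ideation, suicide attempt,
# exposure to suicide, non-suicidal self-injury.
LABELS = ["SI", "SA", "ES", "NSSI"]


def exact_match(gold: Set[str], pred: Set[str]) -> bool:
    """True only when the predicted label set equals the gold label set."""
    return gold == pred


def partial_match(gold: Set[str], pred: Set[str]) -> bool:
    """ASSUMED definition: identical sets, or at least one shared label."""
    return gold == pred or bool(gold & pred)


def set_level_accuracy(
    golds: List[Set[str]],
    preds: List[Set[str]],
    match: Callable[[Set[str], Set[str]], bool],
) -> float:
    """Fraction of notes whose predicted set satisfies the match rule."""
    return sum(match(g, p) for g, p in zip(golds, preds)) / len(golds)


def micro_f1(golds: List[Set[str]], preds: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (note, label) decisions."""
    tp = sum(len(g & p) for g, p in zip(golds, preds))  # correct labels
    fp = sum(len(p - g) for g, p in zip(golds, preds))  # over-labeling
    fn = sum(len(g - p) for g, p in zip(golds, preds))  # missed labels
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: four notes, not drawn from the study.
golds = [{"SI"}, {"SI", "SA"}, set(), {"NSSI"}]
preds = [{"SI"}, {"SI"}, set(), {"ES"}]

exact_acc = set_level_accuracy(golds, preds, exact_match)
partial_acc = set_level_accuracy(golds, preds, partial_match)
f1 = micro_f1(golds, preds)
```

On this toy data the second note counts as a partial match but not an exact match, which is why the two set-level accuracies can differ substantially on notes that carry several labels.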

Abstract

Suicide remains a pressing global health crisis, with over 720,000 deaths annually and millions more affected by suicide ideation (SI) and suicide attempts (SA). Early identification of suicidality-related factors (SrFs), including SI, SA, exposure to suicide (ES), and non-suicidal self-injury (NSSI), is critical for timely intervention. While prior studies have applied AI to detect SrFs in clinical notes, most treat suicidality as a binary classification task, overlooking the complexity of co-occurring risk factors. This study explores the use of generative large language models (LLMs), specifically GPT-3.5 and GPT-4.5, for multi-label classification (MLC) of SrFs from psychiatric electronic health records (EHRs). We present a novel end-to-end generative MLC pipeline and introduce advanced evaluation methods, including label-set-level metrics and a multi-label confusion matrix for error…
