Analyzing LLM Reasoning to Uncover Mental Health Stigma
Sreehari Sankar, Aliakbar Nafar, Mona Barman, Hannah K. Heitz, Ashwin Kumar, Pouria Tohidi, Dailun Li, Danish Hussain, Russell DuBois, Hamed Hasheminia, Farshad Majzoubi

TL;DR
This paper investigates how large language models (LLMs) exhibit mental health stigma by analyzing their reasoning processes, which reveals more nuanced biases than traditional evaluation methods capture.
Contribution
It introduces a framework for analyzing LLM reasoning to uncover hidden stigmatizing language and extends a mental health stigma benchmark with new conditions.
Findings
Reasoning analysis uncovers more stigma than multiple-choice question (MCQ) evaluations.
The framework categorizes stigmatizing statements and rates their severity.
The extended benchmark captures a broader range of mental health conditions.
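The tagging-and-rating idea in the findings above can be pictured with a small data-structure sketch. Everything in it is an illustrative assumption: the placeholder category labels, the two-level severity scale, and the `TaggedStatement` and `summarize` names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical two-level scale mirroring the overt vs. subtle distinction.
    SUBTLE = 1
    OVERT = 2


@dataclass(frozen=True)
class TaggedStatement:
    text: str           # statement extracted from a model's reasoning
    category: str       # placeholder label, not the paper's taxonomy
    severity: Severity  # how severe the statement was judged to be


def summarize(statements):
    """Count tagged statements per (category, severity) pair."""
    return Counter((s.category, s.severity) for s in statements)


# Placeholder data only; no real model output is reproduced here.
tags = [
    TaggedStatement("example statement A", "category_1", Severity.OVERT),
    TaggedStatement("example statement B", "category_1", Severity.SUBTLE),
    TaggedStatement("example statement C", "category_2", Severity.SUBTLE),
]
counts = summarize(tags)
```

Keeping category and severity as separate fields lets the same tagged statements be aggregated along either axis.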
Abstract
While large language models (LLMs) are increasingly being explored for mental health applications, recent studies reveal that they can exhibit stigma toward individuals with psychological conditions. Existing evaluations of this stigma primarily rely on multiple-choice questions (MCQs), which fail to capture the biases embedded within the models' underlying logic. In this paper, we analyze the intermediate reasoning steps of LLMs to uncover hidden stigmatizing language and the internal rationales driving it. We leverage clinical expertise to categorize common patterns of stigmatizing language directed at individuals with psychological conditions and use this framework to identify and tag problematic statements in LLM reasoning. Furthermore, we rate the severity of these statements, distinguishing between overt prejudice and more subtle, less immediately harmful biases. To broaden the…
