Language Shapes Mental Health Evaluations in Large Language Models
Jiayi Xu, Xiyang Hu

TL;DR
This paper shows that large language models evaluate mental health differently depending on prompt language: GPT-4o and Qwen3 produce higher stigma-related responses when prompted in Chinese than in English, and accuracy on downstream decision tasks shifts systematically with language.
Contribution
The paper shows that prompt language systematically influences mental health evaluations in LLMs, affecting both stigma assessment and downstream decision accuracy when the same tasks are posed in Chinese and in English.
Findings
Both models produce higher stigma-related responses when prompted in Chinese than in English.
Sensitivity to stigmatizing content varies by language.
Predicted depression severity shifts systematically with language.
Abstract
This study investigates whether large language models (LLMs) exhibit cross-linguistic differences in mental health evaluations. Focusing on Chinese and English, we examine two widely used models, GPT-4o and Qwen3, to assess whether prompt language systematically shifts mental health-related evaluations and downstream decision outcomes. First, we assess models' evaluative orientation toward mental health stigma using multiple validated measurement scales capturing social stigma, self-stigma, and professional stigma. Across all measures, both models produce higher stigma-related responses when prompted in Chinese than in English. Second, we examine whether these differences also manifest in two common downstream decision tasks in mental health. In a binary mental health stigma detection task, sensitivity to stigmatizing content varies across language prompts, with lower sensitivity…
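As a rough illustration of the per-language sensitivity comparison the abstract mentions for the binary stigma detection task, the sketch below computes sensitivity (true positive rate) separately for each prompt language. It is not the authors' code: the function name `sensitivity_by_language`, the record layout, the language keys `lang_a` and `lang_b`, and the toy data are all hypothetical, and the numbers are made up and say nothing about the paper's actual results or their direction.

```python
from collections import defaultdict


def sensitivity_by_language(records):
    """Return {language: sensitivity} for a binary detection task.

    `records` is an iterable of (language, true_label, predicted_label),
    where label 1 means "stigmatizing" and 0 means "not stigmatizing".
    This layout is an assumption made for the example.
    """
    true_pos = defaultdict(int)   # stigmatizing items correctly flagged
    false_neg = defaultdict(int)  # stigmatizing items that were missed
    for lang, y_true, y_pred in records:
        if y_true != 1:
            continue  # sensitivity only looks at truly positive items
        if y_pred == 1:
            true_pos[lang] += 1
        else:
            false_neg[lang] += 1

    result = {}
    for lang in set(true_pos) | set(false_neg):
        positives = true_pos[lang] + false_neg[lang]
        result[lang] = true_pos[lang] / positives
    return result


# Made-up toy predictions for the same items under two prompt languages.
toy_records = [
    ("lang_a", 1, 1), ("lang_a", 1, 1), ("lang_a", 1, 0), ("lang_a", 0, 0),
    ("lang_b", 1, 1), ("lang_b", 1, 0), ("lang_b", 1, 0), ("lang_b", 0, 1),
]

scores = sensitivity_by_language(toy_records)
for lang in sorted(scores):
    print(lang, round(scores[lang], 3))
```

Computing the metric per language on the same underlying items, rather than pooling all predictions, is what makes a language effect visible: a gap between the two values reflects the prompt language and not a difference in the items themselves.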
Taxonomy
Topics: Mental Health Treatment and Access · Mental Health via Writing · Psychometric Methodologies and Testing
