Large Language Models for Mental Health: A Multilingual Evaluation

Nishat Raihan; Sadiya Sayara Chowdhury Puspo; Ana-Maria Bucur; Stevie Chancellor; Marcos Zampieri

arXiv:2602.02440 · cs.CL · February 3, 2026

TL;DR

This study evaluates large language models on multilingual mental health tasks across various datasets and settings, revealing their strengths in non-English languages and their limitations on machine-translated data.

Contribution

The paper provides a comprehensive evaluation of proprietary and open-source LLMs on multilingual mental health datasets, highlighting their performance and translation-related challenges.

Findings

01

Proprietary and fine-tuned open-source LLMs often outperform traditional baselines.

02

Performance drops on machine-translated data vary by language and typology.

03

LLMs show strengths on non-English mental health tasks but are sensitive to translation quality.

Abstract

Large Language Models (LLMs) have remarkable capabilities across NLP tasks. However, their performance in multilingual contexts, especially within the mental health domain, has not been thoroughly explored. In this paper, we evaluate proprietary and open-source LLMs on eight mental health datasets in various languages, as well as their machine-translated (MT) counterparts. We compare LLM performance in zero-shot, few-shot, and fine-tuned settings against conventional NLP baselines that do not employ LLMs. In addition, we assess translation quality across language families and typologies to understand its influence on LLM performance. Proprietary LLMs and fine-tuned open-source LLMs achieve competitive F1 scores on several datasets, often surpassing state-of-the-art results. However, performance on MT data is generally lower, and the extent of this decline varies by language and…
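The abstract reports results as F1 scores and notes that performance on MT data is generally lower than on the original-language data. A minimal sketch of how such a comparison might be quantified follows, assuming macro-averaged F1 and a simple relative-drop measure. The averaging scheme, the helper names `macro_f1` and `relative_drop`, and the toy labels are all assumptions for illustration; they are not taken from the paper or from any of its eight datasets.

```python
# Minimal sketch (assumptions, not the paper's code): macro-averaged F1 and the
# relative F1 drop from original-language data to machine-translated (MT) data.
# The helpers macro_f1 and relative_drop are hypothetical names.
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for c in labels:
        prec_den = tp[c] + fp[c]
        rec_den = tp[c] + fn[c]
        precision = tp[c] / prec_den if prec_den else 0.0
        recall = tp[c] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def relative_drop(f1_original, f1_mt):
    """Percentage decline of F1 on MT data relative to the original-language F1."""
    if f1_original == 0:
        raise ValueError("original F1 must be non-zero")
    return 100.0 * (f1_original - f1_mt) / f1_original


if __name__ == "__main__":
    # Hypothetical toy labels (1 = positive class, 0 = negative class).
    gold = [1, 0, 1, 1, 0, 0]
    pred_original = [1, 0, 1, 1, 0, 1]  # predictions on original-language text
    pred_mt = [1, 1, 0, 1, 0, 1]        # predictions on machine-translated text

    f1_orig = macro_f1(gold, pred_original)
    f1_mt = macro_f1(gold, pred_mt)
    print(round(f1_orig, 3))                        # → 0.829
    print(round(f1_mt, 3))                          # → 0.486
    print(round(relative_drop(f1_orig, f1_mt), 1))  # → 41.4
```

Macro averaging weights every class equally, which is a common choice when mental health labels are imbalanced; scikit-learn's `f1_score(gold, pred, average="macro")` would be a drop-in alternative to the hand-written helper. Computing the drop per language, rather than pooled, is what would allow comparing declines across language families and typologies.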

Videos

Large Language Models for Mental Health: A Multilingual Evaluation · Underline

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Machine Learning in Healthcare