A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context
Noureldin Zahran, Aya E. Fouda, Radwa J. Hanafy, Mohammed E. Fouda

TL;DR
This study evaluates the performance of eight large language models on Arabic mental health datasets, highlighting the importance of prompt engineering, language configuration, and few-shot learning for improving diagnostic accuracy in Arabic contexts.
Contribution
It provides a comprehensive analysis of LLMs in Arabic mental health applications, emphasizing prompt design, multilingual factors, and few-shot prompting to enhance model effectiveness.
Findings
Prompt engineering significantly improves LLM scores.
Model selection impacts diagnostic performance, with Phi-3.5 MoE excelling in accuracy.
Few-shot prompting notably boosts multi-class classification accuracy.
Abstract
Mental health disorders pose a growing public health concern in the Arab world, emphasizing the need for accessible diagnostic and intervention tools. Large language models (LLMs) offer a promising approach, but their application in Arabic contexts faces challenges including limited labeled datasets, linguistic complexity, and translation biases. This study comprehensively evaluates 8 LLMs, including general multi-lingual models, as well as bi-lingual ones, on diverse mental health datasets (such as AraDepSu, Dreaddit, MedMCQA), investigating the impact of prompt design, language configuration (native Arabic vs. translated English, and vice versa), and few-shot prompting on diagnostic performance. We find that prompt engineering significantly influences LLM scores mainly due to reduced instruction following, with our structured prompt outperforming a less structured variant on…
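To make the abstract's notions of a structured prompt, few-shot prompting, and instruction following concrete, here is a minimal sketch. It is not the authors' prompt: the label set, the prompt wording, and the helper names `build_prompt` and `parse_label` are all hypothetical, chosen only to illustrate the idea.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical label set for illustration; the datasets named in the
# abstract define their own classes.
LABELS = ("depression", "suicidal", "neutral")


def build_prompt(
    text: str,
    labels: Sequence[str] = LABELS,
    shots: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a structured classification prompt.

    `shots` holds (post, label) pairs; an empty sequence gives a
    zero-shot prompt, a non-empty one gives a few-shot prompt.
    """
    lines = [
        "Task: classify the post into exactly one label.",
        "Labels: " + ", ".join(labels),
        "Answer with the label only.",
        "",
    ]
    for shot_text, shot_label in shots:
        lines += [f"Post: {shot_text}", f"Label: {shot_label}", ""]
    lines += [f"Post: {text}", "Label:"]
    return "\n".join(lines)


def parse_label(response: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Map a raw model response to a label, or None if it is unusable.

    A None result marks a response that did not follow the requested
    format, which is one way an instruction-following failure can
    lower a measured score.
    """
    cleaned = response.strip().lower()
    if cleaned in labels:
        return cleaned
    # Fallback: accept the response only if exactly one label appears in it.
    hits = [label for label in labels if label in cleaned]
    return hits[0] if len(hits) == 1 else None
```

Under these assumptions, a call such as `build_prompt(post, shots=[(example_post, "depression")])` produces a one-shot prompt, and any response that `parse_label` maps to `None` would be counted as a format failure rather than a wrong diagnosis.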
Taxonomy
Methods: Mixture of Experts
