A Comprehensive Evaluation of Large Language Models on Mental Illnesses in Arabic Context

Noureldin Zahran; Aya E. Fouda; Radwa J. Hanafy; Mohammed E. Fouda

arXiv:2501.06859·cs.CL·January 14, 2025

TL;DR

This study evaluates the performance of eight large language models on Arabic mental health datasets, highlighting the importance of prompt engineering, language configuration, and few-shot learning for improving diagnostic accuracy in Arabic contexts.

Contribution

It provides a comprehensive analysis of LLMs in Arabic mental health applications, emphasizing prompt design, multilingual factors, and few-shot prompting to enhance model effectiveness.

Findings

01. Prompt engineering significantly improves LLM scores.

02. Model selection impacts diagnostic performance, with Phi-3.5 MoE excelling in accuracy.

03. Few-shot prompting notably boosts multi-class classification accuracy.
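
As a rough illustration of what structured few-shot prompting can look like for a multi-class task, here is a minimal Python sketch. It is not the prompt used in the paper: the label set, the prompt wording, and the helper names `build_few_shot_prompt` and `parse_label` are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical label set, for illustration only; not taken from the paper's datasets.
LABELS = ["depression", "suicidal ideation", "neither"]


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    post: str,
    labels: List[str] = LABELS,
) -> str:
    """Assemble a structured classification prompt with labeled examples."""
    for _, label in examples:
        if label not in labels:
            raise ValueError(f"unknown label: {label!r}")
    # Fixed header: task statement, allowed labels, and output constraint.
    lines = [
        "Task: classify the post into exactly one of the labels below.",
        "Labels: " + ", ".join(labels),
        "Answer with the label only.",
        "",
    ]
    # Few-shot block: one (post, label) pair per example.
    for text, label in examples:
        lines += [f"Post: {text}", f"Label: {label}", ""]
    # The query post, with the label left for the model to fill in.
    lines += [f"Post: {post}", "Label:"]
    return "\n".join(lines)


def parse_label(response: str, labels: List[str] = LABELS) -> Optional[str]:
    """Return the label if the response is exactly one allowed label, else None."""
    cleaned = response.strip().strip(".").lower()
    return cleaned if cleaned in labels else None
```

A response that does not parse to an allowed label returns `None`, which gives one simple way to count instruction-following failures separately from wrong answers.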

Abstract

Mental health disorders pose a growing public health concern in the Arab world, emphasizing the need for accessible diagnostic and intervention tools. Large language models (LLMs) offer a promising approach, but their application in Arabic contexts faces challenges including limited labeled datasets, linguistic complexity, and translation biases. This study comprehensively evaluates eight LLMs, including general multilingual models as well as bilingual ones, on diverse mental health datasets (such as AraDepSu, Dreaddit, and MedMCQA), investigating the impact of prompt design, language configuration (native Arabic vs. translated English, and vice versa), and few-shot prompting on diagnostic performance. We find that prompt engineering significantly influences LLM scores, mainly due to reduced instruction following, with our structured prompt outperforming a less structured variant on…

Taxonomy

Methods: Mixture of Experts