MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models

Suraj Racha; Prashant Joshi; Anshika Raman; Nikita Jangid; Mridul Sharma; Ganesh Ramakrishnan; Nirmal Punjabi

arXiv:2502.15418·cs.CL·February 24, 2025

TL;DR

This paper introduces MHQA, a comprehensive benchmark dataset for mental health question answering, covering multiple domains and question types, to evaluate and improve large language models in healthcare contexts.

Contribution

The work presents the first diverse, expert-verified mental health QA dataset with multiple question types and a rigorous pipeline for dataset creation, filling a critical gap in mental health NLP benchmarking.

Findings

1. LLMs achieve varying F1 scores on MHQA, indicating room for improvement.
2. Few-shot and fine-tuning methods impact model performance differently.
3. The dataset enables detailed analysis of LLM capabilities in mental health QA.
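
Since the findings are stated in terms of F1, the following minimal sketch shows one way an F1 score could be computed over multiple-choice predictions. It assumes macro-averaging across option labels; the averaging scheme, the function name `macro_f1`, and the example labels are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over option labels (assumed averaging scheme).

    gold, pred: equal-length sequences of option labels such as "A".."D".
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the answer but was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    print(macro_f1(["A", "B", "C", "A"], ["A", "B", "A", "A"]))
```

Macro-averaging weights every option label equally, which matters when the correct answers are unevenly distributed across options; a micro-averaged score would instead reduce to plain accuracy in this single-label setting.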

Abstract

Mental health remains a challenging problem all over the world, with issues like depression and anxiety becoming increasingly common. Large Language Models (LLMs) have seen wide application in healthcare, specifically in answering medical questions. However, there is a lack of standard benchmarking datasets for question answering (QA) in mental health. Our work presents a novel multiple-choice dataset, MHQA (Mental Health Question Answering), for benchmarking language models (LMs). Previous mental health datasets have focused primarily on text classification into specific labels or disorders. MHQA, on the other hand, presents question answering for mental health focused on four key domains: anxiety, depression, trauma, and obsessive/compulsive issues, with diverse question types, namely factoid, diagnostic, prognostic, and preventive. We use PubMed abstracts as the primary source for…
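
The taxonomy in the abstract (four domains, four question types, multiple-choice format) can be sketched as a small record type. This is a hypothetical schema for illustration: the class and field names (`MHQAItem`, `options`, `answer_index`) are assumptions and may not match the columns of the released dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Domain(Enum):
    # The four domains named in the abstract.
    ANXIETY = "anxiety"
    DEPRESSION = "depression"
    TRAUMA = "trauma"
    OBSESSIVE_COMPULSIVE = "obsessive/compulsive issues"


class QuestionType(Enum):
    # The four question types named in the abstract.
    FACTOID = "factoid"
    DIAGNOSTIC = "diagnostic"
    PROGNOSTIC = "prognostic"
    PREVENTIVE = "preventive"


@dataclass(frozen=True)
class MHQAItem:
    """Hypothetical multiple-choice record; field names are assumptions."""
    question: str
    options: Tuple[str, ...]
    answer_index: int
    domain: Domain
    question_type: QuestionType

    def __post_init__(self):
        if len(self.options) < 2:
            raise ValueError("a multiple-choice item needs at least two options")
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index must point at one of the options")

    @property
    def answer(self) -> str:
        return self.options[self.answer_index]


if __name__ == "__main__":
    # Placeholder content only; not an item from the dataset.
    item = MHQAItem(
        question="<question text>",
        options=("<option 1>", "<option 2>", "<option 3>", "<option 4>"),
        answer_index=2,
        domain=Domain.ANXIETY,
        question_type=QuestionType.FACTOID,
    )
    print(item.domain.value, item.question_type.value, item.answer)
```

Tagging each item with both a domain and a question type is what would allow results to be broken down along either axis.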

Code & Models

Datasets

jastorj/MHQA
dataset · 17 dl
