Development, system design, safety, and performance metrics of a conversational agent for reducing depressive and anxious symptoms based on a large language model: The MHAI study

David Villarreal-Zegarra; Yscenia Paredes-Gonzales; Andrea Dámaso-Román; Judith Quiñones-Inga; Gianfranco Centeno-Terrazas; Yan Pieer Alexis-Montalban Lozada; Issa Atoum

PMC · DOI:10.1371/journal.pone.0344939 · March 18, 2026

Open Access

TL;DR

The study developed a mental health conversational agent using large language models and found GPT-4o performed better than Llama 3.1-8B in simulated interactions for depression and anxiety.

Contribution

A transparent, standardized evaluation framework for conversational agents using GPT-4o and Llama 3.1-8B in mental health support.

Findings

1. GPT-4o outperformed Llama 3.1-8B in response quality, clarity, and robustness in simulated mental health interactions.

2. GPT-4o generated longer and more diverse responses than Llama 3.1-8B, but at a higher cost.

3. The platform includes therapeutic modules, self-assessment tools, and an emergency alert system for mental health support.

Abstract

Conversational agents based on large language models (LLMs) have shown moderate efficacy in reducing depressive and anxiety symptoms. However, most existing evaluations lack methodological transparency, rely on closed-source models, and show limited standardization in performance and safety assessment. This study had two objectives: (1) to develop an LLM-based conversational agent through system design analysis and initial functionality testing, and (2) to evaluate the safety and performance of two LLMs (GPT-4o and Llama 3.1-8B) through standardized assessment in controlled simulated interactions focused on depression and anxiety. We conducted a cross-sectional study in two phases. First, we developed a mental health platform integrating a conversational agent with functionalities including personalized context, pretrained therapeutic modules, self-assessment tools, and an emergency…

Linked entities

Diseases (2): depression, anxiety

Taxonomy

Topics: Digital Mental Health Interventions · Mental Health via Writing · Emotion and Mood Recognition