Development, system design, safety, and performance metrics of a conversational agent for reducing depressive and anxious symptoms based on a large language model: The MHAI study
David Villarreal-Zegarra, Yscenia Paredes-Gonzales, Andrea Dámaso-Román, Judith Quiñones-Inga, Gianfranco Centeno-Terrazas, Yan Pieer Alexis-Montalban Lozada, Issa Atoum

TL;DR
The study developed a mental health conversational agent using large language models and found GPT-4o performed better than Llama 3.1-8B in simulated interactions for depression and anxiety.
Contribution
A transparent, standardized evaluation framework for conversational agents using GPT-4o and Llama 3.1-8B in mental health support.
Findings
GPT-4o outperformed Llama 3.1-8B in response quality, clarity, and robustness in simulated mental health interactions.
GPT-4o generated longer and more diverse responses than Llama 3.1-8B, but at a higher cost.
The platform includes therapeutic modules, self-assessment tools, and an emergency alert system for mental health support.
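The findings report that GPT-4o produced longer and more diverse responses. This summary does not say how length or diversity was measured, so the sketch below assumes two common proxies: mean tokens per response and distinct-n (unique n-grams divided by total n-grams). It uses naive lowercase whitespace tokenization, and the helper names and sample responses are hypothetical. It illustrates the kind of comparison involved and should not be read as the study's actual metric.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenization; the study's tokenizer is not specified.
    return text.lower().split()


def mean_length(responses: list[str]) -> float:
    # Average number of tokens per response; 0.0 for an empty list.
    if not responses:
        return 0.0
    return sum(len(tokenize(r)) for r in responses) / len(responses)


def distinct_n(responses: list[str], n: int) -> float:
    # Distinct-n: share of n-grams (pooled over all responses) that are unique.
    # Higher values indicate more lexically diverse output.
    ngrams: list[tuple[str, ...]] = []
    for r in responses:
        toks = tokenize(r)
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    # Hypothetical responses from two models to the same simulated user turns.
    model_a = [
        "It sounds like you have been feeling low lately.",
        "What has helped you cope with stress in the past?",
    ]
    model_b = ["That sounds hard.", "That sounds hard to manage."]
    for name, resp in [("model_a", model_a), ("model_b", model_b)]:
        print(name, mean_length(resp), distinct_n(resp, 1), distinct_n(resp, 2))
```

Pooling n-grams across all responses (rather than averaging per response) penalizes a model that repeats the same phrasing from turn to turn, which is usually what a diversity comparison between models is meant to capture.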
Abstract
Conversational agents based on large language models (LLMs) have shown moderate efficacy in reducing depressive and anxious symptoms. However, most existing evaluations lack methodological transparency, rely on closed-source models, and show limited standardization in performance and safety assessment. This study had two objectives: (1) to develop an LLM-based conversational agent through system design analysis and initial functionality testing, and (2) to evaluate the safety and performance of two LLMs (GPT-4o and Llama 3.1-8B) through standardized assessment in controlled simulated interactions focused on depression and anxiety. We conducted a cross-sectional study in two phases. First, we developed a mental health platform integrating a conversational agent with functionalities including personalized context, pretrained therapeutic modules, self-assessment tools, and an emergency…
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Taxonomy
Topics: Digital Mental Health Interventions · Mental Health via Writing · Emotion and Mood Recognition
