Evaluating Text-based Conversational Agents for Mental Health: A Systematic Review of Metrics, Methods and Usage Contexts
Jiangtao Gong, Xiao Wen, Fengyi Tao, Xinqi Wang, Xixi Yang, Yangrong Tang

TL;DR
This systematic review analyzes evaluation practices of text-based mental health conversational agents, highlighting current metrics, methods, and usage contexts, and emphasizing the need for more rigorous, culturally sensitive, and comprehensive evaluation approaches.
Contribution
It provides a structured synthesis of evaluation metrics, methods, and contexts, and advocates for methodological improvements in assessing mental health conversational agents.
Findings
Reliance on Western-developed scales limits cultural applicability.
Most studies use small, short-term samples.
Automated metrics show weak correlation with user well-being.
Abstract
Text-based conversational agents (CAs) are increasingly used in mental health, yet evaluation practices remain fragmented. We conducted a PRISMA-guided systematic review (May-June 2024) across ACM Digital Library, Scopus, and PsycINFO. From 613 records, 132 studies were included, with dual-coder extraction achieving substantial agreement (Cohen's kappa = 0.77-0.92). We synthesized evaluation approaches across three dimensions: metrics, methods, and usage contexts. Metrics were classified into CA-centric attributes (e.g., reliability, safety, empathy) and user-centric outcomes (experience, knowledge, psychological state, health behavior). Methods included automated analyses, standardized psychometric scales, and qualitative inquiry. Temporal designs ranged from momentary to follow-up assessments. Findings show reliance on Western-developed scales, limited cultural adaptation,…
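The abstract reports inter-coder agreement as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed proportion of items on which both coders chose the same label, and p_e is the agreement expected by chance from each coder's label frequencies. The Python sketch below shows how that statistic is computed for two coders assigning nominal labels. The paper does not say which tool it used. The function name `cohens_kappa` and the example labels are hypothetical, for illustration only, and are not data from the review.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same items with nominal categories."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same label.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: for each label, product of the two coders' marginal
    # proportions, summed over labels.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: two coders classify ten extracted metrics as
# CA-centric ("CA") or user-centric ("user").
coder_a = ["CA", "CA", "user", "user", "user", "CA", "user", "user", "CA", "user"]
coder_b = ["CA", "user", "user", "user", "user", "CA", "user", "CA", "CA", "user"]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 8/10 observed vs 0.52 chance agreement
```

In this toy example the coders agree on 8 of 10 items (p_o = 0.80), but chance agreement is already 0.52. Kappa therefore comes out near 0.58 rather than 0.80, which is why reviews report kappa instead of raw percent agreement.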
Taxonomy
Topics: Digital Mental Health Interventions · Mental Health via Writing · AI in Service Interactions
