CounselReflect: A Toolkit for Auditing Mental-Health Dialogues

Yahan Li; Chaohao Du; Zeyang Li; Christopher Chun Kuizon; Shupeng Cheng; Angel Hsing-Chi Hwang; Adam C. Frank; Ruishan Liu

arXiv:2603.29429 · cs.CL · April 1, 2026

TL;DR

CounselReflect is an end-to-end toolkit for transparent, multi-dimensional auditing of mental-health support dialogues mediated by conversational AI systems, aimed at both users and professionals.

Contribution

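The paper introduces an evaluation system that combines model-based metrics from task-specific predictors with rubric-based metrics scored by configurable LLM judges, offers several deployment options (including a web application and a browser extension), and reports human and expert evaluations of its usability.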

Findings

01

Supports transparent inspection with session summaries and turn-level scores.

02

Integrates a library of 69 literature-derived metrics, plus user-defined custom metrics, operationalized with configurable LLM judges.

03

Human and expert evaluations indicate it is understandable, usable, and trustworthy.

Abstract

Mental-health support is increasingly mediated by conversational systems (e.g., LLM-based tools), but users often lack structured ways to audit the quality and potential risks of the support they receive. We introduce CounselReflect, an end-to-end toolkit for auditing mental-health support dialogues. Rather than producing a single opaque quality score, CounselReflect provides structured, multi-dimensional reports with session-level summaries, turn-level scores, and evidence-linked excerpts to support transparent inspection. The system integrates two families of evaluation signals: (i) 12 model-based metrics produced by task-specific predictors, and (ii) rubric-based metrics that extend coverage via a literature-derived library (69 metrics) and user-defined custom metrics, operationalized with configurable LLM judges. CounselReflect is available as a web application, browser extension,…
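The report structure described in the abstract (turn-level scores, evidence-linked excerpts, and a session-level summary) can be illustrated with a minimal Python sketch. This is not CounselReflect's actual schema or API: the `TurnScore` class, the `summarize_session` function, the metric names, the [0, 1] score range, and the choice of a simple mean as the session-level aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TurnScore:
    """One metric's score for one dialogue turn (hypothetical schema)."""
    turn_index: int
    metric: str    # illustrative metric name, not taken from the paper
    family: str    # "model" or "rubric", mirroring the two signal families
    score: float   # assumed to be normalized to [0, 1] in this sketch
    evidence: str  # excerpt from the turn that supports the score


def summarize_session(scores):
    """Aggregate turn-level scores into a per-metric session summary.

    Returns {metric: {"family", "mean", "lowest_turn", "evidence"}}, where
    "evidence" is the excerpt attached to the lowest-scoring turn.
    Averaging is an assumption; the paper may aggregate differently.
    """
    by_metric = defaultdict(list)
    for s in scores:
        by_metric[s.metric].append(s)

    summary = {}
    for metric, items in by_metric.items():
        worst = min(items, key=lambda s: s.score)
        summary[metric] = {
            "family": items[0].family,
            "mean": mean(s.score for s in items),
            "lowest_turn": worst.turn_index,
            "evidence": worst.evidence,
        }
    return summary


# Made-up example data: two turns, two illustrative metrics.
example_scores = [
    TurnScore(0, "empathy", "model", 0.75, "That sounds really hard."),
    TurnScore(1, "empathy", "model", 0.25, "You should just move on."),
    TurnScore(1, "safety", "rubric", 1.0, "You should just move on."),
]
report = summarize_session(example_scores)
```

Keeping the evidence excerpt next to each score lets a reader trace a low session-level value back to the specific turn that produced it, which is the kind of transparent inspection the abstract contrasts with a single opaque quality score.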


