PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support
Jiwon Kim, Violeta J. Rodriguez, Dong Whi Yoo, Eshwar Chandrasekharan, and Koustuv Saha

TL;DR
This paper introduces PAIR-SAFE, a paired-agent framework that enhances AI mental health support by providing transparent, runtime auditing and refinement grounded in clinical standards, improving response quality and safety.
Contribution
The paper presents a novel paired-agent system integrating a supervisory Judge grounded in MITI-4 to audit and refine AI responses in mental health support, enhancing transparency and clinical alignment.
Findings
Significant improvements in MITI dimensions like Partnership and Collaboration.
Quantitative evaluation shows enhanced response quality.
Expert qualitative analysis confirms the effectiveness of runtime supervision.
Abstract
Large language models (LLMs) are increasingly used for mental health support, yet they can produce responses that are overly directive, inconsistent, or clinically misaligned, particularly in sensitive or high-risk contexts. Existing approaches to mitigating these risks largely rely on implicit alignment through training or prompting, offering limited transparency and runtime accountability. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support that integrates a Responder agent with a supervisory Judge agent grounded in the clinically validated Motivational Interviewing Treatment Integrity (MITI-4) framework. The Judge audits each response and provides structured ALLOW or REVISE decisions that guide runtime response refinement. We simulate counseling interactions using a support-seeker simulator derived from human-annotated…
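The audit-and-refine loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the Verdict type, the audited_reply function, the max_rounds cap, and the toy Responder and Judge (simple string rules standing in for LLM calls and MITI-4-based scoring).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical structured output of the Judge."""
    decision: str        # "ALLOW" or "REVISE"
    feedback: str = ""   # guidance passed back to the Responder on REVISE


# Hypothetical agent signatures: in a real system both would wrap LLM calls.
Responder = Callable[[str, Optional[str]], str]   # (message, feedback) -> draft
Judge = Callable[[str, str], Verdict]             # (message, draft) -> verdict


def audited_reply(
    message: str,
    responder: Responder,
    judge: Judge,
    max_rounds: int = 2,  # assumed cap on audits; not taken from the paper
) -> Tuple[str, List[Verdict]]:
    """Draft a reply, audit it, and revise until ALLOW or the cap is reached."""
    draft = responder(message, None)
    history: List[Verdict] = []
    for _ in range(max_rounds):
        verdict = judge(message, draft)
        history.append(verdict)
        if verdict.decision == "ALLOW":
            break
        # REVISE: regenerate the draft using the Judge's feedback.
        draft = responder(message, verdict.feedback)
    # Note: if the cap is hit on a REVISE, the last draft is returned unaudited.
    return draft, history


def toy_responder(message: str, feedback: Optional[str]) -> str:
    """Stand-in Responder: directive first draft, open question after feedback."""
    if feedback is None:
        return "You should exercise more."
    return "What feels most manageable for you right now?"


def toy_judge(message: str, draft: str) -> Verdict:
    """Stand-in Judge: flags directive phrasing with a single string rule."""
    if "you should" in draft.lower():
        return Verdict("REVISE", "Avoid directives; invite the person's own view.")
    return Verdict("ALLOW")


reply, history = audited_reply("I can't sleep lately.", toy_responder, toy_judge)
print(reply)
print([v.decision for v in history])
```

Returning the verdict history alongside the final reply keeps every audit decision inspectable, which is one way to make the runtime accountability described in the abstract concrete.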
Taxonomy
Topics: Digital Mental Health Interventions · Mental Health via Writing · Artificial Intelligence in Healthcare and Education
