PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support

Jiwon Kim; Violeta J. Rodriguez; Dong Whi Yoo; Eshwar Chandrasekharan; and Koustuv Saha

arXiv:2601.12754·cs.HC·January 21, 2026

TL;DR

This paper introduces PAIR-SAFE, a paired-agent framework that enhances AI mental health support by providing transparent, runtime auditing and refinement grounded in clinical standards, improving response quality and safety.

Contribution

The paper presents a novel paired-agent system integrating a supervisory Judge grounded in MITI-4 to audit and refine AI responses in mental health support, enhancing transparency and clinical alignment.

Findings

01. Significant improvements in MITI dimensions such as Partnership and Collaboration.

02. Quantitative evaluation shows enhanced response quality.

03. Expert qualitative analysis confirms the effectiveness of runtime supervision.

Abstract

Large language models (LLMs) are increasingly used for mental health support, yet they can produce responses that are overly directive, inconsistent, or clinically misaligned, particularly in sensitive or high-risk contexts. Existing approaches to mitigating these risks largely rely on implicit alignment through training or prompting, offering limited transparency and runtime accountability. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support that integrates a Responder agent with a supervisory Judge agent grounded in the clinically validated Motivational Interviewing Treatment Integrity (MITI-4) framework. The Judge audits each response and provides structured ALLOW or REVISE decisions that guide runtime response refinement. We simulate counseling interactions using a support-seeker simulator derived from human-annotated…
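The audit-and-refine loop the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Verdict`, `audited_reply`, `toy_responder`, and `toy_judge`, the `max_revisions` budget, and the rule-based stand-ins for the two agents are all assumptions made for this example. In the paper both agents are LLM-based and the Judge is grounded in MITI-4, and this excerpt does not give their prompts or stopping rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    decision: str       # "ALLOW" or "REVISE", as named in the abstract
    feedback: str = ""  # hypothetical field: guidance returned on REVISE


# Hypothetical agent interfaces; a real system would wrap LLM calls here.
Responder = Callable[[str, Optional[str]], str]  # (message, feedback) -> draft
Judge = Callable[[str, str], Verdict]            # (message, draft) -> verdict


def audited_reply(message: str, responder: Responder, judge: Judge,
                  max_revisions: int = 2) -> Tuple[str, List[Verdict]]:
    """Draft a reply, audit it, and redraft until ALLOW or the budget runs out."""
    draft = responder(message, None)
    trail: List[Verdict] = []  # audit trail kept for transparency
    for _ in range(max_revisions):
        verdict = judge(message, draft)
        trail.append(verdict)
        if verdict.decision == "ALLOW":
            return draft, trail
        draft = responder(message, verdict.feedback)
    # Budget exhausted: audit the final draft once more so the trail is complete.
    trail.append(judge(message, draft))
    return draft, trail


def toy_responder(message: str, feedback: Optional[str]) -> str:
    # Stand-in for the Responder: directive first, collaborative after feedback.
    if feedback is None:
        return "You should just exercise more."
    return "What do you think might help you feel a bit better?"


def toy_judge(message: str, draft: str) -> Verdict:
    # Stand-in for the Judge: flags one overly directive phrasing.
    if draft.lower().startswith("you should"):
        return Verdict("REVISE", "Too directive; invite the person's own ideas.")
    return Verdict("ALLOW")


reply, trail = audited_reply("I feel low lately.", toy_responder, toy_judge)
print(reply)
print([v.decision for v in trail])
```

Keeping the list of verdicts alongside the final reply is one way to expose the runtime accountability the abstract emphasizes; how PAIR-SAFE itself records or surfaces Judge decisions is not stated in this excerpt.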

Taxonomy

Topics: Digital Mental Health Interventions · Mental Health via Writing · Artificial Intelligence in Healthcare and Education