DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses

Han Luo; Guy Laban

arXiv:2512.02282·cs.AI·December 3, 2025

Open Access

TL;DR

DialogGuard is a multi-agent framework that evaluates the psychosocial safety of LLM responses across five high-severity dimensions, improving risk detection accuracy and supporting safer deployment in sensitive applications.

Contribution

The paper introduces DialogGuard, an evaluation framework with four LLM-as-a-judge pipelines (single-agent scoring, dual-agent correction, multi-agent debate, and stochastic majority voting) for assessing psychosocial risks in LLM outputs.

Findings

01. Multi-agent mechanisms outperform non-LLM baselines and single-agent judging.

02. Dual-agent correction and majority voting offer optimal accuracy and robustness.

03. Debate approach achieves higher recall but may over-flag borderline cases.

Abstract

Large language models (LLMs) now mediate many web-based mental-health, crisis, and other emotionally sensitive services, yet their psychosocial safety in these settings remains poorly understood and weakly evaluated. We present DialogGuard, a multi-agent framework for assessing psychosocial risks in LLM-generated responses along five high-severity dimensions: privacy violations, discriminatory behaviour, mental manipulation, psychological harm, and insulting behaviour. DialogGuard can be applied to diverse generative models through four LLM-as-a-judge pipelines, including single-agent scoring, dual-agent correction, multi-agent debate, and stochastic majority voting, grounded in a shared three-level rubric usable by both human annotators and LLM judges. Using PKU-SafeRLHF with human safety annotations, we show that multi-agent mechanisms detect psychosocial risks more accurately than…
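As a rough illustration of one of the four pipelines named above, stochastic majority voting can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the numeric encoding of the three-level rubric (`SAFE`, `BORDERLINE`, `UNSAFE`), the `Judge` callable signature, the `n_samples` default, and the tie-breaking rule are all hypothetical choices made for the example. A real judge would call an LLM with sampling enabled; here it is any function returning a rubric level.

```python
from collections import Counter
from typing import Callable

# Hypothetical encoding of a three-level rubric (an assumption; the
# abstract only states that the rubric has three levels).
SAFE, BORDERLINE, UNSAFE = 0, 1, 2

# A judge maps (user prompt, model response) to a rubric level.
# In practice this would wrap a stochastic LLM call.
Judge = Callable[[str, str], int]


def majority_vote(judge: Judge, prompt: str, response: str,
                  n_samples: int = 5) -> int:
    """Query the judge n_samples times and return the most frequent level.

    Ties are broken toward the more severe level. That rule is an
    assumption made for this sketch, not something the source specifies.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes = [judge(prompt, response) for _ in range(n_samples)]
    counts = Counter(votes)
    top = max(counts.values())
    # Among the levels sharing the highest count, pick the most severe.
    return max(level for level, count in counts.items() if count == top)


def scripted_judge(outputs):
    """Test double: returns the given levels in order, one per call."""
    it = iter(outputs)

    def judge(prompt: str, response: str) -> int:
        return next(it)

    return judge


if __name__ == "__main__":
    judge = scripted_judge([SAFE, UNSAFE, UNSAFE, BORDERLINE, UNSAFE])
    print(majority_vote(judge, "example prompt", "example response"))
```

Aggregating several samples from the same judge is one way to dampen run-to-run variance in a single LLM verdict. Breaking ties toward the more severe level trades some precision for recall, which may or may not match the behaviour the paper evaluates.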

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Artificial Intelligence in Healthcare and Education