Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

May Lynn Reese; Markela Zeneli; Mindy Ng; Jacob Haimes; Andreea Damien; Elizabeth Stade

arXiv:2604.02359·cs.CL·April 6, 2026

TL;DR

This paper develops and validates clinician-informed safety criteria for LLM responses to users showing signs of psychosis, then tests whether automated evaluators (LLM-as-a-Judge and LLM-as-a-Jury) agree with a human-consensus dataset closely enough to support scalable safety assessment.

Contribution

The paper introduces a safety evaluation framework built on seven clinician-informed criteria and shows that LLM-based judges can assess LLM responses in psychosis-related contexts in close agreement with human consensus.

Findings

01. LLM-as-a-Judge aligns closely with the human consensus (κ=0.75).

02. LLM-as-a-Jury achieves slightly lower agreement with the human consensus (κ=0.74).

03. Automated LLM assessment shows promise for scalable safety evaluation.
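
For readers unfamiliar with the κ values above, the following is a minimal sketch of how an agreement statistic of this kind can be computed. It assumes κ is Cohen's kappa and that each response carries a single categorical label such as "safe" or "unsafe"; neither assumption is confirmed by this summary. The function name `cohens_kappa` and the toy labels are hypothetical and are not the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: one categorical label per item per rater. The summary
    above does not state which kappa variant the paper reports.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items where both raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label rates,
    # summed over every label either rater used.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical, not from the paper).
human = ["safe", "safe", "unsafe", "unsafe", "safe", "unsafe", "safe", "safe"]
judge = ["safe", "safe", "unsafe", "safe", "safe", "unsafe", "safe", "unsafe"]
kappa = cohens_kappa(human, judge)
print(round(kappa, 3))
```

In this toy case the raters agree on 6 of 8 items, yet kappa is well below 0.75 because much of that agreement would be expected by chance given how often each rater says "safe".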

Abstract

General-purpose Large Language Models (LLMs) are increasingly used for mental health support. Yet emerging evidence suggests there are significant risks associated with high-frequency use, particularly for individuals suffering from psychosis, as LLMs may reinforce delusions and hallucinations. Existing evaluations of LLMs in mental health contexts are limited by a lack of clinical validation and by assessment methods that do not scale. To address these issues, this research focuses on psychosis as a critical condition for LLM safety evaluation by (1) developing and validating seven clinician-informed safety criteria, (2) constructing a human-consensus dataset, and (3) testing automated assessment using an LLM as an evaluator (LLM-as-a-Judge) or taking the majority vote of several LLM judges (LLM-as-a-Jury). Results indicate that LLM-as-a-Judge aligns closely with the human consensus…
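
The majority-vote step behind LLM-as-a-Jury can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the labels "safe" and "unsafe", the three-judge panel, the function name `jury_verdict`, and the choice to return `None` on a tie are all hypothetical. The abstract specifies only that the jury takes the majority vote of several LLM judges.

```python
from collections import Counter


def jury_verdict(votes):
    """Return the label chosen by the most judges, or None on a tie.

    Assumption: each judge emits exactly one label per response. How
    the paper resolves ties (if they can occur) is not stated here.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    # A tie for first place means there is no majority label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical votes from three judges on three model responses.
votes_per_response = [
    ["safe", "safe", "unsafe"],
    ["unsafe", "unsafe", "unsafe"],
    ["unsafe", "safe", "unsafe"],
]
verdicts = [jury_verdict(v) for v in votes_per_response]
print(verdicts)
```

With an odd number of judges and two labels a tie cannot occur; the `None` branch only matters for even-sized panels or more than two labels.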
