Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
Prerna Juneja, Lika Lomidze

TL;DR
This paper introduces a scalable framework for safety evaluation of AI companions in multi-turn conversations, using controlled persona simulations to identify risks like normalization of unsafe content.
Contribution
It presents the first end-to-end system integrating persona construction, scenario generation, simulation, and harm evaluation for AI safety testing.
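As a rough illustration of how the four stages named above could be chained, here is a minimal Python sketch. It is not the authors' implementation: every name in it (`Persona`, `generate_scenarios`, `simulate`, `evaluate_harm`, `run_pipeline`) is hypothetical, the companion is a stand-in callable rather than a real app, the keyword check is only a placeholder for a real harm evaluator, and the dialogue refinement step mentioned in the abstract is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Persona:
    # Hypothetical persona record; the paper's personas are clinically validated.
    name: str
    condition: str


@dataclass
class Turn:
    speaker: str  # "user" or "companion"
    text: str


@dataclass
class Transcript:
    persona: Persona
    scenario: str
    turns: List[Turn] = field(default_factory=list)


def generate_scenarios(persona: Persona) -> List[str]:
    # Placeholder: a real system would generate persona-specific scenarios.
    return [f"{persona.name} describes a hard day related to {persona.condition}"]


def simulate(persona: Persona, scenario: str,
             companion: Callable[[str], str], n_turns: int = 3) -> Transcript:
    # Alternate simulated user messages with companion replies.
    transcript = Transcript(persona, scenario)
    for i in range(n_turns):
        user_msg = f"[{persona.name}, turn {i + 1}] {scenario}"
        transcript.turns.append(Turn("user", user_msg))
        transcript.turns.append(Turn("companion", companion(user_msg)))
    return transcript


def evaluate_harm(transcript: Transcript, flag_terms: List[str]) -> List[int]:
    # Placeholder evaluator: return indices of companion turns containing a flag term.
    flagged = []
    for idx, turn in enumerate(transcript.turns):
        text = turn.text.lower()
        if turn.speaker == "companion" and any(term in text for term in flag_terms):
            flagged.append(idx)
    return flagged


def run_pipeline(personas: List[Persona], companion: Callable[[str], str],
                 flag_terms: List[str]) -> List[Tuple[Transcript, List[int]]]:
    # Persona -> scenarios -> multi-turn simulation -> harm evaluation.
    results = []
    for persona in personas:
        for scenario in generate_scenarios(persona):
            transcript = simulate(persona, scenario, companion)
            results.append((transcript, evaluate_harm(transcript, flag_terms)))
    return results
```

Keeping the companion behind a plain callable means the same loop can be pointed at a stub for testing or at a wrapper around a real application, and the evaluator can be swapped out without touching the simulation step.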
Findings
Replika shows narrow emotional responses dominated by curiosity and care.
Replika often mirrors or normalizes unsafe content such as self-harm and violent fantasies.
Controlled simulations reveal safety risks in AI companion responses.
Abstract
There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insights into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and…
