Benchmarking and Understanding Safety Risks in AI Character Platforms

Yiluo Wei; Peixian Zhang; Gareth Tyson

arXiv:2512.01247·cs.CR·December 2, 2025

Open Access

TL;DR

This study systematically evaluates safety on AI character platforms. It finds high unsafe response rates and presents a machine learning model that predicts which characters are unsafe, which could help platforms improve safety.

Contribution

First large-scale safety evaluation of AI character platforms, identifying safety deficits and developing a predictive model for unsafe characters.

Findings

01. Average unsafe response rate of 65.1% across the evaluated platforms, versus 17.7% for the baselines

02. Safety varies significantly across characters

03. An ML model predicts unsafe characters with an F1-score of 0.81
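The two metrics named in the findings, an average unsafe response rate and an F1-score, can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes each response has already been labeled unsafe or safe, and that the average weights every platform equally (this summary does not say how the paper aggregates). The function names and toy labels are hypothetical.

```python
from statistics import mean


def unsafe_rate(labels):
    """Fraction of responses judged unsafe (True = unsafe)."""
    return sum(labels) / len(labels)


def macro_average_rate(per_platform_labels):
    """Average the per-platform unsafe rates, weighting each platform equally.

    Equal weighting is an assumption made for this sketch.
    """
    return mean(unsafe_rate(labels) for labels in per_platform_labels.values())


def f1_score(y_true, y_pred):
    """F1 for the positive ("unsafe") class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels (not data from the paper).
toy_platforms = {
    "platform_a": [True, True, False, True],    # 3 of 4 responses unsafe
    "platform_b": [True, False, False, False],  # 1 of 4 responses unsafe
}
toy_average = macro_average_rate(toy_platforms)

# Toy ground truth and predictions for "is this character unsafe?"
toy_true = [True, True, True, False, False]
toy_pred = [True, True, False, True, False]
toy_f1 = f1_score(toy_true, toy_pred)
```

On the toy data, the two platform rates are 0.75 and 0.25, so the equally weighted average is 0.5. The predictions give two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is 2/3.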

Abstract

AI character platforms, which allow users to engage in conversations with AI personas, are a rapidly growing application domain. However, their immersive and personalized nature, combined with technical vulnerabilities, raises significant safety concerns. Despite their popularity, a systematic evaluation of their safety has been notably absent. To address this gap, we conduct the first large-scale safety study of AI character platforms, evaluating 16 popular platforms using a benchmark set of 5,000 questions across 16 safety categories. Our findings reveal a critical safety deficit: AI character platforms exhibit an average unsafe response rate of 65.1%, substantially higher than the 17.7% average rate of the baselines. We further discover that safety performance varies significantly across different characters and is strongly correlated with character features such as demographics and…

Taxonomy

Topics: Persona Design and Applications · AI in Service Interactions · Ethics and Social Impacts of AI