Policy-Grounded Safety Evaluation of 20 Large Language Models

Juan Manuel Contreras

arXiv:2507.14719·cs.AI·May 1, 2026

TL;DR

This paper presents Aymara AI, a platform for scalable, policy-grounded safety evaluation, and uses it to evaluate 20 large language models across 10 real-world safety domains. Mean safety scores ranged from 86.2% to 52.4% across models, and performance differed sharply between domains.

Contribution

Introduces Aymara AI, a platform that transforms natural-language safety policies into adversarial prompts and scores model responses with an AI-based rater validated against human judgments, enabling the same safety evaluation to be applied across multiple LLMs.

Findings

01. Models scored highest in misinformation (mean 95.7%)

02. Models performed poorly in privacy and impersonation (mean 24.3%)

03. Safety scores varied significantly across models and domains (p < .05)

Abstract

As large language models (LLMs) become increasingly integrated into real-world applications, scalable and rigorous safety evaluation is essential. This paper introduces Aymara AI, a programmatic platform for generating and administering customized, policy-grounded safety evaluations. Aymara AI transforms natural-language safety policies into adversarial prompts and scores model responses using an AI-based rater validated against human judgments. We demonstrate its capabilities through the Aymara LLM Risk and Responsibility Matrix, which evaluates 20 commercially available LLMs across 10 real-world safety domains. Results reveal wide performance disparities, with mean safety scores ranging from 86.2% to 52.4%. While models performed well in well-established safety domains such as Misinformation (mean = 95.7%), they consistently failed in more complex or underspecified domains, notably…
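
The aggregation behind figures such as "mean safety score" can be sketched as follows. This is a minimal illustration, not the Aymara AI API: it assumes a safety score is the percentage of responses the rater judged safe for a given model and domain (the abstract as shown here does not define the score), and the function names, model names, and ratings are all hypothetical toy values.

```python
from collections import defaultdict
from statistics import mean


def safety_scores(ratings):
    """Return {(model, domain): percent of responses rated safe}.

    `ratings` is an iterable of (model, domain, is_safe) tuples.
    Assumption: score = 100 * safe responses / total responses.
    """
    counts = defaultdict(lambda: [0, 0])  # (model, domain) -> [safe, total]
    for model, domain, is_safe in ratings:
        cell = counts[(model, domain)]
        cell[0] += int(is_safe)
        cell[1] += 1
    return {key: 100.0 * safe / total for key, (safe, total) in counts.items()}


def mean_by(scores, axis):
    """Average per-cell scores over one key position (0 = model, 1 = domain)."""
    groups = defaultdict(list)
    for key, score in scores.items():
        groups[key[axis]].append(score)
    return {name: mean(values) for name, values in groups.items()}


# Hypothetical toy ratings, invented for illustration only.
ratings = [
    ("model_a", "Misinformation", True),
    ("model_a", "Misinformation", True),
    ("model_a", "Privacy and Impersonation", False),
    ("model_a", "Privacy and Impersonation", True),
    ("model_b", "Misinformation", True),
    ("model_b", "Misinformation", False),
    ("model_b", "Privacy and Impersonation", False),
    ("model_b", "Privacy and Impersonation", False),
]

scores = safety_scores(ratings)
by_model = mean_by(scores, 0)   # one mean score per model
by_domain = mean_by(scores, 1)  # one mean score per domain
print(by_model)
print(by_domain)
```

Averaging per-cell percentages gives each domain equal weight in a model's mean regardless of how many prompts it contains; pooling all responses before dividing would be an equally plausible choice and can yield different numbers.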
