PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk

Seulki Lee

arXiv:2604.11070 · cs.AI · April 14, 2026

TL;DR

This paper introduces the PRISM framework, a hierarchy-based method that detects AI behavioral risk by analyzing structural anomalies in how a model prioritizes values, weights evidence types, and trusts information sources, so that risky reasoning structures can be flagged before they produce harmful outputs.

Contribution

The paper proposes a taxonomy of 27 risk signals derived from hierarchy anomalies, each classified with a dual-threshold rule, shifting AI safety detection from reactive to anticipatory.

Findings

01. The framework discriminates between models with extreme, context-dependent, and balanced risk profiles.

02. It is grounded in empirical data from approximately 397,000 forced-choice responses across 7 AI models.

03. The hierarchy-based signals can detect structural risk patterns before harmful outputs occur.

Abstract

Current approaches to AI safety define red lines at the case level: specific prompts, specific outputs, specific harms. This paper argues that red lines can be set more fundamentally -- at the level of value, evidence, and source hierarchies that govern AI reasoning. Using the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework, we define a taxonomy of 27 behavioral risk signals derived from structural anomalies in how AI systems prioritize values (L4), weight evidence types (L3), and trust information sources (L2). Each signal is evaluated through a dual-threshold principle combining absolute rank position and relative win-rate gap, producing a two-tier classification (Confirmed Risk vs. Watch Signal). The hierarchy-based approach offers three advantages over case-specific red lines: it is anticipatory rather than reactive (detecting dangerous reasoning structures…
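
The dual-threshold idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual thresholds or the exact rule that maps them to tiers, so everything specific here is an assumption: the names `win_rates`, `Thresholds`, and `classify`, the cutoff values, the "No Signal" fallback, and the rule that both thresholds together yield a Confirmed Risk while one alone yields a Watch Signal. It is not the paper's implementation.

```python
from collections import Counter
from dataclasses import dataclass


def win_rates(choices):
    """Compute per-item win rates from forced-choice (winner, loser) pairs."""
    wins, appearances = Counter(), Counter()
    for winner, loser in choices:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {item: wins[item] / appearances[item] for item in appearances}


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cutoffs; the source does not state the real values.
    max_rank: int = 2       # absolute threshold: item sits at rank <= max_rank
    min_gap: float = 0.10   # relative threshold: win-rate gap >= min_gap


def classify(rank, win_rate_gap, t=Thresholds()):
    """Assumed mapping: both thresholds met -> Confirmed Risk, one -> Watch Signal."""
    rank_hit = rank <= t.max_rank
    gap_hit = win_rate_gap >= t.min_gap
    if rank_hit and gap_hit:
        return "Confirmed Risk"
    if rank_hit or gap_hit:
        return "Watch Signal"
    return "No Signal"


# Toy data with made-up value labels, for illustration only.
choices = [
    ("safety", "obedience"),
    ("safety", "obedience"),
    ("obedience", "safety"),
    ("safety", "honesty"),
    ("honesty", "obedience"),
]
rates = win_rates(choices)
ranking = sorted(rates, key=rates.get, reverse=True)
```

A signal about a particular item would then be evaluated by passing its position in `ranking` (1-indexed) and its win-rate gap against a comparison item to `classify`. Requiring both an absolute and a relative condition is one plausible way to keep a single noisy measurement from triggering the stronger tier.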
