ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems
Florin Adrian Chitan

TL;DR
ILION is a deterministic, interpretable safety gate that classifies an autonomous AI agent's proposed actions as safe or unsafe in real time. It requires no labeled training data and outperforms existing moderation tools in both accuracy and speed.
Contribution
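The paper introduces ILION, a novel deterministic safety system for agentic AI that makes rapid, interpretable decisions without training data. It addresses a gap that content-moderation tools leave open: checking whether a proposed action falls within an agent's authorized operational scope.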
Findings
ILION achieves an F1 score of 0.8515 and a false positive rate of 7.9%.
ILION operates with sub-millisecond latency, significantly faster than the baseline systems.
Existing text-moderation tools fail on agent-safety tasks because of a task mismatch: they detect harmful content categories, not actions outside an agent's authorized scope.
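Both headline metrics follow standard definitions. The sketch below computes them from binary confusion-matrix counts, assuming "unsafe" is the positive class (the summary does not say which class the paper treats as positive). The `Confusion` type and the function names are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, assuming "unsafe" is the positive class."""
    tp: int  # unsafe actions correctly blocked
    fp: int  # safe actions wrongly blocked
    tn: int  # safe actions correctly allowed
    fn: int  # unsafe actions wrongly allowed


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).

    Returns 0.0 when the score is undefined (no positives predicted or present).
    """
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def false_positive_rate(c: Confusion) -> float:
    """Fraction of safe actions that the gate blocks: FP / (FP + TN)."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0
```

The identity 2TP / (2TP + FP + FN) is algebraically equal to 2PR / (P + R) and avoids computing precision and recall separately. For a safety gate, the false positive rate matters because each false positive is a legitimate action the agent is prevented from taking.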
Abstract
The proliferation of autonomous AI agents capable of executing real-world actions (filesystem operations, API calls, database modifications, financial transactions) introduces a class of safety risk not addressed by existing content-moderation infrastructure. Current text-safety systems evaluate linguistic content for harm categories such as violence, hate speech, and sexual content; they are architecturally unsuitable for evaluating whether a proposed action falls within an agent's authorized operational scope. We present ILION (Intelligent Logic Identity Operations Network), a deterministic execution gate for agentic AI systems. ILION employs a five-component cascade architecture, consisting of Transient Identity Imprint (TII), Semantic Vector Reference Frame (SVRF), Identity Drift Control (IDC), Identity Resonance Score (IRS), and Consensus Veto Layer (CVL), to classify proposed agent actions…
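To make the idea of a pre-execution gate concrete, here is a minimal sketch of a generic deterministic allow/deny chain with a veto rule: every check inspects the proposed action before it runs, and any single objection blocks it. This is not ILION's architecture; the internals of TII, SVRF, IDC, IRS and CVL are not described here, and the `Action` type and the `tool_allowlist`, `path_scope` and `gate` helpers are hypothetical stand-ins for scope checks.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    """A proposed agent action, inspected before it is executed (hypothetical shape)."""
    tool: str    # e.g. "fs.read"; the naming scheme is an assumption
    target: str  # e.g. a filesystem path


# Each check returns None to pass, or a human-readable reason to object.
Check = Callable[[Action], Optional[str]]


def tool_allowlist(allowed: set[str]) -> Check:
    """Hypothetical check: the tool must be in the agent's authorized set."""
    def check(action: Action) -> Optional[str]:
        if action.tool not in allowed:
            return f"tool {action.tool!r} is outside the agent's scope"
        return None
    return check


def path_scope(root: str) -> Check:
    """Hypothetical check: the target path must stay under an authorized root."""
    root_path = PurePosixPath(root)

    def check(action: Action) -> Optional[str]:
        target = PurePosixPath(action.target)
        # Reject ".." segments, since is_relative_to() compares paths lexically.
        if ".." in target.parts or not target.is_relative_to(root_path):
            return f"target {action.target!r} is outside {root}"
        return None
    return check


def gate(action: Action, checks: list[Check]) -> tuple[bool, list[str]]:
    """Run every check; any objection vetoes the action.

    The verdict is deterministic (same action and checks, same result), and
    the collected reasons explain why an action was blocked.
    """
    reasons = [r for check in checks if (r := check(action)) is not None]
    return (not reasons, reasons)
```

With `tool_allowlist({"fs.read"})` and `path_scope("/srv/app")` as the checks, an `fs.read` of `/srv/app/data.txt` passes with no reasons, while an `fs.delete` of `/etc/passwd` is blocked with two reasons. Because the rules are plain code rather than a trained classifier, the sketch needs no labeled data, which mirrors the property the summary attributes to ILION without claiming to reproduce its method.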
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection
