TL;DR
Parallax introduces an architectural framework for safe autonomous AI execution that prevents harmful actions by structurally separating reasoning from execution and incorporating multi-tiered validation and rollback mechanisms.
Contribution
It proposes a novel architectural paradigm for AI safety that goes beyond prompt-based guardrails, accompanied by an open-source implementation and a comprehensive adversarial evaluation.
Findings
Blocks 98.9% of attacks in tests under the default configuration
Blocks 100% of attacks under the maximum-security configuration
The architectural boundary remains effective even when the reasoning system is compromised
Abstract
Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise applications will embed AI copilots by the end of 2026. As agents gain the ability to execute real-world actions (reading files, running commands, making network requests, modifying databases), a fundamental security gap has emerged. The dominant approach to agent safety relies on prompt-level guardrails: natural language instructions that operate at the same abstraction level as the threats they attempt to mitigate. This paper argues that prompt-based safety is architecturally insufficient for agents with execution capability and introduces Parallax, a paradigm for safe autonomous AI execution grounded in four principles: Cognitive-Executive Separation, which structurally prevents the reasoning system from executing actions; Adversarial…
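The first principle named in the abstract, Cognitive-Executive Separation, can be illustrated with a minimal sketch. This is not the Parallax implementation: `ProposedAction`, `reasoner`, `Executor`, `make_executor`, the tool names, and the allow-list check are all hypothetical names chosen for illustration. The sketch assumes only what the summary states, namely that the reasoning component cannot execute actions itself and that a separate component validates each proposal before running it. The multi-tiered validation and rollback mentioned in the TL;DR are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class ProposedAction:
    """Inert description of an action (hypothetical type); it cannot run anything."""
    tool: str
    argument: str


def reasoner(task: str) -> ProposedAction:
    """Stand-in for the reasoning system: it only returns data, never calls a tool."""
    if "delete" in task:
        # Simulates a compromised or manipulated reasoner proposing a harmful action.
        return ProposedAction(tool="delete_file", argument="/etc/passwd")
    return ProposedAction(tool="read_file", argument="notes.txt")


class Executor:
    """Sole holder of the tool handles; checks every proposal before running it."""

    def __init__(self, tools: Dict[str, Callable[[str], str]], allowed: Set[str]):
        self._tools = tools
        self._allowed = allowed
        self.log: List[str] = []

    def run(self, action: ProposedAction) -> str:
        # Assumed single validation tier: a plain allow-list on the tool name.
        if action.tool not in self._allowed or action.tool not in self._tools:
            self.log.append(f"blocked:{action.tool}")
            return "blocked"
        result = self._tools[action.tool](action.argument)
        self.log.append(f"ran:{action.tool}")
        return result


def make_executor() -> Executor:
    # Fake tools that only return strings, so the sketch has no side effects.
    tools = {
        "read_file": lambda path: f"contents of {path}",
        "delete_file": lambda path: f"deleted {path}",
    }
    return Executor(tools, allowed={"read_file"})


if __name__ == "__main__":
    executor = make_executor()
    print(executor.run(reasoner("summarize my notes")))
    print(executor.run(reasoner("delete everything")))
```

The point of the design is that a harmful proposal is stopped by the executor's check regardless of what the reasoner was persuaded to output, because the reasoner never holds a callable tool.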
