Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents
Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang

TL;DR
The paper introduces the Cognitive Control Architecture (CCA), a framework that hardens autonomous LLM agents against complex Indirect Prompt Injection (IPI) attacks by monitoring and controlling their entire task-execution lifecycle.
Contribution
It presents a novel, holistic defense system combining proactive integrity enforcement and deep reasoning adjudication to counter sophisticated prompt injection attacks.
Findings
CCA effectively withstands complex IPI attacks.
It achieves full lifecycle security without sacrificing efficiency.
Experimental results show improved robustness over existing methods.
Abstract
Autonomous Large Language Model (LLM) agents exhibit significant vulnerability to Indirect Prompt Injection (IPI) attacks. These attacks hijack agent behavior by polluting external information sources, exploiting fundamental trade-offs between security and functionality in existing defense mechanisms. This leads to malicious and unauthorized tool invocations, diverting agents from their original objectives. The success of complex IPIs reveals a deeper systemic fragility: while current defenses demonstrate some effectiveness, most defense architectures are inherently fragmented. Consequently, they fail to provide full integrity assurance across the entire task execution pipeline, forcing unacceptable multi-dimensional compromises among security, functionality, and efficiency. Our method is predicated on a core insight: no matter how subtle an IPI attack, its pursuit of a malicious…
Peer Reviews
Decision · Submitted to ICLR 2026
This paper tries to address an emerging and important area of research, the safety of LLM-based autonomous driving. As our community and society pay close attention to this area, I am happy to see a paper submission in this area. I can see that their methodology achieves higher performance than baseline and existing methods in their evaluation.
I have the following major concerns about this paper:
### Critical presentation errors
This paper has a significant number of presentation errors across the paper. Particularly, this paper does not have any references to the figures in this paper, even though this paper has 4 figures. This prevents me from fully being convinced of the reported result's validity. Furthermore, this paper does not clearly explain how their dataset constructed in Section 4.1 is used in the following evaluation wi
- **Insightful design**: The paper is novel in designing a multi-layer framework to inspect the agent action process. The framework is designed to inspect both the data-flow and the underlying intention to ensure a safe agent behavior.
- **Promising results**: The paper shows promising results in defending against prompt injection attacks in AgentDojo benchmarks, surpassing previous work or achieving comparable performance in lowering the attack success rate. Meanwhile, the proposed methods don’
## Major
- **Lack of evaluation dataset and models**: The paper mainly evaluated the results on one dataset (AgentDojo), using two LLMs (DeepSeek and KIMI). The authors are expected to conduct experiments on multiple datasets and models to support the generalization of the proposed methods.
- **Lack of experimental justification of Graph Updated**: The paper proposes to dynamically update the graph, but lacks an ablation study on how the design will influence the benign utilization and attack
* The proposed safeguard demonstrates a Pareto improvement over state-of-the-art defenses against indirect prompt injection attacks.
* The proposed approach is more efficient (in terms of tokens) than the state-of-the-art defense.
* The ablations presented in Table 3 are beneficial to understanding why the method works.
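To make the "Pareto improvement" claim concrete, the following is a minimal sketch of a Pareto-dominance check over defense results. It is not taken from the paper: the `DefenseResult` type, the three axes (benign utility, attack success rate, token cost), the defense names, and all numbers are hypothetical placeholders chosen only to illustrate the concept.

```python
from typing import List, NamedTuple


class DefenseResult(NamedTuple):
    """Hypothetical summary of one defense; none of these values come from the paper."""
    name: str
    utility: float              # benign task success rate, higher is better
    attack_success_rate: float  # lower is better
    tokens: float               # token cost per task, lower is better


def dominates(a: DefenseResult, b: DefenseResult) -> bool:
    """Return True if `a` is no worse than `b` on every axis and strictly better on one."""
    no_worse = (
        a.utility >= b.utility
        and a.attack_success_rate <= b.attack_success_rate
        and a.tokens <= b.tokens
    )
    strictly_better = (
        a.utility > b.utility
        or a.attack_success_rate < b.attack_success_rate
        or a.tokens < b.tokens
    )
    return no_worse and strictly_better


def pareto_front(results: List[DefenseResult]) -> List[DefenseResult]:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Placeholder numbers for illustration only.
results = [
    DefenseResult("defense_a", utility=0.80, attack_success_rate=0.05, tokens=1000),
    DefenseResult("defense_b", utility=0.75, attack_success_rate=0.10, tokens=1500),
    DefenseResult("defense_c", utility=0.85, attack_success_rate=0.20, tokens=900),
]
front = pareto_front(results)
print([r.name for r in front])
```

Under these placeholder values, `defense_a` dominates `defense_b` on all three axes, while `defense_a` and `defense_c` trade utility against attack success rate, so neither dominates the other. A defense that is a Pareto improvement over a baseline is one for which `dominates(new, baseline)` holds.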
* The presentation quality is quite poor and the paper needs quite a bit of polishing. There are several typos throughout (Figure 2, first column: "Chack" -> "Check", third column: "Adjustor" -> "Adjudicator"?, line 383 "¡"?, inter alia). The figure and table captions are unclear or promise presentation not represented in the figure (e.g. Table 1, the caption promises that the best defense numbers should be bolded, but they are not). This does not inspire confidence in the results.
* The writing
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks
