SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations
Mohammed Himayath Ali, Mohammed Aqib Abdullah, Mohammed Mudassir Uddin, Shahnawaz Alam

TL;DR
SecureCAI is a defense framework for large language models in cybersecurity, reducing prompt injection vulnerabilities by 94.7% while maintaining high accuracy, through security-aware guardrails and adaptive learning.
Contribution
It introduces a novel security-aware framework extending Constitutional AI with adaptive mechanisms to defend against prompt injection attacks in cybersecurity applications.
Findings
Reduces attack success rates by 94.7%
Maintains 95.1% accuracy on security tasks
Achieves high constitution adherence scores over 0.92
Abstract
Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation; however, deployment in adversarial cybersecurity environments exposes critical vulnerabilities to prompt injection attacks where malicious instructions embedded in security artifacts manipulate model behavior. This paper introduces SecureCAI, a novel defense framework extending Constitutional AI principles with security-aware guardrails, adaptive constitution evolution, and Direct Preference Optimization for unlearning unsafe response patterns, addressing the unique challenges of high-stakes security contexts where traditional safety mechanisms prove insufficient against sophisticated adversarial manipulation. Experimental evaluation demonstrates that SecureCAI reduces attack success rates by 94.7% compared to baseline…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Information and Cyber Security
