Cognitive Cybersecurity for Artificial Intelligence: Guardrail Engineering with CCS-7

Yuksel Aydin

arXiv:2508.10033 · cs.CR · August 15, 2025

TL;DR

This paper introduces CCS-7, a taxonomy of seven cognitive vulnerabilities in language models, and argues that effective cognitive cybersecurity requires architecture-specific interventions. The evidence comes from 12,180 experiments across seven language model architectures and a human benchmark from a randomized controlled trial with 151 participants.

Contribution

The paper presents CCS-7 (Cognitive Cybersecurity Suite), a taxonomy of seven cognitive vulnerabilities grounded in human cognitive security research, and provides empirical evidence that guardrails must be architecture-aware to keep language models safe.

Findings

01

Some vulnerabilities (e.g., identity confusion) are almost fully mitigated by TFVA-style guardrails.

02

Other vulnerabilities (e.g., source interference) backfire under the same guardrails, with error rates increasing by up to 135% in certain models.

03

Humans show consistent moderate improvement: the "Think First, Verify Always" (TFVA) lesson improved cognitive security by +7.9% overall.

Abstract

Language models exhibit human-like cognitive vulnerabilities, such as emotional framing, that escape traditional behavioral alignment. We present CCS-7 (Cognitive Cybersecurity Suite), a taxonomy of seven vulnerabilities grounded in human cognitive security research. To establish a human benchmark, we ran a randomized controlled trial with 151 participants: a "Think First, Verify Always" (TFVA) lesson improved cognitive security by +7.9% overall. We then evaluated TFVA-style guardrails across 12,180 experiments on seven diverse language model architectures. Results reveal architecture-dependent risk patterns: some vulnerabilities (e.g., identity confusion) are almost fully mitigated, while others (e.g., source interference) exhibit escalating backfire, with error rates increasing by up to 135% in certain models. Humans, in contrast, show consistent moderate improvement. These findings…
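To make the "mitigated" versus "backfire" distinction concrete, the sketch below computes the change in error rate between a baseline run and a guardrailed run. It assumes the 135% figure is a relative increase over the baseline error rate; the abstract does not spell out the exact metric. The function names, the 5% tolerance, and all numbers are hypothetical illustrations, not values or code from the paper.

```python
def relative_change(baseline_error: float, guarded_error: float) -> float:
    """Relative change in error rate, expressed as a fraction of the baseline.

    Assumption: "error rates increasing by up to 135%" is read as a relative
    change, i.e. (guarded - baseline) / baseline = 1.35.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (guarded_error - baseline_error) / baseline_error


def classify(baseline_error: float, guarded_error: float, tol: float = 0.05) -> str:
    """Label a guardrail outcome; `tol` is an arbitrary illustrative threshold."""
    change = relative_change(baseline_error, guarded_error)
    if change > tol:
        return "backfire"    # the guardrail made the error rate worse
    if change < -tol:
        return "mitigated"   # the guardrail reduced the error rate
    return "neutral"


if __name__ == "__main__":
    # Hypothetical (baseline, guarded) error rates, NOT taken from the paper.
    examples = {
        "identity confusion": (0.40, 0.02),
        "source interference": (0.20, 0.47),
    }
    for name, (base, guarded) in examples.items():
        change = relative_change(base, guarded)
        print(f"{name}: {change:+.0%} -> {classify(base, guarded)}")
```

Under this reading, a vulnerability whose error rate goes from 20% to 47% once a guardrail is added counts as a 135% increase, which is why the same intervention has to be evaluated separately for each model and each vulnerability.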
