Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments
Kunal Mukherjee

TL;DR
This paper evaluates the vulnerabilities of LLM security advisors for Trusted Execution Environments through red-teaming, revealing transferability of failures and proposing an evaluation pipeline to mitigate risks.
Contribution
It introduces TEE-RedBench, a comprehensive evaluation methodology for LLM-based TEE security advisors, and demonstrates techniques to significantly reduce prompt-induced failures.
Findings
Prompt-induced failures transfer across LLMs at rates of up to 12.02%.
A structured evaluation pipeline reduces failures by 80.62%.
Identifies key limitations of LLMs in security advisory roles.
Abstract
Trusted Execution Environments (TEEs), such as Intel SGX and Arm TrustZone, aim to protect sensitive computation from a compromised operating system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors for TEE architecture review, mitigation planning, and vulnerability triage. This creates a socio-technical risk surface: assistants may hallucinate TEE mechanisms, overclaim guarantees (e.g., what attestation does and does not establish), or behave unsafely under adversarial prompting. We present a red-teaming study of two widely deployed LLM assistants in the role of TEE security advisors, ChatGPT-5.2 and Claude Opus-4.6, focusing on the inherent limitations and transferability of prompt-induced failures across LLMs.…
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Information and Cyber Security
