ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?
Sarthak Munshi, Swapnil Pathak, Sonam Ghatode, Thenuga Priyadarshini, Dhivya Chandramouleeswaran, Ashutosh Rana

TL;DR
This paper evaluates how well large language models identify and analyze security threats in cloud infrastructure, using a new dataset of AWS deployment scenarios, and finds that GPT 4.1 and Gemini 2.5 Pro perform best at threat identification.
Contribution
Introduces ACSE-Eval, a comprehensive dataset for assessing LLMs in cloud security threat modeling, and provides systematic evaluation results demonstrating their capabilities.
Findings
GPT 4.1 and Gemini 2.5 Pro excel at threat identification
Gemini 2.5 Pro performs best in 0-shot scenarios
GPT 4.1 outperforms in few-shot settings
Abstract
While Large Language Models have shown promise in cybersecurity applications, their effectiveness in identifying security threats within cloud deployments remains unexplored. This paper introduces AWS Cloud Security Engineering Eval (ACSE-Eval), a novel dataset for evaluating LLMs' cloud security threat modeling capabilities. ACSE-Eval contains 100 production-grade AWS deployment scenarios, each featuring detailed architectural specifications, Infrastructure as Code implementations, documented security vulnerabilities, and associated threat modeling parameters. Our dataset enables systematic assessment of LLMs' abilities to identify security risks, analyze attack vectors, and propose mitigation strategies in cloud environments. Our evaluations on ACSE-Eval demonstrate that GPT 4.1 and Gemini 2.5 Pro excel at threat identification, with Gemini 2.5 Pro performing optimally in 0-shot scenarios and GPT 4.1…
Taxonomy
Topics: Software System Performance and Reliability · Network Security and Intrusion Detection · Information and Cyber Security
Methods: Attention Is All You Need · Softmax · Cosine Annealing · Attention Dropout · Linear Layer · Residual Connection · Byte Pair Encoding · Weight Decay · Dropout
