AI In Cybersecurity Education -- Scalable Agentic CTF Design Principles and Educational Outcomes

Haoran Xi; Minghao Shao; Kimberly Milner; Venkata Sai Charan Putrevu; Nanda Rani; Meet Udeshi; Prashanth Krishnamurthy; Brendan Dolan-Gavitt; Siddharth Garg; Sandeep Kumar Shukla; Farshad Khorrami; Alon Hillel-Tuch; Muhammad Shafique; Ramesh Karri

arXiv:2603.21551·cs.SE·April 1, 2026

TL;DR

This study investigates how different levels of AI autonomy in cybersecurity Capture-the-Flag competitions impact participant performance and learning, providing design principles for scalable, fair, and effective AI-assisted cybersecurity education.

Contribution

The paper formalizes three autonomy levels in AI-assisted cybersecurity competitions (human-in-the-loop, autonomous agent frameworks, and hybrid), analyzes multi-region competition data, and offers practical guidelines for designing effective LLM-centered educational challenges.

Findings

01

Autonomous and hybrid frameworks yield higher success rates on iterative challenges.

02

Participants prefer lightweight, tool-augmented prompting over complex multi-agent designs.

03

Designing competitions with autonomy-specific scoring and verification improves accessibility and evaluation.

Abstract

Large language models are rapidly changing how learners acquire and demonstrate cybersecurity skills. However, when human-AI collaboration is allowed, educators still lack validated competition designs and evaluation practices that remain fair and evidence-based. This paper presents a cross-regional study of LLM-centered Capture-the-Flag competitions built on the Cyber Security Awareness Week competition system. To understand how autonomy levels and participants' knowledge backgrounds influence problem-solving performance and learning-related behaviors, we formalize three autonomy levels: human-in-the-loop, autonomous agent frameworks, and hybrid. To enable verification, we require traceable submissions including conversation logs, agent trajectories, and agent code. We analyze multi-region competition data covering an in-class track, a standard track, and a year-long expert track,…
