Cybersecurity AI: The World's Top AI Agent for Security Capture-the-Flag (CTF)

Víctor Mayoral-Vilches; Luis Javier Navarrete-Lozano; Francesco Balassone; María Sanz-Gómez; Cristóbal R. J. Veas Chavez; Maite del Mundo de Torres; Vanesa Turiel

arXiv:2512.02654 · cs.CR · December 3, 2025

Open Access

TL;DR

In 2025, a specialized AI agent called CAI dominated major cybersecurity Capture-the-Flag competitions, outperforming human teams and raising questions about the relevance of current contest formats for measuring security talent.

Contribution

The paper introduces a novel AI architecture, alias1, that achieves unprecedented performance and cost efficiency in cybersecurity CTFs, surpassing human capabilities and challenging existing evaluation methods.

Findings

01

CAI achieved top ranks in multiple 2025 CTF competitions.

02

CAI captured 41/45 flags at Neurogrid and outperformed humans in speed and accuracy.

03

The alias1 model reduces inference costs from $5,940 to $119, enabling continuous autonomous security operations.

Abstract

Are Capture-the-Flag competitions obsolete? In 2025, Cybersecurity AI (CAI) systematically conquered some of the world's most prestigious hacking competitions, achieving Rank #1 at multiple events and consistently outperforming thousands of human teams. Across five major circuits (HTB's AI vs Humans; Cyber Apocalypse, with 8,129 teams; Dragos OT CTF; UWSP Pointer Overflow; and the Neurogrid CTF showdown), CAI demonstrated that Jeopardy-style CTFs have become a solved game for well-engineered AI agents. At Neurogrid, CAI captured 41/45 flags to claim the $50,000 top prize; at Dragos OT, it sprinted 37% faster to 10K points than elite human teams; even when deliberately paused mid-competition, it maintained top-tier rankings. Critically, CAI achieved this dominance through our specialized alias1 model architecture, which delivers enterprise-scale AI security operations at unprecedented cost…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Network Security and Intrusion Detection