Evaluating AI cyber capabilities with crowdsourced elicitation

Artem Petrov; Dmitrii Volkov

arXiv:2505.19915 · cs.CR · May 28, 2025

TL;DR

This paper investigates crowdsourcing as a scalable, cost-effective way to evaluate AI's offensive cyber capabilities by hosting open-access AI tracks at Capture The Flag competitions. The AI teams performed strongly, which suggests the approach could support ongoing capability assessment.

Contribution

The paper introduces crowdsourced elicitation via open competitions as an alternative to in-house elicitation and demonstrates its effectiveness in assessing AI cyber skills at scale.

Findings

01. AI teams ranked in the top 5% at AI vs. Humans and in the top 10% at Cyber Apocalypse

02. Open-market elicitation is a practical, cost-effective approach

03. AI can reliably solve cyber challenges requiring up to one hour of human effort

Abstract

As AI systems become increasingly capable, understanding their offensive cyber potential is critical for informed governance and responsible deployment. However, it's hard to accurately bound their capabilities, and some prior evaluations dramatically underestimated them. The art of extracting maximum task-specific performance from AIs is called "AI elicitation", and today's safety organizations typically conduct it in-house. In this paper, we explore crowdsourcing elicitation efforts as an alternative to in-house elicitation work. We host open-access AI tracks at two Capture The Flag (CTF) competitions: AI vs. Humans (400 teams) and Cyber Apocalypse (8000 teams). The AI teams achieve outstanding performance at both events, ranking top-5% and top-10% respectively for a total of $7500 in bounties. This impressive performance suggests that open-market elicitation may offer an effective…
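To put the percentile figures in concrete terms, the short sketch below converts them into rank cutoffs. It assumes the percentiles are computed over the full team counts quoted in the abstract (400 and 8000), which may be rounded; the helper name `rank_cutoff` is hypothetical, and the output is the implied threshold, not the AI teams' actual placements.

```python
def rank_cutoff(total_teams: int, top_percent: int) -> int:
    """Largest rank that still falls within the given top percentage.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return total_teams * top_percent // 100


# Team counts and percentiles as quoted in the abstract (assumed to be
# measured over all participating teams).
events = {
    "AI vs. Humans": (400, 5),
    "Cyber Apocalypse": (8000, 10),
}

cutoffs = {name: rank_cutoff(teams, pct) for name, (teams, pct) in events.items()}

for name, cutoff in cutoffs.items():
    print(f"{name}: top-percentile cutoff is rank {cutoff} or better")
```

Under these assumptions, a top-5% finish among 400 teams means placing 20th or better, and a top-10% finish among 8000 teams means placing 800th or better.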

Taxonomy

Topics: Information and Cyber Security · Advanced Malware Detection Techniques · Network Security and Intrusion Detection