Confronting the Reproducibility Crisis: A Case Study of Challenges in Cybersecurity AI
Richard H. Moulton, Gary A. McCully, John D. Hastings

TL;DR
This paper investigates reproducibility challenges in AI-based cybersecurity research, focusing on adversarial robustness, and uses a case study with the VeriGauge toolkit to highlight software, hardware, and documentation issues.
Contribution
It provides a detailed case study exposing reproducibility barriers in cybersecurity AI research and proposes standardized practices to improve reliability and validation.
Findings
Reproducibility is hindered by software and hardware incompatibilities.
Standardized methodologies and containerization can improve reproducibility.
Addressing reproducibility enhances trust and security in AI-based cybersecurity systems.
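The findings tie reproducibility failures to software and hardware incompatibilities and to missing documentation. One small, complementary practice is to save an environment record next to each result. The sketch below is a minimal example that assumes a Python-based experiment. It records the interpreter, operating system, machine architecture, and selected package versions. The function name `environment_manifest` and the package list are hypothetical and are not taken from the paper or from VeriGauge. The record does not replace containerization, and it does not capture GPU, driver, or CUDA details, which would need to be logged separately.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Collect interpreter, OS, and package-version details for one run.

    `packages` is a hypothetical, experimenter-chosen list of distribution
    names. Packages that are not installed are recorded as None instead of
    raising, so the manifest shows what was missing as well as what was present.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Illustrative package list only; substitute the experiment's real dependencies.
    manifest = environment_manifest(["numpy", "torch", "not-a-real-package-xyz"])
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

If each result file is stored with this JSON, a later replication can compare its own manifest against the original. Version drift then shows up before any experiments are rerun.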
Abstract
In the rapidly evolving field of cybersecurity, ensuring the reproducibility of AI-driven research is critical to maintaining the reliability and integrity of security systems. This paper addresses the reproducibility crisis within the domain of adversarial robustness -- a key area in AI-based cybersecurity that focuses on defending deep neural networks against malicious perturbations. Through a detailed case study, we attempt to validate results from prior work on certified robustness using the VeriGauge toolkit, revealing significant challenges due to software and hardware incompatibilities, version conflicts, and obsolescence. Our findings underscore the urgent need for standardized methodologies, containerization, and comprehensive documentation to ensure the reproducibility of AI models deployed in critical cybersecurity applications. By tackling these reproducibility challenges,…
