Quantifying the Risk of Transferred Black Box Attacks
Disesdi Susanna Cox, Niklas Bunzel

TL;DR
This paper presents a framework for quantifying the risk of transferred black-box adversarial attacks on neural networks, using surrogate models and CKA similarity to improve resilience testing and risk estimation.
Contribution
It introduces a targeted resilience testing framework employing surrogate models with CKA similarity to better estimate adversarial attack risks.
Findings
Surrogate models with high and low CKA similarity improve attack coverage.
Regression-based estimators provide realistic risk quantification.
Complete adversarial risk mapping is computationally infeasible.
Abstract
Neural networks have become pervasive across various applications, including security-related products. However, their widespread adoption has heightened concerns regarding vulnerability to adversarial attacks. With emerging regulations and standards emphasizing security, organizations must reliably quantify risks associated with these attacks, particularly regarding transferred adversarial attacks, which remain challenging to evaluate accurately. This paper investigates the complexities involved in resilience testing against transferred adversarial attacks. Our analysis specifically addresses black-box evasion attacks, highlighting transfer-based attacks due to their practical significance and typically high transferability between neural network models. We underline the computational infeasibility of exhaustively exploring high-dimensional input spaces to achieve complete test…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
