Feature-Guided Black-Box Safety Testing of Deep Neural Networks
Matthew Wicker, Xiaowei Huang, Marta Kwiatkowska

TL;DR
This paper introduces a black-box safety testing method for deep neural networks. Adversarial examples are crafted under the guidance of features extracted with object detection techniques, and the crafting process is framed as a stochastic game, which yields both safety guarantees and a robustness evaluation.
Contribution
The paper proposes a novel black-box approach that combines object detection with a game-theoretic formulation to craft adversarial examples without knowledge of the network's architecture or parameters, and it comes with theoretical convergence and safety guarantees.
Findings
The method is competitive with white-box approaches.
It provides safety guarantees for Lipschitz networks.
It is effective in safety-critical applications such as traffic sign recognition.
Abstract
Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples necessitate some knowledge (architecture, parameters, etc.) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player's objective…
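The step from extracted features to a saliency distribution can be illustrated with a minimal sketch. It is not the authors' implementation: the keypoint values are made up, and the choice of one isotropic Gaussian per keypoint (standard deviation taken from the keypoint size, weight from its response) is an assumption made for illustration. The names `saliency_distribution` and `sample_pixel` are hypothetical, and the sketch uses only the Python standard library rather than a real SIFT detector.

```python
import math
import random

# Hypothetical keypoints: (x, y, size, response), the kind of fields a
# SIFT-style detector reports. The values are made up for illustration.
KEYPOINTS = [(2.0, 2.0, 1.5, 0.9), (6.0, 5.0, 2.0, 0.3)]


def saliency_distribution(keypoints, width, height):
    """Return a height x width grid of pixel probabilities that sums to 1.

    Assumption: each keypoint contributes an isotropic Gaussian centred on
    its location, with standard deviation equal to its size and weight
    proportional to its response.
    """
    total_response = sum(k[3] for k in keypoints)
    grid = [[0.0] * width for _ in range(height)]
    for kx, ky, size, response in keypoints:
        weight = response / total_response
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                grid[y][x] += weight * math.exp(-d2 / (2.0 * size ** 2))
    # Normalise so the grid is a probability distribution over pixels.
    mass = sum(sum(row) for row in grid)
    return [[v / mass for v in row] for row in grid]


def sample_pixel(dist, rng):
    """Draw one (x, y) pixel with probability given by the distribution."""
    width = len(dist[0])
    flat = [p for row in dist for p in row]
    index = rng.choices(range(len(flat)), weights=flat, k=1)[0]
    return index % width, index // width


dist = saliency_distribution(KEYPOINTS, width=8, height=8)
pixel = sample_pixel(dist, random.Random(0))
```

Pixels near keypoints with a strong response receive most of the probability mass, so a sampler that draws candidate pixels from `dist` concentrates its perturbations on the regions that shape how the image is perceived.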
