Efficient Certified Defenses Against Patch Attacks on Image Classifiers
Jan Hendrik Metzen, Maksym Yatsura

TL;DR
This paper introduces BagCert, a method that combines a model architecture with a certification procedure to efficiently certify robustness against patch attacks on image classifiers, aimed at safety-critical autonomous systems.
Contribution
BagCert pairs an efficiently certifiable architecture with a loss that enables end-to-end optimization of certified robustness against patches of different sizes and locations, while keeping accuracy on clean inputs high.
Findings
Certifies 10,000 CIFAR10 examples in 43 seconds on a single GPU.
Achieves 86% accuracy on clean CIFAR10 images.
Provides 60% certified accuracy against 5x5 patches on CIFAR10.
Abstract
Adversarial patches pose a realistic threat model for physical world attacks on autonomous systems via their perception component. Autonomous systems in safety-critical domains such as automated driving should thus contain a fail-safe fallback component that combines certifiable robustness against patches with efficient inference while maintaining high performance on clean inputs. We propose BagCert, a novel combination of model architecture and certification procedure that allows efficient certification. We derive a loss that enables end-to-end optimization of certified robustness against patches of different sizes and locations. On CIFAR10, BagCert certifies 10,000 examples in 43 seconds on a single GPU and obtains 86% clean and 60% certified accuracy against 5x5 patches.
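The abstract does not spell out the certification procedure, so the following is only a generic sketch of how patch certification can work for a model with small receptive fields, not BagCert's actual algorithm. It assumes (hypothetically) that the model emits per-region class scores in [0, 1] on a stride-1 grid, that the final class score is the sum over regions, and that a patch of side `patch_size` can influence at most a `(patch_size + receptive_field - 1)`-wide window of regions. The function name `certify_patch` and all parameters are illustrative.

```python
import numpy as np

def certify_patch(region_scores, patch_size, receptive_field):
    """Conservative patch certificate for summed, bounded region scores.

    region_scores: array of shape (H, W, C) with entries in [0, 1]
                   (assumption: one score vector per spatial region).
    Returns (predicted_class, certified).
    """
    h, w, n_classes = region_scores.shape
    pred = int(np.argmax(region_scores.sum(axis=(0, 1))))

    # Per-region margin of the predicted class over every class: (H, W, C).
    margins = region_scores[:, :, [pred]] - region_scores
    total_margin = margins.sum(axis=(0, 1))

    # Worst case for an affected region: predicted score -> 0, rival -> 1,
    # so its margin becomes -1. The resulting drop (margin + 1) is >= 0.
    max_drop = margins + 1.0

    # Regions whose receptive field can overlap the patch (clipped to grid).
    win_h = min(patch_size + receptive_field - 1, h)
    win_w = min(patch_size + receptive_field - 1, w)

    others = np.arange(n_classes) != pred
    # Full windows dominate border-clipped ones because every drop is >= 0,
    # so checking full windows only is still sound.
    for i in range(h - win_h + 1):
        for j in range(w - win_w + 1):
            drop = max_drop[i:i + win_h, j:j + win_w].sum(axis=(0, 1))
            worst_margin = total_margin - drop
            if np.any(worst_margin[others] <= 0):
                return pred, False
    return pred, True

# Toy usage: a 10x10 grid where every region votes fully for class 0.
scores = np.zeros((10, 10, 2))
scores[:, :, 0] = 1.0
print(certify_patch(scores, patch_size=2, receptive_field=2))
```

Under these assumptions, certified accuracy would be the fraction of test examples for which the prediction is both correct and certified. The bound is deliberately pessimistic: it lets the attacker set every affected region to its worst value independently for each rival class.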
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
