Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries
Alexander Cann, Ian Colbert, Ihab Amer

TL;DR
This paper introduces a robust transferable feature extractor (RTFE) that improves the adversarial robustness of pre-trained models against white-box attacks, and reports that the defense transfers across models and datasets, including one-shot robustness.
Contribution
The paper proposes a novel RTFE method that transfers adversarial defenses to independently trained models, improving robustness against white-box adversaries.
Findings
RTFE provides adversarial robustness to multiple pre-trained classifiers.
RTFE achieves one-shot robustness across different datasets.
The method is effective against adaptive white-box adversaries.
Abstract
The widespread adoption of deep neural networks in computer vision applications has brought forth a significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model (i.e., adversarial examples) can be successfully transferred to another independently trained model to induce prediction errors. Moreover, this property of adversarial examples has been attributed to features derived from predictive patterns in the data distribution. Thus, we are motivated to investigate the following question: Can adversarial defenses, like adversarial examples, be successfully transferred to other independently trained models? To this end, we propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE). After examining theoretical motivation and implications,…
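The abstract describes RTFE as a pre-processing mechanism placed in front of an independently trained model. The sketch below illustrates only that structural idea: one pre-processor composed with several unchanged classifiers. Everything in it is an assumption made for illustration. The linear classifiers, the quantizing pre-processor, and the names `make_linear_classifier`, `make_quantizer`, and `defend` are hypothetical and are not taken from the paper. The quantizer is not the RTFE, which is a learned network, and a fixed quantizer like this should not be expected to hold up against the adaptive white-box adversaries the paper targets.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]
Preprocessor = Callable[[Sequence[float]], List[float]]


def make_linear_classifier(weights: Sequence[float], bias: float) -> Classifier:
    """Hypothetical stand-in for a frozen, pre-trained classifier."""
    def classify(x: Sequence[float]) -> int:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score > 0 else 0
    return classify


def make_quantizer(levels: int) -> Preprocessor:
    """Toy pre-processor: snap each value in [0, 1] to one of `levels` steps.

    Assumption for illustration only. This is NOT the RTFE; it merely
    occupies the same position in the pipeline.
    """
    def preprocess(x: Sequence[float]) -> List[float]:
        return [round(xi * (levels - 1)) / (levels - 1) for xi in x]
    return preprocess


def defend(classifier: Classifier, preprocessor: Preprocessor) -> Classifier:
    """Compose the two stages; the classifier itself is left untouched."""
    def defended(x: Sequence[float]) -> int:
        return classifier(preprocessor(x))
    return defended


# Two independently defined classifiers share one pre-processor.
clf_a = make_linear_classifier([1.0, -1.0], 0.0)
clf_b = make_linear_classifier([2.0, -2.0], 0.0)
pre = make_quantizer(levels=3)

clean = [0.40, 0.45]
perturbed = [0.44, 0.41]  # small shift that flips both undefended predictions

for name, clf in (("A", clf_a), ("B", clf_b)):
    guarded = defend(clf, pre)
    print(name, clf(clean), clf(perturbed), guarded(clean), guarded(perturbed))
```

In this toy setup the same pre-processor is reused for both classifiers without retraining either one, which mirrors the transfer question the abstract raises. It says nothing about how the actual RTFE is trained or evaluated.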
Taxonomy
Topics: Adversarial Robustness in Machine Learning
