Towards Predicting the Success of Transfer-based Attacks by Quantifying Shared Feature Representations

Ashley S. Dale; Mei Qiu; Foo Bin Che; Thomas Bsaibes; Lauren; Christopher; Paul Salama

arXiv:2412.05351·cs.CV·December 10, 2024

Open Access

TL;DR

This paper investigates whether shared feature representations between models can predict the success of transfer-based attacks in black-box settings, using a novel methodology to quantify feature similarity and its correlation with attack transferability.

Contribution

It introduces a new method to predict attack success by measuring shared feature representations, confirming their correlation with transfer attack effectiveness.

Findings

01. Shared feature representations moderately correlate with attack success (correlation coefficient 0.56).

02. Shared features exist across models of different sizes and complexities.

03. Datasets from various domains can be used to interpret black-box feature representations.

Abstract

Much effort has been made to explain and improve the success of transfer-based attacks (TBA) on black-box computer vision models. This work provides the first attempt at a priori prediction of attack success by identifying the presence of vulnerable features within target models. Recent work by Chen and Liu (2024) proposed the manifold attack model, a unifying framework in which successful TBA exist in a common manifold space. Our work experimentally tests the common manifold space hypothesis with a new methodology: first, projecting feature vectors from surrogate and target feature extractors trained on ImageNet onto the same low-dimensional manifold; second, quantifying any observed structural similarities on the manifold; and finally, relating these observed similarities to the success of the TBA. We find that shared feature representation moderately correlates with increased…
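The three steps named in the abstract (project, quantify similarity, relate to attack success) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which projection or similarity measure the paper uses, so the sketch assumes PCA for the projection, linear centered kernel alignment (CKA) for the similarity score, and Pearson correlation for the final step. The function names `project_pca`, `linear_cka`, and `pearson` are hypothetical, and the demo features are synthetic.

```python
import numpy as np


def project_pca(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-wise feature vectors onto their top principal components.

    Assumption: PCA stands in for the paper's (unspecified here) manifold projection.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two embeddings of the same inputs (rows aligned).

    Assumption: CKA is one possible similarity score, not necessarily the paper's.
    Returns a value in [0, 1]; 1 means identical up to rotation and scaling.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def pearson(a, b) -> float:
    """Pearson correlation between similarity scores and attack success rates."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for features of the same 200 images from two extractors
    # with different feature widths (64 and 32).
    surrogate = rng.normal(size=(200, 64))
    mixing = rng.normal(size=(64, 32))
    target = surrogate @ mixing + 0.5 * rng.normal(size=(200, 32))

    score = linear_cka(project_pca(surrogate), project_pca(target))
    print(f"shared-representation score: {score:.3f}")
```

Given one similarity score and one measured attack success rate per surrogate/target pair, `pearson(scores, success_rates)` would give a single correlation coefficient of the kind reported in the findings.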

Taxonomy

Topics: Suicide and Self-Harm Studies · Forensic and Genetic Research · Spam and Phishing Detection