With Great Backbones Comes Great Adversarial Transferability
Erik Arakelyan, Karen Hambardzumyan, Davit Papikyan, Pasquale Minervini, Albert Gordo, Isabelle Augenstein, Aram H. Markosyan

TL;DR
This paper systematically evaluates the adversarial robustness of self-supervised pre-trained vision models, revealing that transfer attacks can be highly effective even with limited knowledge, posing significant security risks.
Contribution
It introduces a comprehensive analysis of attack transferability on SSL pre-trained backbones and proposes proxy-based attacks that rival white-box methods.
Findings
Proxy attacks approach white-box effectiveness.
Backbone-only attacks outperform black-box methods.
Increasing tuning meta-information affects transferability.
Abstract
Advances in self-supervised learning (SSL) for machine vision have improved representation robustness and model performance, giving rise to pre-trained backbones like \emph{ResNet} and \emph{ViT} models tuned with SSL methods such as \emph{SimCLR}. Due to the computational and data demands of pre-training, the utilization of such backbones becomes a strenuous necessity. However, employing these backbones may inherit vulnerabilities to adversarial attacks. While adversarial robustness has been studied under \emph{white-box} and \emph{black-box} settings, the robustness of models tuned on pre-trained backbones remains largely unexplored. Additionally, the role of tuning meta-information in mitigating exploitation risks is unclear. This work systematically evaluates the adversarial robustness of such models across combinations of tuning meta-information, including fine-tuning…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. It presents a backbone-only attack that is simple yet surprisingly effective, highlighting risks in model-sharing practices. 2. It covers a wide range of experiments, such as 352 models, 4 datasets (CIFAR-10/100, Oxford Pets, Flowers), multiple SSL methods (SimCLR, SwAV, DINO, PIRL, etc.), and attack types (PGD, FGSM, Square). 3. The paper is well-structured, with clear definitions of tuning configurations, proxy models, and attack metrics (ASR, TSR).
1. The backbone attack is essentially PGD in representation space, similar to prior work in self-supervised adversarial training (e.g., NPR). 2. The idea of transferability from shared components has been explored in surrogate-based attacks and meta-surrogates. 3. While the paper exposes vulnerabilities, it does not suggest any concrete defenses or guidelines for secure backbone sharing. 4. All experiments are on classification tasks. It’s unclear whether the findings generalize to other domains.
1. Realistic gray-box formulation tied to today’s model-sharing ecosystem. 2. Large-scale, systematic study across many backbones/datasets with consistent metrics. 3. Backbone Attack is simple, reproduces easily, and approaches white-box. 4. Clear empirical insight: fine-tuning mode dominates transferability; access to the backbone weights is as effective as possessing all tuning configurations of the target model.
1. Black-box baselines: main-text details on query budgets/early-stopping are sparse; broader black-box comparisons would help, e.g., transfer-based attacks. 2. Limited to classification and small datasets; unclear if results hold for detection/segmentation or larger-scale datasets. 3. Reduced transfer on domain-specific datasets is noted but under-analyzed.
1. The topic is relevant and timely, especially given the prevalence of shared pre-trained models in modern vision pipelines. 2. The authors provide a large-scale, systematic empirical analysis across numerous backbone configurations, datasets, and tuning modes. 3. The experimental observations (e.g., backbone access ≈ full-knowledge access) are clearly presented and supported by data.
1. The proposed backbone attack is essentially a simplified variant of standard PGD that maximizes cosine distance in the feature space. While the systematic evaluation is valuable, the technical contribution is modest—no new attack or defense mechanism is introduced. The paper’s main strength lies in empirical observations rather than algorithmic innovation. To improve impact, the authors could propose a new attack/defense method motivated by the findings (e.g., a method exploiting or mitigatin
This paper provides a systematic, fine-grained evaluation of model hyperparameters, including weights, training method, datasets, etc. The work experimentally validates some established consensus regarding adversarial attacks.
The paper suffers from several major weaknesses, which can be categorized into 3 aspects: **Contribution**, **Experiments**, and **Writing**. 1. **Contribution**. The research contribution of this work is severely limited. Specifically, the method proposed in this paper, i.e., backbone attack, has no methodological innovation as it simply uses PGD to minimize the cosine similarity between adversarial and original examples. Apart from the method, the innovation brought by this paper comes from t
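Several of the reviews above describe the backbone attack as PGD that minimizes the cosine similarity between the backbone features of the clean and the perturbed input. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a toy linear "backbone" f(x) = Wx so the gradient can be written by hand in pure Python; a real attack would differentiate through a pre-trained ResNet or ViT with an autograd framework. The names `matvec`, `cosine`, and `backbone_pgd`, and the values of `eps`, `alpha`, and `steps`, are illustrative assumptions.

```python
import math
import random


def matvec(W, x):
    """Apply the toy linear backbone: returns W @ x as a list."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))


def sign(g):
    return (g > 0) - (g < 0)


def backbone_pgd(W, x, eps=0.1, alpha=0.02, steps=20, seed=0):
    """L-infinity PGD that lowers cos(f(x + delta), f(x)) for f(x) = W x.

    Hypothetical sketch: the linear backbone and all hyperparameters are
    assumptions. Clipping to a valid pixel range is omitted for brevity.
    """
    rng = random.Random(seed)
    z0 = matvec(W, x)          # clean features, kept fixed
    n0 = norm(z0)
    # Random start inside the eps-ball: at delta = 0 the cosine similarity
    # is at its maximum (1), so its gradient is zero and PGD would stall.
    delta = [rng.uniform(-eps, eps) for _ in x]
    for _ in range(steps):
        z = matvec(W, [xi + di for xi, di in zip(x, delta)])
        nz = norm(z)
        c = cosine(z, z0)
        # Gradient of the cosine similarity with respect to the features z.
        g_z = [a / (nz * n0) - c * b / (nz * nz) for a, b in zip(z0, z)]
        # Chain rule through the linear backbone: W^T g_z.
        g_x = [sum(W[i][j] * g_z[i] for i in range(len(W)))
               for j in range(len(x))]
        # Signed descent step on the similarity, then project back onto
        # the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d - alpha * sign(g)))
                 for d, g in zip(delta, g_x)]
    return [xi + di for xi, di in zip(x, delta)]
```

The attack only needs the backbone weights: no labels, classification head, or tuning configuration enter the loss, which is why the reviews frame it as a gray-box setting.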
