TESSER: Transfer-Enhancing Adversarial Attacks from Vision Transformers via Spectral and Semantic Regularization
Amira Guesmi, Bassem Ouni, Muhammad Shafique

TL;DR
TESSER is a novel adversarial attack framework that improves transferability across vision models by combining feature-sensitive gradient scaling and spectral regularization, leading to more effective black-box attacks.
Contribution
The paper introduces a new attack method that enhances transferability using gradient modulation and spectral smoothing, outperforming existing techniques on diverse architectures.
Findings
Achieves a 10.9% higher attack success rate on CNNs
Achieves a 7.2% higher attack success rate on ViTs
Reduces high-frequency noise in perturbations by 12%
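This summary does not say how "high-frequency noise" is measured. A minimal sketch follows, assuming the metric is the fraction of a perturbation's spectral power that lies above a radial frequency cutoff. The function name `high_freq_energy_fraction` and the 0.25 cycles/pixel cutoff are hypothetical choices, not taken from the paper.

```python
import numpy as np


def high_freq_energy_fraction(delta: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of a 2-D perturbation's spectral power above a radial cutoff.

    delta  : (H, W) perturbation for a single channel.
    cutoff : radius in cycles/pixel (range 0 to ~0.707); an assumed threshold.
    """
    power = np.abs(np.fft.fft2(delta)) ** 2
    fy = np.fft.fftfreq(delta.shape[0])  # row frequencies in cycles/pixel
    fx = np.fft.fftfreq(delta.shape[1])  # column frequencies in cycles/pixel
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    total = power.sum()
    if total == 0:  # an all-zero perturbation has no energy in any band
        return 0.0
    return float(power[radius > cutoff].sum() / total)
```

With this metric, a relative reduction could be reported as `1 - hf(delta_tesser) / hf(delta_baseline)`, averaged over images and channels. That is one plausible reading of the 12% figure, not a confirmed one.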
Abstract
Adversarial transferability remains a critical challenge in evaluating the robustness of deep neural networks. In security-critical applications, transferability enables black-box attacks without access to model internals, making it a key concern for real-world adversarial threat assessment. While Vision Transformers (ViTs) have demonstrated strong adversarial performance, existing attacks often fail to transfer effectively across architectures, especially from ViTs to Convolutional Neural Networks (CNNs) or hybrid models. In this paper, we introduce TESSER, a novel adversarial attack framework that enhances transferability via two key strategies: (1) Feature-Sensitive Gradient Scaling (FSGS), which modulates gradients based on token-wise importance derived from intermediate feature activations, and (2) Spectral Smoothness Regularization (SSR), which…
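The abstract does not give the exact FSGS weighting rule. The hypothetical sketch below assumes that token importance is the L2 norm of each token's intermediate activation, which is the premise the reviews attribute to FSGS. These norms are turned into weights by a temperature softmax rescaled to mean 1. The names `feature_sensitive_scale` and `temperature` are illustrative; this is not the authors' code.

```python
import numpy as np


def feature_sensitive_scale(grad_tokens: np.ndarray,
                            act_tokens: np.ndarray,
                            temperature: float = 1.0) -> np.ndarray:
    """Hypothetical FSGS-style reweighting of per-token gradients.

    grad_tokens : (N, D) gradient of the loss w.r.t. N token features.
    act_tokens  : (N, D) forward activations of the same layer.
    Each token's gradient row is scaled by a norm-based weight; the weights
    average to 1, so the overall gradient magnitude is roughly preserved.
    """
    norms = np.linalg.norm(act_tokens, axis=1)       # (N,) token-wise L2 norm
    logits = norms / temperature
    logits -= logits.max()                           # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax over tokens
    weights *= len(weights)                          # mean weight == 1
    return grad_tokens * weights[:, None]
```

In an actual ViT, this reweighting would be applied to the gradient flowing through a chosen block, for example via a backward hook in the training framework. Here, plain arrays stand in for one layer's token gradients and activations.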
Peer Reviews
Decision: Submitted to ICLR 2026
1. This paper introduces Feature-Sensitive Gradient Scaling (FSGS), which steers perturbations toward semantically meaningful features to improve cross-architecture generalization. 2. This paper further proposes Spectral Smoothness Regularization (SSR) to encourage smoother, lower-frequency perturbations that are more robust across different architectures. 3. The proposed TESSER methodology demonstrates strong attack transferability across diverse target models.
1. The novelty and justification of FSGS and SSR are insufficiently supported. FSGS builds on the assumption that high-norm tokens carry richer semantic information, yet its claim that high-norm tokens in shallow ViT layers represent noisy signals lacks empirical or theoretical validation. Similarly, the SSR component (applying Gaussian smoothing to suppress high-frequency noise) closely resembles TI-FGSM [1], but the paper neither cites nor compares with this prior work. The rationale for selec
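For reference, the TI-FGSM-style operation the reviewer compares SSR to is a Gaussian low-pass filter applied to the input gradient before the sign step. The minimal NumPy sketch below assumes a separable Gaussian with reflect padding. The values of `sigma` and `radius` are illustrative, and the text here does not say whether SSR smooths the gradient, smooths the perturbation, or adds a loss term.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # weights sum to 1, so constant regions are preserved


def smooth_gradient(grad: np.ndarray, sigma: float = 1.5, radius: int = 3) -> np.ndarray:
    """Gaussian-smooth a (C, H, W) gradient channel by channel (TI-FGSM-style)."""
    k = gaussian_kernel_1d(sigma, radius)
    # Reflect-pad the spatial axes so the output keeps the input's H and W.
    padded = np.pad(grad, ((0, 0), (radius, radius), (radius, radius)), mode="reflect")

    def conv(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # smooth along height
    return np.apply_along_axis(conv, 2, rows)    # smooth along width
```

A sign step would then use `alpha * np.sign(smooth_gradient(grad))`. Because the kernel sums to 1, smooth regions of the gradient pass through unchanged, while pixel-level oscillations are strongly attenuated.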
1. The core methodology is intuitive and well-motivated. Focusing the attack on "important" regions (via FSGS) while removing "model-specific" high-frequency noise (via SSR) is a logical approach. 2. The method shows strong performance against adversarially trained CNNs and robust ViTs. 3. The ablation in Table 4 clearly breaks down the contributions of applying FSGS to the Attention, QKV, and MLP modules, justifying the decision to combine all three.
1. The authors should supplement their baseline comparisons by benchmarking against the wider set of attacks available in the `TransferAttack` repository (https://github.com/Trustworthy-AI-Group/TransferAttack). 2. The setting for the perturbation budget (epsilon) is unusual. 3. The experimental design involves resizing inputs to different dimensions, but the potential effect of this `resize` operation on transferability is not discussed.
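To make the epsilon discussion concrete: transfer attacks typically take sign-gradient steps and then project back into an L-infinity ball of radius epsilon around the clean image. The budget is usually quoted on the 0-255 scale (e.g., 8/255 or 16/255 for pixels in [0, 1]). The minimal sketch below shows one such update, assuming images in [0, 1] and no input normalization. `linf_step` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np


def linf_step(x_adv: np.ndarray, x_clean: np.ndarray, grad: np.ndarray,
              alpha: float, eps: float) -> np.ndarray:
    """One I-FGSM/PGD-style update under an L-infinity budget (pixels in [0, 1])."""
    x_adv = x_adv + alpha * np.sign(grad)                 # ascend the loss
    x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)  # project into the eps-ball
    return np.clip(x_adv, 0.0, 1.0)                       # stay a valid image
```

If inputs are resized or normalized before reaching the model, epsilon must be interpreted in the pixel space where this projection happens. Otherwise, budgets reported by different papers are not directly comparable.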
1. The work tackles a critical and open problem in adversarial robustness: improving the cross-architecture transferability of adversarial attacks, especially from ViTs to CNNs. 2. The experimental setup is thorough, evaluating the attack against 14 different models, which provides a strong basis for the empirical claims. 3. TESSER appears to outperform existing state-of-the-art baselines, including strong methods like ATT and DiffAttack, across nearly all tested scenarios. 4. The paper incl
The method's design is overly intuitive, and its conclusions require stronger proof. For instance, the paper claims that TESSER utilizes different frequency bands of noise information compared to previous transfer attacks. This claim should be substantiated, at a minimum, by designing experiments using low-pass or high-pass filters to verify this difference.
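One way to run the experiment the reviewer asks for is to split each adversarial perturbation into low-pass and high-pass components, then measure the transfer success of each component separately. The minimal sketch below assumes an ideal radial mask in the 2-D FFT domain. `split_frequency_bands` and the 0.1 cycles/pixel cutoff are hypothetical choices.

```python
import numpy as np


def split_frequency_bands(delta: np.ndarray, cutoff: float = 0.1):
    """Split a (H, W) perturbation into low- and high-pass parts.

    Uses an ideal radial mask on the 2-D FFT; the two parts sum back to delta.
    """
    fy = np.fft.fftfreq(delta.shape[0])[:, None]  # cycles/pixel along rows
    fx = np.fft.fftfreq(delta.shape[1])[None, :]  # cycles/pixel along columns
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff
    spectrum = np.fft.fft2(delta)
    # The mask is symmetric in +/- frequency, so the inverse is real up to rounding.
    low = np.real(np.fft.ifft2(spectrum * low_mask))
    return low, delta - low
```

Since `low + high` reconstructs the original perturbation exactly, comparing success rates for `x + low` and `x + high` (each clipped to valid pixels) across TESSER and baseline attacks would indicate which band carries the transferable signal.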
