Scaling Laws for Black box Adversarial Attacks
Chuan Liu, Huanran Chen, Yichi Zhang, Jun Zhu, Yinpeng Dong

TL;DR
This paper uncovers a universal log-linear scaling law for black-box adversarial attack success rates, demonstrating that increasing ensemble size significantly enhances attack effectiveness across various models and defenses.
Contribution
It introduces the first large-scale empirical study revealing a fundamental scaling law for ensemble-based black-box attacks, supported by theoretical analysis and extensive experiments.
Findings
Attack success rate scales linearly with the logarithm of ensemble size
Scaling improves transferability across classifiers, defenses, and MLLMs
Achieves over 80% success on proprietary models like GPT-4o
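The first finding can be read as ASR ≈ a + b · ln(N), where N is the ensemble size. The sketch below shows one way such a relationship could be fitted and extrapolated. The symbols `a`, `b`, and `N`, the helper name `fit_log_linear`, and all numbers are illustrative assumptions; the data is synthetic and is not taken from the paper.

```python
import numpy as np

def fit_log_linear(ensemble_sizes, asr):
    """Least-squares fit of asr ~ a + b * ln(N); returns (a, b)."""
    x = np.log(np.asarray(ensemble_sizes, dtype=float))
    y = np.asarray(asr, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns slope first, then intercept
    return a, b

# Synthetic, noise-free data generated from a made-up law (NOT paper results).
sizes = np.array([1, 2, 4, 8, 16, 32, 64])
true_a, true_b = 0.20, 0.10
asr = true_a + true_b * np.log(sizes)

a, b = fit_log_linear(sizes, asr)

# Extrapolate to a larger ensemble; ASR is a rate, so clip to [0, 1].
predicted_asr_128 = float(np.clip(a + b * np.log(128), 0.0, 1.0))
```

Because ASR cannot exceed 1, any log-linear fit of this kind can only hold over a limited range of N, which is why the extrapolated value is clipped.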
Abstract
Adversarial examples exhibit cross-model transferability, enabling threatening black-box attacks on commercial models. Model ensembling, which attacks multiple surrogate models, is a known strategy to improve this transferability. However, prior studies typically use small, fixed ensembles, which leaves open the question of whether scaling the number of surrogate models can further improve black-box attacks. In this work, we conduct the first large-scale empirical study of this question. By resolving gradient conflict with advanced optimizers, we discover a robust and universal log-linear scaling law through both theoretical analysis and empirical evaluations: the Attack Success Rate (ASR) scales linearly with the logarithm of the ensemble size. We rigorously verify this law across standard classifiers, SOTA defenses, and MLLMs, and find that scaling distills…
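To make "attacking multiple surrogate models" concrete, the following is a minimal sketch of an ensemble attack that averages the loss gradient over surrogates and takes iterative sign-gradient steps inside an L-infinity ball. It is a generic baseline, not the paper's method: the surrogates are hypothetical linear classifiers with logistic loss, the function name `ensemble_sign_attack` and all parameters are assumptions, and plain gradient averaging stands in for the advanced optimizers the abstract mentions.

```python
import numpy as np

def mean_logistic_loss(x, y, weights):
    """Average logistic loss of linear surrogates w·x for a label y in {-1, +1}."""
    margins = y * (weights @ x)
    return float(np.mean(np.log1p(np.exp(-margins))))

def ensemble_sign_attack(x, y, weights, eps=0.1, steps=10):
    """Iterative sign-gradient ascent on the ensemble-averaged loss.

    weights: array of shape (n_models, d), one hypothetical linear
    surrogate per row. The result stays within eps of x in L-infinity norm.
    """
    x_adv = x.copy()
    alpha = eps / steps  # per-step size
    for _ in range(steps):
        margins = y * (weights @ x_adv)                 # shape (n_models,)
        # Gradient of log(1 + exp(-m)) w.r.t. x is -y * w / (1 + exp(m)).
        coeff = -y / (1.0 + np.exp(margins))            # shape (n_models,)
        grad = (coeff[:, None] * weights).mean(axis=0)  # average over surrogates
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)        # project back into the ball
    return x_adv

# Toy usage with random surrogates (synthetic, for illustration only).
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 16))  # 8 surrogates, 16 input dimensions
x = rng.normal(size=16)
y = 1.0
eps = 0.1

loss_before = mean_logistic_loss(x, y, weights)
x_adv = ensemble_sign_attack(x, y, weights, eps=eps, steps=10)
loss_after = mean_logistic_loss(x_adv, y, weights)
```

In this toy setting, scaling the ensemble simply means adding rows to `weights`; whether the resulting perturbation transfers to an unseen target model is the empirical question the paper studies.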
Taxonomy
Topics: Cryptographic Implementations and Security · Physical Unclonable Functions (PUFs) and Hardware Security · Adversarial Robustness in Machine Learning
Methods: ADaptive gradient method with the OPTimal convergence rate
