Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System
Julian Collado, Kevin Stangl

TL;DR
This paper presents a novel adversarial attack method targeting multi-model machine learning systems, effective even with limited knowledge of the system's components, outperforming previous approaches in success rate and perturbation size.
Contribution
Introduces what the authors describe as the first attack specifically designed for multi-model systems under partial knowledge, achieving higher success rates and smaller perturbations than prior methods.
Findings
Attack success rate of 80%, compared to 25% for previous methods.
Perturbations are 9.4% smaller (measured by MSE) than those of the prior state of the art.
Effective on supervised image pipelines and potentially generalizable to other multi-model systems.
Abstract
Recent approaches in machine learning often solve a task using a composition of multiple models or agentic architectures. When targeting a composed system with adversarial attacks, it might not be computationally or informationally feasible to train an end-to-end proxy model or a proxy model for every component of the system. We introduce a method to craft an adversarial attack against the overall multi-model system when we only have a proxy model for the final black-box model, and when the transformation applied by the initial models can make the adversarial perturbations ineffective. Current methods handle this either by applying many copies of the first model/transformation to an input and then re-using a standard adversarial attack with averaged gradients, or by learning a proxy model for both stages. To our knowledge, this is the first attack specifically designed for this threat model and our…
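The abstract mentions a baseline in which many copies of the first-stage transformation are applied to an input and the resulting gradients are averaged before a standard attack step. The sketch below illustrates only that baseline idea, not the paper's own method. Everything concrete in it is an assumption made for illustration: the linear logistic proxy (`w`, `b`), the random gain-plus-noise first stage (`random_first_stage`), and the sign-gradient update with an L-infinity budget `eps`.

```python
import math
import random


def proxy_input_grad(x, w, b, y):
    """Gradient of the logistic loss w.r.t. the input of a linear proxy model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def random_first_stage(x, rng):
    """Hypothetical stand-in for the first stage: random gain plus Gaussian noise."""
    gain = rng.uniform(0.8, 1.2)
    tx = [gain * xi + rng.gauss(0.0, 0.05) for xi in x]
    return tx, gain


def averaged_gradient(x, w, b, y, rng, n_samples=32):
    """Average the input gradient over many sampled first-stage transformations."""
    total = [0.0] * len(x)
    for _ in range(n_samples):
        tx, gain = random_first_stage(x, rng)
        g = proxy_input_grad(tx, w, b, y)
        for i, gi in enumerate(g):
            total[i] += gain * gi  # chain rule through the sampled gain
    return [t / n_samples for t in total]


def sign(v):
    return (v > 0) - (v < 0)


def averaged_gradient_attack(x, w, b, y, eps=0.1, steps=10, seed=0):
    """Iterative sign-gradient ascent on the loss, using averaged gradients."""
    rng = random.Random(seed)
    x_adv = list(x)
    step = eps / steps
    for _ in range(steps):
        g = averaged_gradient(x_adv, w, b, y, rng)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back into the L-infinity ball of radius eps around x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

Calling `averaged_gradient_attack([0.5, -0.5, 1.0], [1.0, -2.0, 0.5], 0.0, y=1)` returns a perturbed input that stays within `eps` of the original while lowering the proxy's score for the true class. In a realistic pipeline the first stage would be a learned model rather than a simple gain, which is the setting where the abstract notes that such perturbations can become ineffective.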
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning
