Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples

Kirill Lukyanov; Andrew Perminov; Denis Turdakov; Mikhail Pautov

arXiv:2410.15889·cs.LG·October 22, 2024

TL;DR

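This paper introduces a black-box adversarial attack based on knowledge distillation: a surrogate (student) model is trained iteratively on an expanding dataset, and the attack is proven to find an adversarial example that transfers to the target network within a finite number of distillation iterations, provided the student has enough learning capacity.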

Contribution

To the authors' knowledge, it is the first work to provide provable guarantees for the success of knowledge distillation-based adversarial attacks on classification neural networks.

Findings

01

Proves that the attack succeeds within a finite number of distillation iterations.

02

Demonstrates efficiency over query-based methods.

03

Establishes theoretical foundations for transferability.

Abstract

The vulnerability of artificial neural networks to adversarial perturbations in the black-box setting is widely studied in the literature. Most attack methods that construct these perturbations require an impractically large number of queries to find an adversarial example. In this work, we focus on knowledge distillation as an approach to conducting transfer-based black-box adversarial attacks and propose iterative training of the surrogate model on an expanding dataset. This work is the first, to our knowledge, to provide provable guarantees on the success of a knowledge distillation-based attack on classification neural networks: we prove that if the student model has sufficient learning capability, an attack on the teacher model is guaranteed to be found within a finite number of distillation iterations.
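
The iterative loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the teacher is a hypothetical fixed linear classifier that returns only hard labels, the student is a small logistic regression, and the white-box step against the student is a single signed-gradient (FGSM-style) move chosen here for brevity. All names (`teacher_predict`, `train_student`, `attack_student`, `mimic_attack`) and all hyperparameters are invented for the example.

```python
import numpy as np

# Assumption: a toy black-box teacher. The attacker only sees its hard labels.
_TEACHER_W = np.array([1.0, -2.0])
_TEACHER_B = 0.5


def teacher_predict(X):
    """Return hard 0/1 labels for a batch of points (one query per row)."""
    return (X @ _TEACHER_W + _TEACHER_B > 0).astype(int)


def train_student(X, y, steps=500, lr=0.5):
    """Distil the teacher's labels into a logistic-regression student."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # student probabilities
        grad = p - y                             # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def attack_student(x, label, w, eps):
    """White-box signed-gradient step against the student, L-inf budget eps."""
    # The student's logit is w.x + b, so moving along -sign(w) lowers it.
    direction = -np.sign(w) if label == 1 else np.sign(w)
    return x + eps * direction


def mimic_attack(x, eps, n_init=8, max_iters=20, seed=0):
    """Alternate between distillation and attack until the teacher is fooled."""
    rng = np.random.default_rng(seed)
    # Initial dataset: random points near x, labelled by querying the teacher.
    X = x + rng.uniform(-eps, eps, size=(n_init, x.size))
    y = teacher_predict(X)
    label = int(teacher_predict(x[None])[0])
    queries = n_init + 1
    for _ in range(max_iters):
        w, _b = train_student(X, y)
        x_adv = attack_student(x, label, w, eps)
        t = int(teacher_predict(x_adv[None])[0])  # one query to check transfer
        queries += 1
        if t != label:
            return x_adv, queries                 # the example transfers to the teacher
        # Otherwise expand the dataset with the failed candidate and retrain.
        X = np.vstack([X, x_adv])
        y = np.append(y, t)
    return None, queries


if __name__ == "__main__":
    x0 = np.array([0.5, 0.0])
    x_adv, queries = mimic_attack(x0, eps=0.5)
    print("adversarial example:", x_adv, "teacher queries:", queries)
```

Each failed candidate is added to the training set with the teacher's label, so the student is corrected exactly where it disagreed with the teacher. The sketch returns `None` when the iteration budget runs out; it makes no claim about the finite-step guarantee itself, which the paper states only for students with sufficient learning capability.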

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation · Focus