Confusing and Detecting ML Adversarial Attacks with Injected Attractors

Jiyi Zhang; Ee-Chien Chang; Hwee Kuan Lee

arXiv:2003.02732 · cs.CR · March 9, 2021 · 1 citation

TL;DR

This paper introduces a proactive defense against ML adversarial attacks: attractors injected into the model mislead gradient-following attacks and make them easier to detect, reducing the reported attack success rate on CIFAR-10 to 1.9%.

Contribution

The paper proposes a generic method to inject attractors from watermarking schemes into models to confuse and detect adversarial attacks, improving robustness and explainability.

Findings

01. Reduces attack success rate on CIFAR-10 to 1.9%

02. Leverages watermarking for scalable attractor injection

03. Outperforms existing defenses such as LID, FS, and MagNet

Abstract

Many machine learning adversarial attacks find adversarial samples of a victim model $M$ by following the gradient of some attack objective functions, either explicitly or implicitly. To confuse and detect such attacks, we take a proactive approach that modifies those functions with the goal of misleading the attacks to some local minima, or to some designated regions that can be easily picked up by an analyzer. To achieve this goal, we propose adding a large number of artifacts, which we call attractors, onto the otherwise smooth function. An attractor is a point in the input space where samples in its neighborhood have gradients pointing toward it. We observe that decoders of watermarking schemes exhibit properties of attractors and give a generic method that injects attractors from a watermark decoder into the victim model $M$. This principled approach…
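
The attractor idea in the abstract can be illustrated with a one-dimensional toy, sketched below. This is not the paper's construction: the quadratic attack objective, the cosine "artifacts", and all names and constants (`smooth_grad`, `attractor_grad`, `near_attractor`, the target 5.5, period 1.0, strength 3.0) are assumptions chosen only to show how a gradient-following search on an otherwise smooth function can be pulled into a nearby attractor and then flagged by a simple analyzer.

```python
import math

TARGET = 5.5      # hypothetical point the attacker is trying to reach
PERIOD = 1.0      # hypothetical spacing between injected attractors
STRENGTH = 3.0    # hypothetical amplitude of the injected artifacts

def smooth_grad(x):
    # Gradient of the smooth toy attack objective (x - TARGET)^2.
    return 2.0 * (x - TARGET)

def attractor_grad(x):
    # Gradient of -STRENGTH * cos(2*pi*x / PERIOD), whose minima lie at
    # multiples of PERIOD; nearby points have descent directions toward them.
    k = 2.0 * math.pi / PERIOD
    return STRENGTH * k * math.sin(k * x)

def descend(grad_fn, x0, step=0.01, iters=2000):
    # Plain gradient descent, standing in for a gradient-following attack.
    x = x0
    for _ in range(iters):
        x -= step * grad_fn(x)
    return x

def near_attractor(x, tol=0.15):
    # Toy analyzer: flag points that end up close to a multiple of PERIOD.
    nearest = round(x / PERIOD) * PERIOD
    return abs(x - nearest) < tol

# Without attractors the search reaches the target.
plain = descend(smooth_grad, x0=0.2)

# With attractors added to the objective, the search stalls near x = 0.
trapped = descend(lambda x: smooth_grad(x) + attractor_grad(x), x0=0.2)

print("plain:", plain, "flagged:", near_attractor(plain))
print("trapped:", trapped, "flagged:", near_attractor(trapped))
```

In this sketch the unmodified search converges to the target, while the search on the modified objective settles in the local minimum beside the attractor at 0 and is flagged by the analyzer. A real input space is high-dimensional, and the abstract states that the attractors come from a watermark decoder rather than a cosine term.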

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis