HoneyModels: Machine Learning Honeypots

Ahmed Abdou; Ryan Sheatsley; Yohan Beugin; Tyler Shipp; Patrick McDaniel

arXiv:2202.10309·cs.CR·February 22, 2022

TL;DR

HoneyModels introduces a watermarked honeypot approach for neural networks that detects 69.5% of adversarial attacks while preserving the original model's functionality, offering a more scalable and practical alternative to traditional defenses in adversarial machine learning.

Contribution

The paper proposes HoneyModels, a honeypot-inspired method that embeds a watermark in learned models to detect adversaries, addressing the scalability, computational-efficiency, and practicality limits of existing defenses.

Findings

01. Detects 69.5% of adversarial attacks

02. Preserves original model functionality

03. Encourages creation of watermarked adversarial samples

Abstract

Machine Learning is becoming a pivotal aspect of many systems today, offering newfound performance on classification and prediction tasks, but this rapid integration also comes with new, unforeseen vulnerabilities. To harden these systems, the ever-growing field of Adversarial Machine Learning has proposed new attack and defense mechanisms. However, a great asymmetry exists, as these defensive methods can only provide security to certain models and lack scalability, computational efficiency, and practicality due to overly restrictive constraints. Moreover, newly introduced attacks can easily bypass defensive strategies by making subtle alterations. In this paper, we study an alternate approach inspired by honeypots to detect adversaries. Our approach yields learned models with an embedded watermark. When an adversary initiates an interaction with our model, attacks are encouraged to add…
