HoneyModels: Machine Learning Honeypots
Ahmed Abdou, Ryan Sheatsley, Yohan Beugin, Tyler Shipp, Patrick McDaniel

TL;DR
HoneyModels introduces a watermarked honeypot approach for neural networks that detects 69.5% of adversarial attacks while preserving the original model's functionality, offering a scalable and practical alternative to traditional defenses in adversarial machine learning.
Contribution
The paper proposes HoneyModels, a novel honeypot-based method embedding watermarks in models to detect adversaries, addressing scalability and practicality issues of existing defenses.
Findings
Detects 69.5% of adversarial attacks
Preserves original model functionality
Encourages attackers to produce adversarial samples that carry the embedded watermark
Abstract
Machine Learning is becoming a pivotal aspect of many systems today, offering newfound performance on classification and prediction tasks, but this rapid integration also comes with new unforeseen vulnerabilities. To harden these systems, the ever-growing field of Adversarial Machine Learning has proposed new attack and defense mechanisms. However, a great asymmetry exists, as these defensive methods can only provide security to certain models and lack scalability, computational efficiency, and practicality due to overly restrictive constraints. Moreover, newly introduced attacks can easily bypass defensive strategies by making subtle alterations. In this paper, we study an alternate approach inspired by honeypots to detect adversaries. Our approach yields learned models with an embedded watermark. When an adversary initiates an interaction with our model, attacks are encouraged to add…
