A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger's Adversarial Attacks

Thai Le; Noseong Park; Dongwon Lee

arXiv:2011.10492 · cs.CR · May 10, 2021

TL;DR

This paper introduces DARCY, a honeypot-based defense framework that detects and mitigates Universal Trigger (UniTrigger) adversarial attacks on textual neural networks. It keeps accuracy on clean inputs nearly unchanged and remains robust across attack scenarios with varying attacker knowledge.

Contribution

DARCY is a novel honeypot-inspired method that injects trapdoors into models to detect and defend against UniTrigger attacks, demonstrating high detection rates and robustness.

Findings

1. Detects UniTrigger attacks with up to 99% TPR and less than 2% FPR in most cases.
2. Keeps prediction accuracy (F1) on clean inputs within a 1% margin.
3. Remains robust across diverse attack scenarios with varying levels of attacker knowledge.
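The TPR and FPR figures in the findings are the standard detection rates: TPR = TP / (TP + FN), measured over adversarial inputs, and FPR = FP / (FP + TN), measured over clean inputs. The minimal Python sketch below computes both from per-input labels. The function name `detection_rates` and the toy labels are hypothetical and do not come from the paper's evaluation code.

```python
from typing import Sequence, Tuple


def detection_rates(is_attack: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (TPR, FPR) for a binary attack detector.

    TPR = flagged attacks / all attacks.
    FPR = flagged clean inputs / all clean inputs.
    A rate whose denominator is zero is reported as 0.0.
    """
    if len(is_attack) != len(flagged):
        raise ValueError("is_attack and flagged must have the same length")
    tp = sum(1 for a, f in zip(is_attack, flagged) if a and f)
    fn = sum(1 for a, f in zip(is_attack, flagged) if a and not f)
    fp = sum(1 for a, f in zip(is_attack, flagged) if not a and f)
    tn = sum(1 for a, f in zip(is_attack, flagged) if not a and not f)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical toy labels: 4 adversarial inputs followed by 4 clean inputs.
is_attack = [True, True, True, True, False, False, False, False]
flagged = [True, True, True, False, False, False, True, False]
print(detection_rates(is_attack, flagged))  # (0.75, 0.25)
```

A detector in the regime the findings report would push the first number toward 0.99 and keep the second below 0.02.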

Abstract

The Universal Trigger (UniTrigger) is a recently proposed, powerful adversarial attack on text models. Using a learning-based mechanism, UniTrigger generates a fixed phrase that, when added to any benign input, can drop the prediction accuracy of a textual neural network (NN) model to near zero on a target class. To defend against this potentially harmful attack, we borrow the "honeypot" concept from the cybersecurity community and propose DARCY, a honeypot-based defense framework against UniTrigger. DARCY greedily searches for and injects multiple trapdoors into an NN model to "bait and catch" potential attacks. Through comprehensive experiments across four public datasets, we show that DARCY detects UniTrigger's adversarial attacks with up to 99% TPR and less than 2% FPR in most cases, while maintaining the prediction accuracy (in F1) for clean inputs within…
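The following deliberately simplified Python toy makes the "bait and catch" idea concrete. It is not DARCY's implementation, and it rests on several assumptions:

- The classifier is a linear bag-of-words model with hypothetical weights.
- UniTrigger's learning-based search is replaced by an exhaustive search for a single universal trigger token.
- The trapdoor is planted by setting a weight directly, not injected through training.
- The trapdoor token `zq` and all helper names are hypothetical.
- The detector is a plain token lookup. A real defense would learn a detector rather than match tokens.

```python
from typing import Dict, List, Set

# Hypothetical toy sentiment model: score > 0 means "pos", otherwise "neg".
CLEAN_WEIGHTS: Dict[str, float] = {
    "great": 2.0, "good": 1.0, "fun": 1.0, "movie": 0.0,
    "bad": -1.0, "boring": -1.5, "awful": -2.0,
}


def predict(weights: Dict[str, float], text: str) -> str:
    """Sum per-token weights (unknown tokens count as 0) and threshold at 0."""
    score = sum(weights.get(tok, 0.0) for tok in text.split())
    return "pos" if score > 0 else "neg"


def find_universal_trigger(weights: Dict[str, float], inputs: List[str]) -> str:
    """Pick the one vocabulary token that, prepended to every input, yields the
    most "neg" predictions. This exhaustive search stands in for UniTrigger's search."""
    def flips(tok: str) -> int:
        return sum(predict(weights, f"{tok} {x}") == "neg" for x in inputs)
    return max(sorted(weights), key=flips)


def inject_trapdoor(weights: Dict[str, float], token: str, strength: float) -> Dict[str, float]:
    """Copy the model and add a hypothetical trapdoor token that pushes hard toward "neg"."""
    baited = dict(weights)
    baited[token] = -strength
    return baited


def detect(trapdoors: Set[str], text: str) -> bool:
    """Toy detector: flag any input that carries a trapdoor token."""
    return any(tok in trapdoors for tok in text.split())


positives = ["great movie", "good fun", "great fun"]  # all truly "pos"

# Without a honeypot, the attacker's best single token is an ordinary word.
print(find_universal_trigger(CLEAN_WEIGHTS, positives))  # awful

# With a honeypot, the attacker's search is baited into choosing the trapdoor...
honeypot = inject_trapdoor(CLEAN_WEIGHTS, "zq", strength=10.0)
trigger = find_universal_trigger(honeypot, positives)
print(trigger)  # zq
print([predict(honeypot, x) for x in positives])  # ['pos', 'pos', 'pos'] (clean inputs unaffected)
# ...and every attacked input is caught by the detector.
print(all(detect({"zq"}, f"{trigger} {x}") for x in positives))  # True
```

The design choice the toy illustrates is that the trapdoor is the easiest direction for a search-based attacker to exploit. The attacker's optimization therefore walks into it. Clean inputs never contain the trapdoor, so their predictions are left unchanged.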

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Terrorism, Counterterrorism, and Political Violence · Advanced Malware Detection Techniques