Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation

Zayd Hammoudeh; Daniel Lowd

arXiv:2201.10055 · cs.LG · September 7, 2022

Open Access · 1 Repo

TL;DR

This paper introduces a renormalized influence estimation method for identifying whether a specific test instance is the target of a training-set attack, effectively detecting adversarial training instances across multiple data domains.

Contribution

The work develops renormalized influence estimators that outperform existing methods in identifying influential training instances, enabling effective target detection in adversarial settings.

Findings

01 Renormalized influence estimators outperform the original estimators at identifying influential groups of training instances.

02 The method achieves up to 100% detection of adversarial training instances, with no false positives on clean data.

03 The method is effective across text, vision, and speech data, even against adaptive attackers.

Abstract

Targeted training-set attacks inject malicious instances into the training set to cause a trained model to mislabel one or more specific test instances. This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack. Target identification can be combined with adversarial-instance identification to find (and remove) the attack instances, mitigating the attack with minimal impact on other predictions. Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction. We show that existing influence estimators' poor practical performance often derives from their over-reliance on training instances and iterations with large losses. Our renormalized influence estimators fix this weakness; they far…
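The effect of renormalization can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain logistic-regression model, uses a gradient dot product as a stand-in influence score, and takes "renormalization" to mean dividing each training gradient by its norm so that instances with large gradients (for this model, large residuals or large features) cannot dominate the ranking. The function names `per_example_grads` and `influence_scores` are hypothetical.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Logistic-loss gradient w.r.t. w for each row of X (labels y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probability per instance
    return (p - y)[:, None] * X        # shape: (n_instances, n_features)

def influence_scores(w, X_train, y_train, x_test, y_test, renormalize=False):
    """Gradient-similarity influence of each training instance on one test instance.

    With renormalize=True, every training gradient is scaled to unit norm
    before the dot product (an assumed, simplified form of renormalization).
    """
    g_train = per_example_grads(w, X_train, y_train)
    g_test = per_example_grads(w, x_test[None, :], np.array([y_test]))[0]
    if renormalize:
        norms = np.linalg.norm(g_train, axis=1, keepdims=True)
        g_train = g_train / np.maximum(norms, 1e-12)  # guard against zero gradients
    return g_train @ g_test

# Toy data: two training instances pointing the same way, one with a 10x larger gradient.
w = np.zeros(2)
X_train = np.array([[1.0, 0.0], [10.0, 0.0]])
y_train = np.array([1.0, 1.0])
x_test, y_test = np.array([1.0, 0.0]), 1.0

raw = influence_scores(w, X_train, y_train, x_test, y_test)
renorm = influence_scores(w, X_train, y_train, x_test, y_test, renormalize=True)
```

In this toy setup the raw scores differ by a factor of ten purely because of gradient magnitude, while the renormalized scores are equal because both training gradients point in the same direction.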

Code & Models

Repositories

zaydh/target_identification
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection