SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks

Kiran Karra; Chace Ashcraft; Cash Costello

arXiv:2109.04566 · cs.LG · June 3, 2022

TL;DR

This paper demonstrates that unsupervised data augmentation (UDA), a self-supervised learning technique, effectively mitigates backdoor (Trojan) attacks on neural networks across a range of model architectures, Trojan types, and amounts of data available for removal.

Contribution

It applies UDA to backdoor removal and shows that it outperforms current state-of-the-art methods at sanitizing Trojaned neural networks, for both feature-space and point triggers.

Findings

01

UDA is more effective than state-of-the-art methods at Trojan removal

02

Works across multiple model architectures and Trojan types

03

Effective with varying amounts of data

Abstract

Self-supervised learning (SSL) methods have resulted in broad improvements to neural network performance by leveraging large, untapped collections of unlabeled data to learn generalized underlying structure. In this work, we harness unsupervised data augmentation (UDA), an SSL technique, to mitigate backdoor or Trojan attacks on deep neural networks. We show that UDA is more effective at removing Trojans than current state-of-the-art methods for both feature-space and point triggers, over a range of model architectures, Trojans, and data quantities provided for Trojan removal. These results demonstrate that UDA is both an effective and practical approach to mitigating the effects of backdoors on neural networks.
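For readers unfamiliar with UDA: as originally proposed by Xie et al., it is a consistency-training objective that adds, to the usual supervised cross-entropy on labeled examples, a consistency loss measuring the KL divergence between the model's prediction on an unlabeled input and its prediction on an augmented copy of that input. The sketch below computes this combined loss from precomputed logits in plain Python. It is a minimal illustration of the generic UDA objective, not the SanitAIs training code: the function names, the weight `lam`, the sharpening `temperature`, and the confidence `threshold` are illustrative assumptions, and the sketch computes loss values only (no gradients, no augmentation pipeline). In a Trojan-removal setting one would presumably fine-tune the suspect model on such a loss using clean labeled and unlabeled data; the paper's exact protocol is not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; temperature < 1 sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Supervised cross-entropy for a single labeled example."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def uda_loss(
    labeled_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    clean_logits: Sequence[Sequence[float]],
    augmented_logits: Sequence[Sequence[float]],
    lam: float = 1.0,          # hypothetical weight on the consistency term
    temperature: float = 0.4,  # hypothetical sharpening temperature for targets
    threshold: float = 0.8,    # hypothetical confidence-masking threshold
) -> float:
    """Supervised CE plus lam * masked KL consistency (generic UDA sketch)."""
    sup = sum(cross_entropy(z, y) for z, y in zip(labeled_logits, labels)) / len(labels)

    cons_terms = []
    for zc, za in zip(clean_logits, augmented_logits):
        # Confidence masking: skip unlabeled examples the model is unsure about.
        if max(softmax(zc)) < threshold:
            continue
        # Sharpened prediction on the clean input is the (constant) target.
        target = softmax(zc, temperature)
        cons_terms.append(kl_divergence(target, softmax(za)))

    # Average over the whole unlabeled batch; masked examples count as zero
    # (one of several reasonable normalizations).
    cons = sum(cons_terms) / len(clean_logits) if clean_logits else 0.0
    return sup + lam * cons
```

Treating the sharpened clean prediction as a fixed target follows UDA's design, in which gradients flow only through the augmented branch; in an autodiff framework this corresponds to detaching the target before computing the KL term.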

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications