TL;DR
This paper introduces a method for adversarial reprogramming of text classification neural networks by learning a vocabulary remapping, enabling repurposing for new tasks without modifying the original models.
Contribution
It presents a novel context-based vocabulary remapping approach for adversarial reprogramming of discrete-input text classifiers, with training procedures for white-box and black-box scenarios.
Findings
Successfully reprogrammed LSTM, bi-LSTM, and CNN models for new classification tasks.
Effective in both white-box and black-box attack settings.
Demonstrated that the approach applies across recurrent (LSTM, bi-LSTM) and convolutional (CNN) architectures.
Abstract
Adversarial Reprogramming has demonstrated success in utilizing pre-trained neural network classifiers for alternative classification tasks without modification to the original network. An adversary in such an attack scenario trains an additive contribution to the inputs to repurpose the neural network for the new classification task. While this reprogramming approach works for neural networks with a continuous input space such as that of images, it is not directly applicable to neural networks trained for tasks such as text classification, where the input space is discrete. Repurposing such classification networks would require the attacker to learn an adversarial program that maps inputs from one discrete space to the other. In this work, we introduce a context-based vocabulary remapping model to reprogram neural networks trained on a specific sequence classification task, for a new…
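To make the idea of a context-based vocabulary remapping concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the per-offset score table `theta`, the window size, the padding token, and the function names `make_remap_params` and `remap_sequence` are all hypothetical. The parameters are random rather than trained, and the white-box and black-box training procedures mentioned in the abstract are not shown.

```python
import random

def make_remap_params(src_vocab_size, tgt_vocab_size, window, seed=0):
    """Build a random score table theta[offset][src_token][tgt_token].

    Hypothetical parameterisation: in a real attack these scores would be
    learned, not sampled at random.
    """
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(tgt_vocab_size)]
             for _ in range(src_vocab_size)]
            for _ in range(window)]

def remap_sequence(tokens, theta, pad_id=0):
    """Map each source token to a target-vocabulary token.

    The choice for position i depends on the tokens in a window around i,
    so the same source token can map to different target tokens in
    different contexts.
    """
    window = len(theta)
    half = window // 2
    tgt_vocab_size = len(theta[0][0])
    # Pad both ends so every position has a full context window.
    padded = [pad_id] * half + list(tokens) + [pad_id] * half
    remapped = []
    for i in range(len(tokens)):
        scores = [0.0] * tgt_vocab_size
        for k in range(window):
            row = theta[k][padded[i + k]]
            for t in range(tgt_vocab_size):
                scores[t] += row[t]
        # Discrete output: pick the highest-scoring target token.
        remapped.append(max(range(tgt_vocab_size), key=scores.__getitem__))
    return remapped

# Usage: remap a 4-token sequence from a 5-word source vocabulary
# into a 7-word target vocabulary (the victim classifier's vocabulary).
theta = make_remap_params(src_vocab_size=5, tgt_vocab_size=7, window=3)
remapped = remap_sequence([1, 4, 2, 2], theta)
```

The remapped sequence would then be fed to the unmodified victim classifier, whose output labels are reinterpreted as labels of the new task. The hard argmax here is only for illustration at inference time; because it is not differentiable, learning such a mapping needs a training procedure that can cope with discrete outputs.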
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
