Two-in-One: A Model Hijacking Attack Against Text Generation Models

Wai Man Si; Michael Backes; Yang Zhang; Ahmed Salem

arXiv:2305.07406 · cs.CR · May 15, 2023 · 2 cites

Open Access

TL;DR

This paper introduces Ditto, a novel model hijacking attack that extends to text generation and classification models, demonstrating its effectiveness across various NLP tasks without compromising model utility.

Contribution

The paper broadens the scope of model hijacking attacks to include text generation, showing their applicability beyond image classification.

Findings

01. Ditto successfully hijacks text models across multiple datasets.

02. Hijacked models retain their utility after the attack.

03. The attack demonstrates broad applicability to NLP tasks.

Abstract

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. However, its success has been accompanied by different attacks. Recently, a new attack, the model hijacking attack, has been proposed that raises both accountability and parasitic computing risks. So far, however, this attack has only been applied to image classification tasks. In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability. More concretely, we propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones, e.g., language translation, text summarization, and language modeling. We use a range of text benchmark datasets such as SST-2, TweetEval, AGnews, QNLI, and IMDB to evaluate the performance of our attacks. Our…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection