Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers

Shanshan Xu; Irina Broda; Rashid Haddad; Marco Negrini; Matthias Grabmair

arXiv:2211.15556·cs.CL·November 29, 2022

TL;DR

This paper shows that transformer-based systems for detecting unfair Terms of Service (ToS) clauses are vulnerable to universal adversarial triggers: short, input-agnostic text perturbations that can considerably reduce detection performance while appearing natural enough that human readers struggle to notice them.

Contribution

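It attacks a transformer-based unfair ToS clause detector with universal adversarial triggers, measures how much the triggers reduce detection performance, and assesses how detectable they are to humans through a study that records both answer accuracy and response time.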

Findings

01

Universal triggers can reduce detection accuracy significantly.

02

Naturalness of triggers influences their success in fooling humans.

03

Human evaluation shows triggers are often perceived as natural and unnoticeable.

Abstract

Recent work has demonstrated that natural language processing techniques can support consumer protection by automatically detecting unfair clauses in the Terms of Service (ToS) Agreement. This work demonstrates that transformer-based ToS analysis systems are vulnerable to adversarial attacks. We conduct experiments attacking an unfair-clause detector with universal adversarial triggers. Experiments show that a minor perturbation of the text can considerably reduce the detection performance. Moreover, to measure the detectability of the triggers, we conduct a detailed human evaluation study by collecting both answer accuracy and response time from the participants. The results show that the naturalness of the triggers remains key to tricking readers.
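
To make the attack setting described in the abstract concrete, the following is a minimal sketch of how a universal trigger is applied and how its effect on detection can be measured. It is an illustration under stated assumptions, not the authors' implementation: `toy_detector` and its cue weights are hypothetical stand-ins for the transformer-based detector, the example clauses and the trigger phrase are invented, and the search procedure that finds a trigger is not shown. The only property the sketch is meant to show is that the same fixed phrase is prepended to every input, regardless of its content.

```python
from typing import Callable, List

# Hypothetical cue weights for a toy stand-in detector (assumption, not from the paper).
CUE_WEIGHTS = {
    "terminate": 1.0,
    "sole": 1.0,
    "discretion": 1.0,
    "without": 0.5,
    "notice": 0.5,
    "mutual": -1.5,
    "reasonable": -1.5,
}


def toy_detector(clause: str) -> bool:
    """Flag a clause as unfair when its summed cue weights reach a threshold."""
    tokens = clause.lower().replace(",", " ").replace(".", " ").split()
    score = sum(CUE_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return score >= 1.5


def apply_trigger(clause: str, trigger: str) -> str:
    """Prepend the same input-agnostic trigger phrase to a clause."""
    return f"{trigger} {clause}"


def detection_rate(detector: Callable[[str], bool], clauses: List[str]) -> float:
    """Fraction of (truly unfair) clauses that the detector flags."""
    flagged = sum(1 for clause in clauses if detector(clause))
    return flagged / len(clauses)


# Invented example clauses, all assumed to be unfair.
unfair_clauses = [
    "We may terminate your account at our sole discretion.",
    "We may terminate the service without notice.",
]

# Invented trigger; a real attack would search for this phrase.
trigger = "mutual reasonable"

clean_rate = detection_rate(toy_detector, unfair_clauses)
attacked_rate = detection_rate(
    toy_detector, [apply_trigger(c, trigger) for c in unfair_clauses]
)

print(f"detection rate without trigger: {clean_rate:.2f}")
print(f"detection rate with trigger:    {attacked_rate:.2f}")
```

Comparing the two rates on the same set of unfair clauses mirrors the kind of before-and-after measurement the abstract refers to. With a real model, the detector callable would wrap the classifier's prediction, and the trigger would come from an actual search rather than being written by hand.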

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Cybercrime and Law Enforcement Studies · Spam and Phishing Detection
