The Sensitivity of Word Embeddings-based Author Detection Models to Semantic-preserving Adversarial Perturbations

Jeremiah Duncan; Fabian Fallas; Chris Gropp; Emily Herron; Maria Mahbub; Paula Olaya; Eduardo Ponce; Tabitha K. Samuel; Daniel Schultz; Sudarshan Srinivasan; Maofeng Tang; Viktor Zenkov; Quan Zhou; Edmon Begoli

arXiv:2102.11917·cs.CL·February 25, 2021

Open Access

TL;DR

This paper investigates how word embedding-based author detection models are affected by semantic-preserving adversarial perturbations, revealing their sensitivities and limitations in maintaining accuracy under input manipulations.

Contribution

It introduces an experimental framework to evaluate the robustness of authorship detection models against semantic-preserving adversarial attacks.

Findings

01. Detection performance drops significantly with certain perturbations

02. Model sensitivity varies based on input and configuration

03. Different perturbation strategies have distinct impacts on accuracy

Abstract

Authorship analysis is an important subject in natural language processing. It allows the detection of the most likely writer of articles, news, books, or messages, and it is used in authorship attribution, plagiarism detection, style analysis, identifying sources of misinformation, etc. The focus of this paper is to explore the limitations and sensitivity of established approaches to adversarial manipulations of inputs. To this end, and using those established techniques, we first developed an experimental framework for author detection and input perturbations. Next, we experimentally evaluated the performance of the authorship detection model against a collection of semantic-preserving adversarial perturbations of input narratives. Finally, we compare and analyze the effects of different perturbation strategies, input and model configurations, and…
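
The evaluation loop described in the abstract (perturb the input while keeping its meaning, then compare detection accuracy before and after) can be illustrated with a minimal sketch. This is not the authors' framework: the synonym table `SYNONYMS`, the functions `perturb` and `accuracy_drop`, and the keyword-based `toy_predict` classifier are all hypothetical stand-ins chosen for illustration, and a real study would use a trained word-embedding model in place of the toy rule.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical synonym table (an assumption for this sketch, not from the paper).
SYNONYMS: Dict[str, List[str]] = {
    "big": ["large"],
    "quick": ["fast"],
    "happy": ["glad"],
}


def perturb(text: str, rate: float, rng: random.Random) -> str:
    """Swap each word that has a listed synonym with probability `rate`."""
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def accuracy(predict: Callable[[str], str],
             texts: Sequence[str],
             labels: Sequence[str]) -> float:
    """Fraction of texts whose predicted author matches the label."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def accuracy_drop(predict: Callable[[str], str],
                  texts: Sequence[str],
                  labels: Sequence[str],
                  rate: float,
                  seed: int = 0) -> float:
    """Clean accuracy minus accuracy on synonym-perturbed copies of the texts."""
    rng = random.Random(seed)
    perturbed = [perturb(t, rate, rng) for t in texts]
    return accuracy(predict, texts, labels) - accuracy(predict, perturbed, labels)


def toy_predict(text: str) -> str:
    """Stand-in classifier: a keyword rule, used only to make the sketch runnable."""
    return "author_a" if "big" in text.lower().split() else "author_b"


texts = ["the big dog ran", "a small cat slept"]
labels = ["author_a", "author_b"]

for rate in (0.0, 1.0):
    print(rate, accuracy_drop(toy_predict, texts, labels, rate))
```

The toy classifier depends on a single surface word, so a meaning-preserving synonym swap is enough to change its prediction; this is the kind of sensitivity the paper sets out to measure, here shown on deliberately artificial data.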

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Spam and Phishing Detection