The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

Georgi Karadjov, Tsvetomila Mihaylova, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, and Preslav Nakov

arXiv:1707.03736·cs.CL·July 31, 2017

TL;DR

This paper introduces a method for anonymizing text by adjusting stylometric features towards average values, effectively obscuring author identity while maintaining text semantics, and demonstrates its success in a competitive benchmark.

Contribution

The paper presents a novel stylometry-based approach for author obfuscation that balances style modification with semantic preservation, outperforming previous methods.

Findings

1. Achieved top performance in the PAN-2016 author obfuscation task.
2. Effectively reduces stylometric discriminability of texts.
3. Maintains semantic integrity after style adjustments.

Abstract

Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss freely on various topics. Preserving the anonymity of a text's writer can also be important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that can help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text, so that it is pushed towards average values for some general stylometric characteristics, thus making the use of these characteristics less discriminative. The approach consists of three…
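
To make the idea of "pushing towards average values" concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the three features, the function names `stylometric_features` and `directions_toward_average`, the `tolerance` parameter, and the corpus averages in the usage note are all assumptions chosen for the example. The sketch measures a few simple stylometric features of a text and reports, for each one, whether a rewriting step would need to raise or lower it to move closer to a reference average.

```python
import re


def stylometric_features(text):
    """Compute a few simple, illustrative stylometric features.

    The feature set is an assumption for this sketch; the paper's
    actual features are not listed in the abstract above.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }


def directions_toward_average(features, averages, tolerance=0.1):
    """Say, per feature, which way the text should move to approach the average.

    `averages` holds hypothetical reference values, and `tolerance` is the
    relative band around each average that counts as "close enough".
    """
    plan = {}
    for name, value in features.items():
        target = averages[name]
        if value > target * (1 + tolerance):
            plan[name] = "decrease"
        elif value < target * (1 - tolerance):
            plan[name] = "increase"
        else:
            plan[name] = "keep"
    return plan
```

As a usage example, with made-up reference averages of 20.0 words per sentence, 4.5 characters per word, and a type-token ratio of 0.8, the text "The cat sat. The dog ran." would be flagged to increase both sentence length and word length while keeping its vocabulary diversity. A real obfuscator would then need rewriting operations (for example merging sentences or substituting synonyms) to act on that plan while preserving meaning.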

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling