Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text
Rishi Balakrishnan, Stephen Sloan, Anil Aswani

TL;DR
This paper introduces a GAN-based method for anonymizing text to protect author identity, balancing anonymity, fluency, and content preservation, and demonstrating strong generalization to unseen authors.
Contribution
A novel GAN framework for authorship anonymization that outperforms existing methods in anonymization while maintaining content and fluency, and generalizes to new authors.
Findings
Outperforms baselines in anonymization effectiveness
Maintains content and fluency comparable to existing methods
Generalizes well to unseen authors
Abstract
With Internet users constantly leaving a trail of text, whether through blogs, emails, or social media posts, the ability to write and protest anonymously is being eroded because artificial intelligence, when given a sample of previous work, can match text with its author out of hundreds of possible candidates. Existing approaches to authorship anonymization, also known as authorship obfuscation, often focus on protecting binary demographic attributes rather than identity as a whole. Even those that do focus on obfuscating identity require manual feedback, lose the coherence of the original sentence, or only perform well given a limited subset of authors. In this paper, we develop a new approach to authorship anonymization by constructing a generative adversarial network that protects identity and optimizes for three different losses corresponding to anonymity, fluency, and content…
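The abstract describes a generator optimized for three losses (anonymity, fluency, and content) but does not say here how they are combined. One common way to combine several objectives is a weighted sum; the sketch below illustrates that idea only and is not the authors' formulation. The names `LossWeights` and `generator_objective` and the weight values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical trade-off weights; the paper's actual values are not given here."""
    anonymity: float = 1.0
    fluency: float = 1.0
    content: float = 1.0


def generator_objective(anonymity_loss: float,
                        fluency_loss: float,
                        content_loss: float,
                        weights: LossWeights = LossWeights()) -> float:
    """Combine the three loss terms as a weighted sum (an assumed formulation)."""
    return (weights.anonymity * anonymity_loss
            + weights.fluency * fluency_loss
            + weights.content * content_loss)


# Example: weight anonymity and content preservation more heavily than fluency.
example_weights = LossWeights(anonymity=2.0, fluency=1.0, content=4.0)
example_total = generator_objective(0.5, 0.25, 0.125, example_weights)
```

Raising one weight pushes the generator toward that objective at the expense of the other two, which is the trade-off between anonymity, fluency, and content that the abstract refers to.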
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Topic Modeling
