The Limitations of Stylometry for Detecting Machine-Generated Fake News

Tal Schuster; Roei Schuster; Darsh J Shah; Regina Barzilay

arXiv:1908.09805·cs.CL·February 21, 2020

TL;DR

This paper shows that stylometry is limited as a defense against machine-generated fake news. Because language models produce stylistically consistent text regardless of intent, stylometry can identify that a text was machine-generated but cannot tell legitimate LM use from LM-generated misinformation, which highlights the need for alternative detection methods.

Contribution

The study shows that stylometry cannot reliably distinguish malicious from legitimate machine-generated text, and it introduces two benchmarks illustrating the stylistic similarity of text across different LM applications.

Findings

01

Stylometry fails to differentiate malicious from legitimate LM-generated content.

02

Humans alter their style when trying to deceive, but LMs do not.

03

Two new benchmarks demonstrate stylistic consistency across different LM uses.

Abstract

Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake news by capturing their stylistic differences from human-written text. These approaches, broadly termed stylometry, have found success in source attribution and misinformation detection in human-written texts. However, in this work, we show that stylometry is limited against machine-generated misinformation. While humans speak differently when trying to deceive, LMs generate stylistically consistent text, regardless of underlying motive. Thus, though stylometry can successfully prevent impersonation by identifying text provenance, it fails to distinguish legitimate LM applications from those that introduce false information. We create two benchmarks…
