Fake News in Sheep's Clothing: Robust Fake News Detection Against LLM-Empowered Style Attacks
Jiaying Wu, Jiafeng Guo, Bryan Hooi

TL;DR
This paper reveals that LLMs can craft style-mimicking fake news that defeats current detectors, and introduces SheepDog, a new style-robust fake news detector that emphasizes content over style for improved resilience.
Contribution
We propose SheepDog, a novel fake news detection method that is robust against style-based attacks by focusing on content and leveraging LLM-generated style variations during training.
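The idea of leveraging LLM-generated style variations during training can be illustrated with a minimal sketch. This is not the authors' implementation: the prompts, the function names `augment_with_style_variants` and `placeholder_reframe`, and the 0/1 label encoding are all assumptions made for illustration, and `placeholder_reframe` merely stands in for a real LLM call. The one point the sketch tries to capture is that a restyled article keeps the veracity label of its source, so a detector trained on the result cannot rely on style alone.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical style instructions; the actual prompts used for SheepDog are not given here.
STYLE_PROMPTS: Dict[str, str] = {
    "objective": "Rewrite this article in a neutral, objective tone:",
    "sensational": "Rewrite this article in a sensational, emotional tone:",
}


def augment_with_style_variants(
    examples: List[Tuple[str, int]],
    reframe: Callable[[str, str], str],
    styles: Dict[str, str] = STYLE_PROMPTS,
) -> List[Tuple[str, int, str]]:
    """Return (text, label, style) triples: each original plus one variant per style.

    The label is copied unchanged, because restyling is assumed not to alter
    whether the underlying claims are true.
    """
    augmented: List[Tuple[str, int, str]] = []
    for text, label in examples:
        augmented.append((text, label, "original"))
        for style_name, prompt in styles.items():
            augmented.append((reframe(prompt, text), label, style_name))
    return augmented


def placeholder_reframe(prompt: str, text: str) -> str:
    """Stand-in for an LLM call (assumption): just prepends the instruction."""
    return f"{prompt} {text}"


# Toy data with an assumed encoding: 0 = real, 1 = fake.
train = [
    ("Officials confirm the bridge reopened on Monday.", 0),
    ("Miracle fruit cures all known diseases overnight!", 1),
]
augmented_train = augment_with_style_variants(train, placeholder_reframe)
```

In practice `reframe` would wrap whichever LLM client is available; it is kept as a parameter so the augmentation logic does not depend on a particular provider.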
Findings
SheepDog maintains high accuracy across style variations.
LLM-empowered style attacks reduce the F1 score of state-of-the-art text-based detectors by up to 38%.
SheepDog outperforms existing detectors in style robustness on real-world benchmarks.
Abstract
It is commonly perceived that fake news and real news exhibit distinct writing styles, such as the use of sensationalist versus objective language. However, we emphasize that style-related features can also be exploited for style-based attacks. Notably, the advent of powerful Large Language Models (LLMs) has empowered malicious actors to mimic the style of trustworthy news sources, doing so swiftly, cost-effectively, and at scale. Our analysis reveals that LLM-camouflaged fake news content significantly undermines the effectiveness of state-of-the-art text-based detectors (up to 38% decrease in F1 Score), implying a severe vulnerability to stylistic variations. To address this, we introduce SheepDog, a style-robust fake news detector that prioritizes content over style in determining news veracity. SheepDog achieves this resilience through (1) LLM-empowered news reframings that inject…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Sentiment Analysis and Opinion Mining
