Fake News Detection After LLM Laundering: Measurement and Explanation
Rupak Kumar Das, Jonathan Dodge

TL;DR
This paper evaluates the challenges of detecting LLM-generated fake news, especially after paraphrasing, revealing detection difficulties, model-specific strengths, and the impact of sentiment shifts on detection success.
Contribution
It provides a comprehensive analysis of LLM fake news detection, introduces datasets with paraphrased samples, and explains detection failures through sentiment analysis.
Findings
Detectors struggle more with LLM-paraphrased fake news than with human-written text.
Different models excel at different tasks: evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity.
Sentiment shift is a key factor in detection failures.
Abstract
With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news, which can contribute to disseminating misinformation. Though there is much research on fake news detection for human-written text, the field of detecting LLM-generated fake news is still under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular, determining whether adding a paraphrase step in the detection pipeline helps or impedes detection. This study makes the following contributions: (1) We show that detectors struggle more to detect LLM-paraphrased fake news than human-written text. (2) We identify which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity). (3) Via LIME explanations, we discover a possible reason for detection failures: sentiment shift. (4)…
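The measurement the abstract describes (how a detector performs before and after a paraphrase step, and whether sentiment moved in between) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' pipeline: `toy_detector`, the `LEXICON` word scores, and the example texts are hypothetical stand-ins, and the lexicon-based `sentiment` function replaces both a real sentiment model and the LIME explanations used in the paper.

```python
from typing import Callable, Dict, Sequence

# Hypothetical sentiment lexicon (illustration only, not from the paper).
LEXICON: Dict[str, float] = {
    "shocking": -1.0, "terrible": -1.0, "disaster": -1.0,
    "good": 1.0, "great": 1.0,
}

def sentiment(text: str) -> float:
    """Mean lexicon score of the matching tokens; 0.0 if none match."""
    tokens = [t.strip(".,!?:;").lower() for t in text.split()]
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def toy_detector(text: str) -> bool:
    """Hypothetical stand-in detector: flags text containing sensational words."""
    lowered = text.lower()
    return any(word in lowered for word in ("shocking", "terrible", "disaster"))

def detection_rate(detector: Callable[[str], bool], texts: Sequence[str]) -> float:
    """Fraction of (all fake) texts that the detector flags as fake."""
    return sum(1 for t in texts if detector(t)) / len(texts)

def compare(detector: Callable[[str], bool],
            originals: Sequence[str],
            paraphrases: Sequence[str]) -> Dict[str, float]:
    """Detection rate before/after paraphrasing and mean absolute sentiment shift."""
    shifts = [abs(sentiment(p) - sentiment(o))
              for o, p in zip(originals, paraphrases)]
    return {
        "rate_original": detection_rate(detector, originals),
        "rate_paraphrased": detection_rate(detector, paraphrases),
        "mean_abs_sentiment_shift": sum(shifts) / len(shifts),
    }

# Made-up fake-news items and made-up paraphrases of them.
originals = [
    "Shocking disaster: officials hide terrible truth!",
    "Terrible scandal rocks the city.",
]
paraphrases = [
    "Officials are reported to have withheld some information.",
    "A controversy has emerged in the city.",
]

report = compare(toy_detector, originals, paraphrases)
```

In this toy setup the paraphrases neutralize the emotionally loaded wording, so the detection rate falls while the sentiment score moves toward zero. That mirrors, in a highly simplified way, the sentiment-shift explanation the abstract proposes.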
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · FinTech, Crowdfunding, Digital Finance
Methods: Local Interpretable Model-Agnostic Explanations
