Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection
Herun Wan, Jiaying Wu, Minnan Luo, Zhi Zeng, Zhixiong Su

TL;DR
This paper introduces TruthOverTricks, a framework for measuring shortcut learning in misinformation detection, and proposes SMF, an LLM-based data augmentation method to improve model robustness against superficial cues.
Contribution
It presents a new evaluation paradigm for shortcut behaviors in misinformation detection and a novel augmentation technique to reduce shortcut reliance in models.
Findings
Existing detectors suffer severe performance degradation when shortcuts are present in the data.
SMF improves robustness across multiple benchmarks.
Resources are publicly available for further research.
Abstract
Misinformation detection models often rely on superficial cues (i.e., shortcuts) that correlate with misinformation in training data but fail to generalize to the diverse and evolving nature of real-world misinformation. This issue is exacerbated by large language models (LLMs), which can easily generate convincing misinformation through simple prompts. We introduce TruthOverTricks, a unified evaluation paradigm for measuring shortcut learning in misinformation detection. TruthOverTricks categorizes shortcut behaviors into intrinsic shortcut induction and extrinsic shortcut injection, and evaluates seven representative detectors across 14 popular benchmarks, along with two new factual misinformation datasets, NQ-Misinfo and Streaming-Misinfo. Empirical results reveal that existing detectors suffer severe performance degradation when exposed to both naturally occurring and…
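As a rough illustration of the shortcut problem the abstract describes, the sketch below trains a toy token-counting detector on data where the word "shocking" happens to co-occur with the fake label, then evaluates it on a test set where that cue is moved onto the true items. Everything here is hypothetical: the data, the `train`, `predict`, and `accuracy` helpers, and the scoring rule are assumptions made for illustration, not the detectors, benchmarks, or protocol used in TruthOverTricks.

```python
from collections import Counter

def train(examples):
    """Weight each token by (fake count - real count) over the training set."""
    fake, real = Counter(), Counter()
    for text, label in examples:
        (fake if label == 1 else real).update(set(text.lower().split()))
    return {tok: fake[tok] - real[tok] for tok in set(fake) | set(real)}

def predict(weights, text):
    """Predict 1 (fake) when the summed token weights are positive, else 0."""
    score = sum(weights.get(tok, 0) for tok in set(text.lower().split()))
    return 1 if score > 0 else 0

def accuracy(weights, examples):
    return sum(predict(weights, t) == y for t, y in examples) / len(examples)

# Hypothetical training data: every fake item (label 1) carries the cue "shocking".
train_set = [
    ("shocking vaccine causes illness", 1),
    ("shocking moon landing was staged", 1),
    ("shocking earth is flat", 1),
    ("vaccine reduces illness", 0),
    ("moon landing happened in 1969", 0),
    ("earth is round", 0),
]

# Test set that follows the training correlation (cue on the fake item only).
clean_test = [("shocking water is poisonous", 1), ("water is safe", 0)]

# Test set with the cue moved: added to the true item, removed from the fake one.
shifted_test = [("water is poisonous", 1), ("shocking water is safe", 0)]

weights = train(train_set)
print("clean accuracy:  ", accuracy(weights, clean_test))
print("shifted accuracy:", accuracy(weights, shifted_test))
```

The toy detector assigns its largest weight to the stylistic cue, so its accuracy falls once the cue no longer tracks the label. This is only loosely analogous to injecting a shortcut at test time; the paper's evaluation involves real detectors and datasets.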
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection
