True or False: Does the Deep Learning Model Learn to Detect Rumors?
Shiwen Ni, Jiawen Li, and Hung-Yu Kao

TL;DR
This paper critically examines whether deep learning models genuinely learn to detect rumors or rely on shortcuts, revealing poor generalization and proposing a new paired test evaluation method.
Contribution
It introduces the PairT evaluation method and highlights dataset pitfalls, emphasizing the need for better dataset creation and evaluation practices in rumor detection.
Findings
Models have poor out-of-domain generalization.
Models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls.
The authors propose PairT, a paired test evaluation method, for more realistic evaluation.
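As a rough illustration of why a paired test can be stricter than per-example accuracy, the sketch below scores a toy classifier both ways. This page does not say how PairT builds or scores its pairs, so everything here is an assumption: the pair format, the rule that a pair counts only when both members are classified correctly, the `shortcut_model` function, and the example texts are all hypothetical and not taken from the paper.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]          # (text, label); 1 = rumor, 0 = non-rumor
Pair = Tuple[Example, Example]     # two related texts (assumed pair format)
Model = Callable[[str], int]


def shortcut_model(text: str) -> int:
    """Hypothetical classifier that relies on a single surface cue."""
    return int("breaking" in text.lower())


def sample_accuracy(model: Model, pairs: List[Pair]) -> float:
    """Ordinary accuracy: every example is scored on its own."""
    examples = [ex for pair in pairs for ex in pair]
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


def pair_accuracy(model: Model, pairs: List[Pair]) -> float:
    """Assumed paired scoring: a pair counts only if both members are right."""
    correct = sum(
        1 for pair in pairs
        if all(model(text) == label for text, label in pair)
    )
    return correct / len(pairs)


# Invented example pairs, for illustration only.
pairs: List[Pair] = [
    (("Breaking: vaccine contains microchips", 1),
     ("Breaking: vaccine approved by regulator", 0)),
    (("Breaking: the moon is made of cheese", 1),
     ("The moon orbits the earth", 0)),
]

print("per-example accuracy:", sample_accuracy(shortcut_model, pairs))
print("pair accuracy:", pair_accuracy(shortcut_model, pairs))
```

In this toy setup the cue-based model looks reasonable example by example but fails the first pair, because both texts share the cue it depends on. A paired score exposes that kind of shortcut more directly.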
Abstract
It is difficult for humans to distinguish true rumors from false ones, but current deep learning models can surpass humans and achieve excellent accuracy on many rumor datasets. In this paper, we investigate whether deep learning models that seem to perform well actually learn to detect rumors. We evaluate models on their generalization ability to out-of-domain examples by fine-tuning BERT-based models on five real-world datasets and evaluating against all test sets. The experimental results indicate that the generalization ability of the models on other unseen datasets is unsatisfactory; even common-sense rumors cannot be detected. Moreover, we found through experiments that models take shortcuts and learn absurd knowledge when the rumor datasets have serious data pitfalls. This means that simple modifications to the rumor text based on specific rules will lead to inconsistent model…
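The cross-dataset protocol described in the abstract (train on one dataset, evaluate against every test set) can be sketched as a train-by-test accuracy matrix. This is a minimal sketch under stated assumptions, not the authors' setup: instead of fine-tuning BERT it uses a hypothetical `train_keyword_model` that memorizes tokens seen only in rumor examples, and the two toy datasets `A` and `B` are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]      # (text, label); 1 = rumor, 0 = non-rumor
Model = Callable[[str], int]
Splits = Dict[str, List[Example]]


def train_keyword_model(train: List[Example]) -> Model:
    """Hypothetical stand-in for fine-tuning: memorize rumor-only tokens."""
    rumor_tokens, other_tokens = set(), set()
    for text, label in train:
        target = rumor_tokens if label == 1 else other_tokens
        target.update(text.lower().split())
    cues = rumor_tokens - other_tokens
    return lambda text: int(any(tok in cues for tok in text.lower().split()))


def accuracy(model: Model, test_set: List[Example]) -> float:
    correct = sum(1 for text, label in test_set if model(text) == label)
    return correct / len(test_set)


def cross_dataset_matrix(
    train_fn: Callable[[List[Example]], Model],
    datasets: Dict[str, Splits],
) -> Dict[str, Dict[str, float]]:
    """Train on each dataset, then evaluate on every dataset's test split."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, splits in datasets.items():
        model = train_fn(splits["train"])
        matrix[source] = {
            target: accuracy(model, other["test"])
            for target, other in datasets.items()
        }
    return matrix


# Invented toy datasets, for illustration only.
datasets: Dict[str, Splits] = {
    "A": {
        "train": [("breaking aliens landed", 1), ("city council meets today", 0)],
        "test": [("breaking moon is cheese", 1), ("council approves budget", 0)],
    },
    "B": {
        "train": [("shocking cure found in lemon", 1), ("hospital opens new wing", 0)],
        "test": [("shocking pill melts fat", 1), ("hospital hires nurses", 0)],
    },
}

matrix = cross_dataset_matrix(train_keyword_model, datasets)
for source, row in matrix.items():
    print(source, row)
```

The diagonal of the matrix is in-domain accuracy and the off-diagonal entries are out-of-domain accuracy. In this toy case the model scores perfectly in-domain and drops to chance on the other dataset, because the cue words it memorized do not transfer.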
