An exploration of features to improve the generalisability of fake news detection models

Nathaniel Hoy; Theodora Koulouri

arXiv:2502.20299 · cs.LG · February 28, 2025

TL;DR

This paper investigates how stylistic and social-monetisation features can improve the generalisability of fake news detection models, addressing dataset bias issues and evaluating LLMs like LLaMa.

Contribution

It introduces novel social-monetisation features and demonstrates their effectiveness in enhancing fake news detection robustness over traditional token-based models.

Findings

01

Stylistic and social-monetisation features outperform token-based models in generalisability.

02

Token-based models are sensitive to dataset biases and perform poorly across datasets.

03

LLMs like LLaMa show limited effectiveness compared to feature-based approaches.

Abstract

Fake news poses global risks by influencing elections and spreading misinformation, making detection critical. Existing NLP and supervised Machine Learning methods perform well under cross-validation but struggle to generalise across datasets, even within the same domain. This issue stems from coarsely labelled training data, where articles are labelled based on their publisher, introducing biases that token-based models like TF-IDF and BERT are sensitive to. While Large Language Models (LLMs) offer promise, their application in fake news detection remains limited. This study demonstrates that meaningful features can still be extracted from coarsely labelled data to improve real-world robustness. Stylistic features (lexical, syntactic, and semantic) are explored due to their reduced sensitivity to dataset biases. Additionally, novel social-monetisation features are introduced, capturing…
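
To make the idea of stylistic features concrete, the following is a minimal sketch of how a few style statistics might be computed from raw text without relying on specific vocabulary. The function name `stylistic_features` and the five features shown are illustrative assumptions, not the feature set used in the paper; syntactic and semantic features would typically need a parser or lexicon and are not shown.

```python
import re
import string


def stylistic_features(text: str) -> dict[str, float]:
    """Compute a few illustrative style statistics (hypothetical feature set)."""
    # Words are runs of letters/apostrophes; sentences are split on ., ! or ?
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Guard against division by zero on empty input.
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    n_sentences = max(len(sentences), 1)

    return {
        # Lexical: mean word length and vocabulary diversity.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Rough structural proxy: words per sentence.
        "avg_sentence_length": len(words) / n_sentences,
        # Share of characters that are punctuation.
        "punctuation_ratio": sum(c in string.punctuation for c in text) / n_chars,
        # Share of multi-letter words written entirely in capitals.
        "uppercase_word_ratio": sum(w.isupper() and len(w) > 1 for w in words) / n_words,
    }
```

Because these statistics describe how a text is written rather than which tokens it contains, a classifier trained on them has less opportunity to memorise publisher-specific vocabulary, which is the intuition behind their reduced sensitivity to dataset bias.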

Taxonomy

Methods

Adam · Softmax · Dropout · Weight Decay · Attention Dropout · Dense Connections · Linear Layer · Layer Normalization · Residual Connection