An exploration of features to improve the generalisability of fake news detection models
Nathaniel Hoy, Theodora Koulouri

TL;DR
This paper investigates how stylistic and social-monetisation features can improve the generalisability of fake news detection models, addressing dataset bias issues and evaluating LLMs like LLaMa.
Contribution
It introduces novel social-monetisation features and demonstrates their effectiveness in enhancing fake news detection robustness over traditional token-based models.
Findings
Stylistic and social-monetisation features outperform token-based models in generalisability.
Token-based models are sensitive to dataset biases and perform poorly across datasets.
LLMs like LLaMa show limited effectiveness compared to feature-based approaches.
Abstract
Fake news poses global risks by influencing elections and spreading misinformation, making detection critical. Existing NLP and supervised Machine Learning methods perform well under cross-validation but struggle to generalise across datasets, even within the same domain. This issue stems from coarsely labelled training data, where articles are labelled based on their publisher, introducing biases that token-based models like TF-IDF and BERT are sensitive to. While Large Language Models (LLMs) offer promise, their application in fake news detection remains limited. This study demonstrates that meaningful features can still be extracted from coarsely labelled data to improve real-world robustness. Stylistic features (lexical, syntactic, and semantic) are explored due to their reduced sensitivity to dataset biases. Additionally, novel social-monetisation features are introduced, capturing…
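To illustrate what "stylistic" features might look like in contrast to token-based representations such as TF-IDF, here is a minimal sketch in Python. The feature names and definitions below are hypothetical illustrations and are not taken from the paper, whose actual feature set is not listed in this summary. The point is only that such features describe how a text is written rather than which specific tokens it contains, which is one plausible reason they would be less tied to publisher-specific vocabulary.

```python
import re


def stylistic_features(text: str) -> dict[str, float]:
    """Compute a few simple, hypothetical stylistic features for one article.

    These are illustrative proxies only, not the paper's feature set.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1      # guard against division by zero
    n_sents = len(sentences) or 1
    return {
        # Lexical proxies
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Rough syntactic proxy
        "avg_sentence_length": len(words) / n_sents,
        # Emphasis markers often associated with sensational writing
        "exclamation_rate": text.count("!") / n_words,
        "uppercase_word_rate": sum(w.isupper() and len(w) > 1 for w in words) / n_words,
    }


example = "SHOCKING news! You will not believe this. Read it now!"
features = stylistic_features(example)
```

A feature dictionary like this could be passed to any standard classifier. Evaluating generalisability would then mean training on one dataset and testing on a different one, rather than cross-validating within a single dataset.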
Taxonomy
Methods: Adam · Softmax · Dropout · Weight Decay · Attention Dropout · Dense Connections · Linear Layer · Layer Normalization · Residual Connection
