Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset

S Mahmudul Hasan; Shaily Roy; Akib Jawad Nafis

arXiv:2512.18533 · cs.CL · December 23, 2025

TL;DR

This study systematically evaluates nine machine learning models on the LIAR dataset. It finds a performance ceiling (a weighted F1-score of at most 0.32 on fine-grained classification) and large generalization gaps in tree-based ensembles, and it emphasizes the need for external knowledge in political fake news detection.

Contribution

The paper provides a diagnostic analysis showing that increasing model complexity alone does not improve political fake news detection, which highlights the importance of integrating external knowledge.

Findings

01 Across models, fine-grained classification does not exceed a weighted F1-score of 0.32.

02 A simple linear SVM (accuracy 0.624) matches pre-trained Transformers such as RoBERTa (accuracy 0.620), indicating limited gains from added model complexity.

03 Tree-based ensembles overfit the training data and fail to generalize, relying on lexical memorization.
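
The generalization gap in the third finding can be read as the difference between training accuracy and held-out accuracy. The sketch below is only an illustration of that quantity, not the paper's code: `MemorizingClassifier`, `generalization_gap`, and the toy statements and labels are all hypothetical, and the classifier is a deliberately extreme stand-in for a model that relies on lexical memorization.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_true, train_pred, test_true, test_pred):
    """Training accuracy minus held-out accuracy (larger means more overfit)."""
    return accuracy(train_true, train_pred) - accuracy(test_true, test_pred)


class MemorizingClassifier:
    """Hypothetical toy model: recalls exact training texts, otherwise
    falls back to the most frequent training label."""

    def fit(self, texts, labels):
        self.table = dict(zip(texts, labels))
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.table.get(text, self.default) for text in texts]


# Toy data invented for this sketch; these are not LIAR examples.
train_texts = ["taxes went up", "crime fell", "jobs doubled", "debt vanished"]
train_labels = ["true", "false", "false", "false"]
test_texts = ["taxes went down", "crime rose", "jobs halved", "debt grew"]
test_labels = ["true", "false", "true", "true"]

model = MemorizingClassifier().fit(train_texts, train_labels)
train_acc = accuracy(train_labels, model.predict(train_texts))
test_acc = accuracy(test_labels, model.predict(test_texts))
gap = generalization_gap(
    train_labels, model.predict(train_texts),
    test_labels, model.predict(test_texts),
)
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f}")
```

On this toy data the model is perfect on the statements it has seen and falls back to the majority label on everything else, so the gap is large by construction. A real diagnosis would compare the same two accuracies on the dataset's own train and test splits.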

Abstract

The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical features (Bag-of-Words, TF-IDF) and semantic embeddings (GloVe), we uncover a hard "Performance Ceiling", with fine-grained classification not exceeding a Weighted F1-score of 0.32 across models. Crucially, a simple linear SVM (Accuracy: 0.624) matches the performance of pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck. We further diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99%…
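
For readers unfamiliar with the metric behind the reported ceiling, a weighted F1-score is commonly computed as the per-class F1 averaged with weights proportional to each class's share of the reference labels. The following is a minimal pure-Python sketch under that common definition; the function name `weighted_f1` and the toy labels are assumptions for illustration, and the paper's exact evaluation code is not shown in this listing.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 over the classes in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy labels invented for this sketch; these are not results from the paper.
y_true = ["false", "false", "half-true", "true"]
y_pred = ["false", "half-true", "half-true", "true"]
toy_score = weighted_f1(y_true, y_pred)
print(round(toy_score, 4))
```

Because each class is weighted by its support, frequent classes dominate the score, which is one reason a weighted F1 can stay low on a fine-grained label set even when some individual classes are predicted well.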

Taxonomy

Topics: Misinformation and Its Impacts · Media Influence and Politics · Benford’s Law and Fraud Detection