Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset
S Mahmudul Hasan, Shaily Roy, Akib Jawad Nafis

TL;DR
This study systematically evaluates machine learning models on the LIAR dataset, revealing a performance ceiling and significant generalization gaps, emphasizing the need for external knowledge in political fake news detection.
Contribution
The paper provides a diagnostic analysis showing that increasing model complexity alone does not improve political fake news detection on LIAR, and it highlights the importance of integrating external knowledge.
Findings
Fine-grained classification hits a performance ceiling: no model exceeds a Weighted F1-score of 0.32.
A simple linear SVM (Accuracy: 0.624) matches pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck.
Tree-based ensembles overfit the training data and fail to generalize, relying on lexical memorization.
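The findings above rest on two quantities: a weighted F1-score and the difference between training and held-out performance. The following is a minimal Python sketch of how both could be computed, assuming the usual support-weighted definition of F1. The function names `weighted_f1` and `generalization_gap` are hypothetical, and the toy labels are illustrative only; they are not drawn from LIAR or from the paper's results.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Each class's F1 is weighted by its share of the true labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def generalization_gap(train_score, test_score):
    """Training score minus held-out score; large values suggest memorization."""
    return train_score - test_score


# Toy example (hypothetical labels): perfect on training data, weaker on test data.
train_true = ["true", "false", "true", "false"]
train_pred = ["true", "false", "true", "false"]
test_true = ["true", "true", "false", "false"]
test_pred = ["true", "false", "false", "false"]

train_f1 = weighted_f1(train_true, train_pred)
test_f1 = weighted_f1(test_true, test_pred)
gap = generalization_gap(train_f1, test_f1)
```

A model that memorizes its training set scores near 1.0 on `train_f1` while `test_f1` stays low, so `gap` is large; a model that generalizes keeps the two scores close.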
Abstract
The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical features (Bag-of-Words, TF-IDF) and semantic embeddings (GloVe), we uncover a hard "Performance Ceiling", with fine-grained classification not exceeding a Weighted F1-score of 0.32 across models. Crucially, a simple linear SVM (Accuracy: 0.624) matches the performance of pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck. We further diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99%…
Taxonomy
Topics: Misinformation and Its Impacts · Media Influence and Politics · Benford’s Law and Fraud Detection
