Classification of worldwide news articles by perceived quality, 2018-2024

Connor McElroy; Thiago E. A. de Oliveira; Chris Brogly

arXiv:2511.16416·cs.CL·November 21, 2025

Open Access

TL;DR

This study evaluates three machine learning classifiers and three deep learning models on a dataset of 1,412,272 English news articles, classifying each article as perceived lower or higher quality. ModernBERT-large performs best, at 0.8744 accuracy.

Contribution

It introduces a new dataset of 1,412,272 English news articles, labelled at the website level from expert consensus ratings of 579 source websites, and compares multiple models for quality classification.

Findings

01

Deep learning models outperform traditional classifiers in accuracy.

02

ModernBERT-large achieves the highest accuracy of 87.44%.

03

Traditional classifiers such as Random Forest reach 73.55% accuracy (0.8131 ROC AUC).

Abstract

This study explored whether supervised machine learning and deep learning models can effectively distinguish perceived lower-quality news articles from perceived higher-quality news articles. Three machine learning classifiers and three deep learning models were assessed using a newly created dataset of 1,412,272 English news articles from the Common Crawl over 2018-2024. Expert consensus ratings on 579 source websites were split at the median, creating perceived low-quality and high-quality classes of about 706,000 articles each. Each article carries the label of its source website and 194 linguistic features. Traditional machine learning classifiers such as the Random Forest demonstrated capable performance (0.7355 accuracy, 0.8131 ROC AUC). For deep learning, ModernBERT-large (256 context length) achieved the best performance (0.8744 accuracy; 0.9593 ROC AUC; 0.8739 F1), followed by DistilBERT-base (512 context…
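The website-level labelling described in the abstract can be illustrated with a minimal sketch. It assumes one numeric consensus rating per source website and assigns sites at or below the median to the low class; the abstract does not say how ties were handled or exactly how the split was computed, so both are assumptions. The function names, site names, and ratings are hypothetical and are not taken from the paper.

```python
from statistics import median


def label_sites_by_median(site_ratings):
    """Label each site 'high' or 'low' by comparing its rating with the median."""
    cutoff = median(site_ratings.values())
    # Assumption: a site rated exactly at the median falls in the 'low' class.
    return {
        site: ("high" if rating > cutoff else "low")
        for site, rating in site_ratings.items()
    }


def label_articles(articles, site_labels):
    """Give every article the label of its source website."""
    return [(text, site_labels[site]) for site, text in articles]


# Hypothetical ratings for illustration only, not data from the paper.
site_ratings = {
    "a.example": 0.9,
    "b.example": 0.4,
    "c.example": 0.7,
    "d.example": 0.2,
}
site_labels = label_sites_by_median(site_ratings)

articles = [("a.example", "Article one"), ("d.example", "Article two")]
labelled = label_articles(articles, site_labels)
```

Because the label comes from the source website rather than from the article itself, every article from one site shares a class. This is why the classes are described as perceived quality.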

Taxonomy

Topics: Computational and Text Analysis Methods · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining