Models for Predicting Community-Specific Interest in News Articles
Benjamin D. Horne, William Dron, and Sibel Adali

TL;DR
This study develops content-based models to predict community interest in news articles on Reddit, demonstrating high accuracy but highlighting challenges with feature degradation over time and recommending hierarchical classification and retraining strategies.
Contribution
The paper introduces models that predict community-specific interest using only article content features and analyzes their temporal robustness and generalization challenges.
Findings
Models achieve ROC AUC between 0.81 and 1.0 in classifying community interest.
Feature groups degrade differently over time, affecting model performance.
Hierarchical classifiers and retraining are recommended for better long-term accuracy.
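For readers unfamiliar with the metric reported above, the following is a minimal sketch of how ROC AUC can be computed for one community-pair classifier and compared between a same-period test set and a later-period test set. The function name `roc_auc`, the labels, and the scores are all hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: label 1 = article posted in community A, 0 = community B.
labels = [1, 1, 1, 0, 0, 0]

# Scores on articles from the same period the model was trained on (made up).
same_period_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

# Scores on articles from a later period, where one positive is now
# ranked below a negative (made up to illustrate degradation).
later_period_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]

same_period_auc = roc_auc(labels, same_period_scores)
later_period_auc = roc_auc(labels, later_period_scores)

print(f"same-period AUC:  {same_period_auc:.3f}")
print(f"later-period AUC: {later_period_auc:.3f}")
```

A gap between the two values is one simple way to observe the kind of temporal degradation the findings describe. The pairwise loop is quadratic, which is acceptable for illustration but would usually be replaced by a rank-based computation on larger test sets.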
Abstract
In this work, we ask two questions: 1. Can we predict the type of community interested in a news article using only features from the article content? and 2. How well do these models generalize over time? To answer these questions, we compute well-studied content-based features on over 60K news articles from 4 communities on reddit.com. We train and test models over three different time periods between 2015 and 2017 to demonstrate which features degrade in performance the most due to concept drift. Our models can classify news articles into communities with high accuracy, ranging from 0.81 ROC AUC to 1.0 ROC AUC. However, while we can predict the community-specific popularity of news articles with high accuracy, practitioners should approach these models carefully. Predictions are both community-pair dependent and feature group dependent. Moreover, these feature groups generalize over…
Taxonomy
Topics: Spam and Phishing Detection · Complex Network Analysis Techniques · Misinformation and Its Impacts
