WikiReddit: Tracing Information and Attention Flows Between Online Platforms
Patrick Gildersleve, Anna Beers, Viviane Ito, Agustin Orozco, Francesca Tripodi

TL;DR
This paper introduces a comprehensive multilingual dataset of Wikipedia mentions and links shared in Reddit posts and comments from 2020 to 2023, enabling analysis of information and attention flows between the two platforms.
Contribution
It provides a novel, enriched dataset capturing cross-platform mentions and links, facilitating research on information circulation and influence between Reddit and Wikipedia.
Findings
Reddit discussions significantly influence Wikipedia content updates.
The dataset enables detailed analysis of information flow patterns.
Wikipedia links in Reddit posts reflect deliberation and fact-checking behaviors.
Abstract
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia mentions and links shared in posts and comments on Reddit 2020-2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query…
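As a rough illustration of what "capturing Wikipedia links shared in posts and comments" can involve, the sketch below pulls the language edition and article title out of Wikipedia URLs found in a piece of text. It is not the authors' pipeline: the regular expression, the function name `extract_wikipedia_links`, and the sample text are all assumptions made for this example, and the dataset's actual matching rules may differ.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern (an assumption, not taken from the paper): matches
# desktop and mobile article URLs such as
#   https://en.wikipedia.org/wiki/Title
#   https://de.m.wikipedia.org/wiki/Title
# Limitation of this sketch: titles containing parentheses are cut off at
# the first ")" so that Markdown links like [text](url) do not swallow
# the closing bracket.
WIKI_LINK = re.compile(
    r"https?://(?P<lang>[a-z\-]+)(?:\.m)?\.wikipedia\.org/wiki/"
    r"(?P<title>[^\s)\]#?]+)",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return (language, article title) pairs for each Wikipedia link in text."""
    links = []
    for match in WIKI_LINK.finditer(text):
        lang = match.group("lang").lower()
        # Drop trailing sentence punctuation, decode percent-escapes,
        # and turn URL underscores back into spaces.
        raw_title = match.group("title").rstrip(".,;:!")
        title = unquote(raw_title).replace("_", " ")
        links.append((lang, title))
    return links


sample = (
    "See https://en.wikipedia.org/wiki/World_Wide_Web and "
    "https://de.m.wikipedia.org/wiki/K%C3%B6ln for details."
)
links = extract_wikipedia_links(sample)
```

Keeping the language code alongside the title reflects the multilingual framing of the dataset. Resolving redirects or attaching Wikidata identifiers, as the abstract describes, would need additional lookups that this sketch does not attempt.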
Taxonomy
Topics: Wikis in Education and Collaboration · Spam and Phishing Detection · Access Control and Trust
