Media Cloud: Massive Open Source Collection of Global News on the Open Web
Hal Roberts, Rahul Bhargava, Linas Valiukas, Dennis Jen, Momin M. Malik, Cindy Bishop, Emily Ndulue, Aashka Dave, Justin Clark, Bruce Etling, Rob Faris, Anushka Shah, Jasmin Rubinovitz, Alexis Hope, Catherine D'Ignazio, Fernando Bermejo, Yochai Benkler, Ethan Zuckerman

TL;DR
Media Cloud is an open source platform that collects and analyzes global news data from the open web through hyperlink crawling, providing researchers with a valuable tool for studying the media ecosystem.
Contribution
This paper provides the first comprehensive description of Media Cloud, detailing its data collection, processing, open API, and how it enables custom dataset creation for media research.
Findings
Media Cloud has been operational for over 10 years.
It offers open API access and user tools for data analysis.
Sample datasets demonstrate its utility for media studies.
Abstract
We present the first full description of Media Cloud, an open source platform based on crawling hyperlink structure and in operation for over 10 years, which for many uses will be the best way to collect data for studying the media ecosystem on the open web. We document the key choices behind what data Media Cloud collects and stores, how it processes and organizes these data, and its open API access as well as user-facing tools. We also highlight the strengths and limitations of the Media Cloud collection strategy compared to relevant alternatives. We give an overview of two sample datasets generated using Media Cloud and discuss how researchers can use the platform to create their own datasets.
Taxonomy
Topics: Complex Network Analysis Techniques · Peer-to-Peer Network Technologies · Web Data Mining and Analysis
