NewsTweet: A Dataset of Social Media Embedding in Online Journalism
Munif Ishad Mujib, Hunter Scott Heidenreich, Colin J. Murphy, Giovanni C. Santia, Asta Zelenkauskaite, Jake Ryland Williams

TL;DR
This paper introduces NewsTweet, a large-scale dataset of social media posts embedded in online news stories, along with the data collection pipeline used to build it, enabling new research into how social content is integrated into news stories.
Contribution
It presents a novel large-scale dataset and data pipeline for studying embedded social media content in digital news articles.
Findings
13% of news stories collected from Google News include embedded tweets
Sports and entertainment news have the most embedded tweets
Public figures dominate embedded social media content
Abstract
The inclusion of social media posts (tweets, in particular) in digital news stories, both as commentary and increasingly as news sources, has become commonplace in recent years. In order to study this phenomenon with sufficient depth, robust large-scale data collection from both news publishers and social media platforms is necessary. This work describes the construction of such a data pipeline. In the data collected from Google News, 13% of all stories were found to include embedded tweets, with sports and entertainment news containing the largest volumes of them. Public figures and celebrities are found to dominate these stories; however, relatively unknown users have also been found to achieve newsworthiness. The collected data set, NewsTweet, and the associated pipeline for acquisition stand to engender a wave of new inquiries into social content embedding from multiple research…
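The abstract does not say how embedded tweets are detected in a story, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. It assumes articles carry Twitter's standard embed markup (a `blockquote` with class `twitter-tweet` containing a permalink of the form `twitter.com/<user>/status/<id>`). The names `EmbeddedTweetParser`, `extract_embedded_tweets`, and `share_with_embeds` are hypothetical and introduced here for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: embeds link to a permalink like https://twitter.com/<user>/status/<id>
STATUS_URL = re.compile(r"https?://(?:www\.)?twitter\.com/(\w+)/status/(\d+)")


class EmbeddedTweetParser(HTMLParser):
    """Collects (screen_name, tweet_id) pairs from twitter-tweet blockquotes."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while inside a twitter-tweet blockquote
        self.tweets = []  # (screen_name, tweet_id) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "blockquote":
            classes = (attrs.get("class") or "").split()
            # Track nesting so a nested blockquote does not end the embed early.
            if self._depth > 0 or "twitter-tweet" in classes:
                self._depth += 1
        elif tag == "a" and self._depth > 0:
            match = STATUS_URL.match(attrs.get("href") or "")
            if match:
                pair = (match.group(1), match.group(2))
                if pair not in self.tweets:
                    self.tweets.append(pair)

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._depth > 0:
            self._depth -= 1


def extract_embedded_tweets(html):
    """Return the embedded tweets found in one article's HTML."""
    parser = EmbeddedTweetParser()
    parser.feed(html)
    return parser.tweets


def share_with_embeds(pages):
    """Fraction of article HTML strings that contain at least one embedded tweet."""
    if not pages:
        return 0.0
    hits = sum(1 for html in pages if extract_embedded_tweets(html))
    return hits / len(pages)
```

Applied to a crawl of story pages, `share_with_embeds` would yield a proportion comparable in kind to the 13% figure reported above, although the actual value depends on how the stories are sampled and on whether publishers use the standard embed markup.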
Taxonomy
Topics: Social Media and Politics · Media Studies and Communication · Opinion Dynamics and Social Influence
