Same Pipeline, Opposite Conclusions: Sample-Surface Effects in Breaking-News Latency
Farhad Bazyari, Xianghang Liu, Sean Moran

TL;DR
This paper examines how sample selection affects conclusions about how quickly social media platforms surface breaking news, and shows that the results depend heavily on the sampling method used.
Contribution
It introduces a cross-surface sampling design that exposes sample-dependence effects when measuring breaking-news latency across platforms.
Findings
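Osborne and Dredze (2014) found Twitter to be the timeliest social-media source of breaking news, trailing only newswire; in this study, the conclusion varies with the sampling design.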
Channel diversity has increased, with new platforms contributing to early news detection.
Coverage gaps exist, with 24% of events missing on-topic evidence even after filtering.
Abstract
Osborne and Dredze (2014) reported that Twitter was the timeliest social-media source of breaking news, trailing only newswire. Twelve years on, the platform landscape has shifted - Google+ is gone, X replaced Twitter, Bluesky and Threads have appeared - and platform data now flows almost exclusively through commercial social-listening providers that redact key fields. We revisit the question with two sampling designs run through the same downstream pipeline. Sample A draws N = 50 events from the Wikipedia Current Events Portal (WCEP) ranked by article pageviews. Sample B draws N = 109 events from Polymarket prediction markets ranked by USD trading volume, with each event's news moment pinned to the largest 1-hour trade-volume spike. Both samples are pulled from one commercial provider across nine indexed channels. We report three findings. (1) The X-vs-news direction depends on the…
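The Sample B anchoring step mentioned in the abstract (pinning each event's news moment to the largest 1-hour trade-volume spike) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that "largest 1-hour spike" means the calendar-hour bucket with the highest total USD volume, that ties go to the earliest hour, and that trades arrive as `(timestamp, usd_volume)` pairs; the function name `pin_news_moment` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def pin_news_moment(trades):
    """Return the start of the 1-hour bucket with the largest total USD volume.

    `trades` is an iterable of (timezone-aware datetime, usd_volume) pairs.
    Assumption: the "spike" is the calendar hour with the highest summed
    volume, not the largest hour-over-hour increase.
    """
    trades = list(trades)
    if not trades:
        raise ValueError("at least one trade is required")

    # Sum USD volume per calendar hour.
    volume_by_hour = defaultdict(float)
    for timestamp, usd_volume in trades:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        volume_by_hour[hour_start] += usd_volume

    # Highest volume wins; ties are broken by the earliest hour (assumption).
    return min(volume_by_hour, key=lambda hour: (-volume_by_hour[hour], hour))


# Hypothetical trades for a single market, for illustration only.
example_trades = [
    (datetime(2026, 1, 5, 14, 10, tzinfo=timezone.utc), 100.0),
    (datetime(2026, 1, 5, 14, 50, tzinfo=timezone.utc), 250.0),
    (datetime(2026, 1, 5, 15, 5, tzinfo=timezone.utc), 300.0),
    (datetime(2026, 1, 5, 16, 30, tzinfo=timezone.utc), 120.0),
]
news_moment = pin_news_moment(example_trades)
```

With the sample data, the 14:00 bucket totals 350 USD against 300 USD for 15:00, so `news_moment` is 14:00 UTC. A sliding 60-minute window, or a definition based on the jump relative to the preceding hour, would be equally plausible readings of the abstract and could pin a different moment.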
