When is it Biased? Assessing the Representativeness of Twitter's Streaming API

Fred Morstatter; Jürgen Pfeffer; Huan Liu

arXiv:1401.7909·cs.SI·January 31, 2014·46 cites

TL;DR

This paper investigates the bias in Twitter's Streaming API data by comparing hashtag trends with true activity, proposing a method to detect bias using open data sources without needing the Firehose, and evaluating its effectiveness in various scenarios.

Contribution

It introduces a new approach to identify bias in Twitter Streaming API data without relying on the Firehose, using open data sources to compare hashtag trends.

Findings

1. Effective detection of bias in Streaming API data
2. Method works in sparse data situations
3. Applicable across different regions and queries

Abstract

Twitter has captured the interest of the scientific community not only for its massive user base and content, but also for its openness in sharing its data. Twitter shares a free 1% sample of its tweets through the "Streaming API", a service that returns a sample of tweets matching parameters specified by the researcher. Recently, research has pointed to evidence of bias in the data returned through the Streaming API, raising concerns about the integrity of this data service for use in research. While these results are important, the methodologies proposed in previous work rely on the restrictive and expensive Firehose to find the bias in the Streaming API data. In this work we tackle the problem of finding sample bias without the need for "gold standard" Firehose data. Namely, we focus on finding time periods in the Streaming API data where the trend of a hashtag is…
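The abstract is cut off before it describes the detection procedure, so the following is only a generic illustration of the idea of comparing hashtag trends against an open reference sample, not the authors' method. It assumes hypothetical inputs: for each time window, a list of per-tweet hashtag lists from the Streaming API and from a reference sample. It bootstraps a percentile interval for a hashtag's share of tweets in the reference data and flags windows where the Streaming API share falls outside that interval. The function names, the window keys, and the choice of a percentile bootstrap are all assumptions made for this sketch.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical data layout: window label -> list of tweets, each tweet a list of hashtags.
Windows = Dict[str, List[List[str]]]


def hashtag_share(tweets: Sequence[List[str]], tag: str) -> float:
    """Fraction of tweets in one window that contain `tag` (0.0 for an empty window)."""
    if not tweets:
        return 0.0
    return sum(1 for tags in tweets if tag in tags) / len(tweets)


def bootstrap_interval(
    tweets: Sequence[List[str]],
    tag: str,
    n_boot: int = 1000,
    alpha: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the share of `tag` in a non-empty window."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so results are reproducible
    # Resample the window's tweets with replacement and recompute the share each time.
    shares = sorted(
        hashtag_share(rng.choices(tweets, k=len(tweets)), tag) for _ in range(n_boot)
    )
    lo = shares[int((alpha / 2) * (n_boot - 1))]
    hi = shares[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def flag_suspect_windows(streaming: Windows, reference: Windows, tag: str) -> List[str]:
    """Return windows where the Streaming API share of `tag` lies outside the
    bootstrap interval computed from the (assumed unbiased) reference sample."""
    flagged = []
    for window, ref_tweets in reference.items():
        if not ref_tweets:
            continue  # no reference data to compare against
        lo, hi = bootstrap_interval(ref_tweets, tag)
        share = hashtag_share(streaming.get(window, []), tag)
        if not (lo <= share <= hi):
            flagged.append(window)
    return flagged
```

For example, if `#a` appears in 10% of reference tweets in both windows `t0` and `t1`, but in 60% of Streaming API tweets during `t1`, only `t1` is flagged. A bootstrap is used here because it needs no distributional assumption about hashtag counts; a real analysis would also have to account for the reference sample's own size and sampling properties.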

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Complex Network Analysis Techniques · Data-Driven Disease Surveillance