Enhancing Big Data in the Social Sciences with Crowdsourcing: Data Augmentation Practices, Techniques, and Opportunities

Nathaniel D. Porter; Ashton M. Verdery; S. Michael Gaddis

arXiv:1609.08437·cs.CY·June 12, 2020

TL;DR

This paper explores how crowdsourcing can improve the measurement quality and contextual richness of big data in the social sciences through practical data augmentation methods, and argues that this can increase both the validity of such data and its acceptance among skeptics.

Contribution

The paper proposes online crowdsourcing as an approach to data augmentation for big data in the social sciences, illustrates its effectiveness with empirical cases, and offers guidelines for best practices.

Findings

01

Crowdsourcing can verify automated coding tasks effectively.

02

Crowdsourcing enables linking online data to structured databases.

03

Crowdsourcing gathers valuable contextual information for social science data.

Abstract

The importance of big data is a contested topic among social scientists. Proponents claim it will fuel a research revolution, but skeptics challenge it as unreliably measured and decontextualized, with limited utility for accurately answering social science research questions. We argue that social scientists need effective tools to quantify big data's measurement error and expand the contextual information associated with it. Standard research efforts in many fields already pursue these goals through data augmentation, the systematic assessment of measurement against known quantities and expansion of extant data by adding new information. Traditionally, these tasks are accomplished using trained research assistants or specialized algorithms. However, such approaches may not be scalable to big data or appease its skeptics. We consider a third alternative that may increase the validity…
