Enhancing Big Data in the Social Sciences with Crowdsourcing: Data Augmentation Practices, Techniques, and Opportunities
Nathaniel D. Porter, Ashton M. Verdery, S. Michael Gaddis

TL;DR
This paper explores how crowdsourcing can improve the quality and contextual richness of big data in the social sciences through practical data augmentation methods, and argues that it can increase the data's validity and its acceptance among skeptics.
Contribution
The paper proposes online crowdsourcing as an approach to data augmentation for big data in the social sciences, presents empirical cases that illustrate its effectiveness, and offers guidelines for best practices.
Findings
Crowdsourcing can verify automated coding tasks effectively.
Crowdsourcing can link online data to structured databases.
Crowdsourcing gathers valuable contextual information for social science data.
Abstract
The importance of big data is a contested topic among social scientists. Proponents claim it will fuel a research revolution, but skeptics challenge it as unreliably measured and decontextualized, with limited utility for accurately answering social science research questions. We argue that social scientists need effective tools to quantify big data's measurement error and expand the contextual information associated with it. Standard research efforts in many fields already pursue these goals through data augmentation, the systematic assessment of measurement against known quantities and expansion of extant data by adding new information. Traditionally, these tasks are accomplished using trained research assistants or specialized algorithms. However, such approaches may not be scalable to big data or appease its skeptics. We consider a third alternative that may increase the validity…
