Experimental Design Issues in Big Data. The Question of Bias
Elena Pesce, Eva Riccomagno, Henry P. Wynn

TL;DR
This paper discusses the challenges of bias and confounding in big data collection, especially from passive sources like social media, and reviews solutions such as randomization to address these issues.
Contribution
The paper identifies how bias and hidden confounding arise in passively collected big data and discusses remedies, including randomization, for causal inference.
Findings
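Bias, hidden confounders, and feedback can destroy the identification of causation by corrupting or omitting counterfactuals (controls).
Randomization is one of several solutions discussed for guarding against these problems.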
Passive data collection poses unique challenges for causal studies.
Abstract
Data can be collected in scientific studies via a controlled experiment or passive observation. Big data is often collected in a passive way, e.g. from social media. In studies of causation, great efforts are made to guard against bias and hidden confounders or feedback which can destroy the identification of causation by corrupting or omitting counterfactuals (controls). Various solutions to these problems are discussed, including randomization.
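The contrast between passive observation and a randomized experiment can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating model, the effect size `TRUE_EFFECT`, the self-selection probabilities, and the helper `simulate` are all assumptions chosen for illustration. A hidden variable `u` drives both the outcome and, in the passive case, the chance that a unit ends up treated, so the naive difference in group means overstates the effect. Under coin-flip assignment the same estimator should land close to the assumed true effect.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed causal effect of treatment (illustrative value)


def simulate(n, randomized, seed=0):
    """Return the naive difference in mean outcome, treated minus control.

    Hypothetical model: a hidden confounder u raises the outcome and, when
    assignment is not randomized, also raises the chance of being treated.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # hidden confounder, never observed by the analyst
        if randomized:
            t = rng.random() < 0.5  # coin-flip assignment, independent of u
        else:
            # Passive collection: units with u > 0 self-select into treatment more often.
            t = rng.random() < (0.8 if u > 0 else 0.2)
        y = TRUE_EFFECT * t + 3.0 * u + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return mean(treated) - mean(control)


naive_passive = simulate(20000, randomized=False)
naive_randomized = simulate(20000, randomized=True)

print(f"assumed true effect:       {TRUE_EFFECT:.2f}")
print(f"passive (confounded) est.: {naive_passive:.2f}")
print(f"randomized estimate:       {naive_randomized:.2f}")
```

In this toy model the gap between the two estimates comes entirely from the dependence of treatment on `u`. Randomization removes that dependence by design, which is why the treated and control groups become comparable without `u` ever being measured.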
