Challenges of Big Data Analysis

Jianqing Fan; Fang Han; Han Liu

arXiv:1308.1479 · stat.ML · December 16, 2014

TL;DR

Big Data analysis offers new opportunities for uncovering complex patterns but also presents significant computational and statistical challenges that require innovative paradigms and careful methodological considerations.

Contribution

This paper provides an overview of the features of Big Data, discusses how they affect computational and statistical paradigms, and argues that the exogeneity assumptions underlying most current statistical methods cannot be validated because of incidental endogeneity.

Findings

01. High-dimensional data pose scalability challenges.

02. Incidental endogeneity can lead to invalid statistical inferences.

03. Seeking the sparsest solution within a high-confidence set is a viable approach.

Abstract

Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and of how these features drive paradigm changes in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we…
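Spurious correlation, one of the challenges listed above, can be seen in a small simulation. The sketch below is an illustration under stated assumptions and is not code from the paper. It draws p mutually independent standard normal variables, each observed n times, so every population correlation is exactly zero. It then reports the largest absolute sample correlation between the first variable and the others. The sample size n = 60, the dimensions tried, and the helper names `sample_corr` and `max_spurious_corr` are hypothetical choices made for illustration.

```python
import math
import random


def sample_corr(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_corr(n, p, seed=0):
    """Largest |corr(X_1, X_j)| over j = 2..p for p independent N(0, 1) columns.

    All population correlations are zero, so any large value is purely spurious.
    Columns are generated in a fixed order from one seeded generator, so a run
    with a smaller p uses a prefix of the columns of a run with a larger p.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    target = columns[0]
    return max(abs(sample_corr(target, col)) for col in columns[1:])


if __name__ == "__main__":
    n = 60  # hypothetical sample size, held fixed while the dimension grows
    for p in (10, 100, 1000):
        print(f"p = {p:5d}: max |sample corr| = {max_spurious_corr(n, p):.3f}")
```

With n held fixed, the maximum spurious correlation tends to grow with p. With only 60 observations, a few hundred independent variables are typically enough for some of them to look noticeably correlated with the target even though none truly is.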
