Statistical Validity and Consistency of Big Data Analytics: A General Framework

Bikram Karmakar; Indranil Mukhopadhyay

arXiv:1803.10901 · cs.DB · March 30, 2018 · 1 citation

TL;DR

This paper introduces a comprehensive statistical framework and algorithmic principles to ensure the validity and consistency of Big Data analytics, addressing unique challenges posed by large-scale, complex data.

Contribution

It proposes a general statistical framework and a partition-repetition approach tailored to Big Data, aimed at ensuring the statistical accuracy and consistency of data analysis.
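The partition-repetition idea can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes the simplest reading, in which each repetition randomly partitions the observations into blocks of (near-)equal size, applies a statistic to every block, and averages the block results, and the final estimate then averages over all repetitions. The function name `partition_repetition_estimate` and its parameters (`n_blocks`, `n_repeats`, `seed`) are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence


def partition_repetition_estimate(
    data: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
    n_blocks: int,
    n_repeats: int,
    seed: int = 0,
) -> float:
    """Hypothetical partition-and-repeat estimator (illustrative sketch only).

    Each repetition shuffles the observation indices, splits them into
    n_blocks blocks, applies `estimator` to each block, and averages the
    block results. The returned value averages over all repetitions.
    """
    if n_blocks < 1 or n_repeats < 1:
        raise ValueError("n_blocks and n_repeats must be positive")
    if n_blocks > len(data):
        raise ValueError("cannot have more blocks than observations")

    rng = random.Random(seed)  # seeded so results are reproducible
    indices = list(range(len(data)))
    repeat_estimates = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random partition for every repetition
        # Block j takes every n_blocks-th shuffled index, so block sizes
        # differ by at most one observation.
        blocks = [[data[i] for i in indices[j::n_blocks]] for j in range(n_blocks)]
        block_estimates = [estimator(block) for block in blocks]
        repeat_estimates.append(statistics.fmean(block_estimates))
    return statistics.fmean(repeat_estimates)


if __name__ == "__main__":
    sample = [float(x) for x in range(1, 101)]
    print(partition_repetition_estimate(sample, statistics.median, n_blocks=5, n_repeats=20))
```

For a linear statistic such as the mean with equal-sized blocks, every partition yields the same answer, so in this sketch repetition matters mainly for nonlinear statistics such as the median, where averaging over several random partitions reduces dependence on any single split.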

Findings

01. Framework ensures statistical validity of Big Data conclusions

02. Partition-repetition approach applicable to diverse data problems

03. Potential to advance Big Data analytics methodology

Abstract

Informatics and technological advancements have triggered the generation of huge volumes of data with varied complexity in their management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making inferences from it. Although storage, retrieval and management of Big Data seem possible through efficient algorithm and system development, concerns about statistical consistency remain to be addressed in view of its specific characteristics. Since Big Data does not conform to standard analytics, we need proper modification of the existing statistical theory and tools. Here we propose, with illustrations, a general statistical framework and an algorithmic principle for Big Data analytics that ensure the statistical accuracy of the conclusions. The proposed framework has the potential to push forward advancement of Big Data analytics in the right direction. The…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Data Mining Algorithms and Applications