Statistical Validity and Consistency of Big Data Analytics: A General Framework
Bikram Karmakar, Indranil Mukhopadhyay

TL;DR
This paper introduces a comprehensive statistical framework and algorithmic principles to ensure the validity and consistency of Big Data analytics, addressing unique challenges posed by large-scale, complex data.
Contribution
The paper proposes a general statistical framework and a partition-repetition approach tailored to Big Data, aimed at improving the statistical accuracy and consistency of data analysis.
Findings
The framework is designed to ensure the statistical validity of conclusions drawn from Big Data.
The partition-repetition approach applies to a diverse range of data problems.
The framework has the potential to advance Big Data analytics methodology.
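
This summary names a partition-repetition approach without describing its steps. The sketch below is one possible reading, offered as an illustration under stated assumptions and not as the authors' algorithm: the data are randomly split into equal-sized blocks, an estimator is computed on each block, the block estimates are averaged, and the whole procedure is repeated over several random splits. The function name `partition_repetition_estimate`, its parameters, and the choice of averaging as the combining rule are all hypothetical.

```python
import random
import statistics


def partition_repetition_estimate(data, estimator, n_partitions, n_repetitions, seed=0):
    """Illustrative partition-and-repeat estimate (hypothetical helper, not from the paper).

    Each repetition shuffles the data, splits it into n_partitions blocks,
    applies `estimator` to every block, and averages the block estimates.
    The final result is the average over all repetitions.
    """
    if n_partitions < 1 or n_repetitions < 1:
        raise ValueError("n_partitions and n_repetitions must be positive")
    if n_partitions > len(data):
        raise ValueError("cannot form more partitions than data points")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    repetition_estimates = []
    for _ in range(n_repetitions):
        shuffled = list(data)
        rng.shuffle(shuffled)
        # Strided slicing yields n_partitions blocks of near-equal size.
        blocks = [shuffled[i::n_partitions] for i in range(n_partitions)]
        block_estimates = [estimator(block) for block in blocks]
        # Assumption: block estimates are combined by a simple average.
        repetition_estimates.append(statistics.fmean(block_estimates))
    return statistics.fmean(repetition_estimates)


# Usage: estimate the mean of 0..999 from 10 blocks, repeated 5 times.
data = list(range(1000))
mean_estimate = partition_repetition_estimate(data, statistics.fmean, 10, 5)
median_estimate = partition_repetition_estimate(data, statistics.median, 10, 5)
```

With equal-sized blocks and the sample mean as estimator, the averaged block means coincide with the full-sample mean, so the repetitions only matter for estimators such as the median, where different random splits give different block estimates.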
Abstract
Informatics and technological advancements have triggered the generation of huge volumes of data with varied complexity in its management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making inferences from it. Although storage, retrieval and management of Big Data seem possible through efficient algorithm and system development, concern about statistical consistency remains to be addressed in view of its specific characteristics. Since Big Data does not conform to standard analytics, we need proper modification of the existing statistical theory and tools. Here we propose, with illustrations, a general statistical framework and an algorithmic principle for Big Data analytics that ensure statistical accuracy of the conclusions. The proposed framework has the potential to push forward advancement of Big Data analytics in the right direction. The…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Data Mining Algorithms and Applications
