A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Ernest Fokoue

TL;DR
This paper proposes a taxonomy of big data based on input dimensionality and sample size, guiding the selection of machine learning tools for efficient analysis across various data categories.
Contribution
It introduces a taxonomy of big data types and discusses tailored machine learning techniques, emphasizing the importance of data characteristics in method selection.
Findings
Different data categories require specific preprocessing tools
No single method outperforms others across all big data types
Simplicity often yields better results in massive data analysis
Abstract
Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss…
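As a rough illustration of how a data set might be placed in such a taxonomy from its sample size n and input dimensionality p, here is a minimal Python sketch. The function name `bigness_category`, the single cutoff `large=1000`, and the two labels beyond "large p, small n" and "large n, small p" are assumptions made for illustration only; they are not the categories or thresholds defined in the paper.

```python
def bigness_category(n, p, large=1000):
    """Label a data set by sample size n and input dimensionality p.

    The cutoff `large` is a hypothetical threshold chosen for this
    sketch; it is not taken from the paper.
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive integers")

    big_n = n >= large  # many observations
    big_p = p >= large  # many input variables

    if big_p and not big_n:
        return "large p, small n"
    if big_n and not big_p:
        return "large n, small p"
    if big_n and big_p:
        return "large n, large p"  # assumed extra label
    return "small n, small p"      # assumed extra label


# Example: a hypothetical study with 50 samples and 20,000 input variables
print(bigness_category(n=50, p=20000))
```

A label of this kind could then be used to narrow down candidate tools, in the spirit of the abstract's remark that the choice of technique depends on the category a data set falls into.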
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Neural Networks and Applications
