A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

Ernest Fokoue

arXiv:1501.00604 · stat.ML · January 6, 2015 · 5 cites

Open Access

TL;DR

This paper proposes a taxonomy of big data based on input dimensionality and sample size, guiding the selection of machine learning tools for efficient analysis across various data categories.

Contribution

It introduces a taxonomy of big data types and discusses tailored machine learning techniques, emphasizing the importance of data characteristics in method selection.

Findings

01. Different data categories require specific preprocessing tools

02. No single method outperforms others across all big data types

03. Simplicity often yields better results in massive data analysis

Abstract

Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p small n data sets, for instance, require a different set of tools from the large n small p variety. Among other tools, we discuss…
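
As a rough illustration of how the sample size n and the input dimensionality p could place a data set in such a taxonomy, here is a minimal Python sketch. The cutoffs `LARGE_N` and `LARGE_P`, the `Bigness` record, and the `categorize` function are hypothetical and chosen only for illustration; the abstract does not state thresholds, and this is not the paper's own rule.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; the abstract gives no thresholds.
LARGE_N = 1000
LARGE_P = 50


@dataclass(frozen=True)
class Bigness:
    """Sample size n, input dimensionality p, and a coarse category label."""
    n: int
    p: int
    label: str


def categorize(n: int, p: int,
               large_n: int = LARGE_N, large_p: int = LARGE_P) -> Bigness:
    """Label a data set by comparing n and p against the assumed cutoffs."""
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    n_size = "large n" if n >= large_n else "small n"
    p_size = "large p" if p >= large_p else "small p"
    return Bigness(n=n, p=p, label=f"{p_size}, {n_size}")


if __name__ == "__main__":
    # A wide data set (many variables, few observations) versus a tall one.
    print(categorize(n=100, p=20000).label)
    print(categorize(n=10**6, p=10).label)
```

Under these assumed cutoffs, a data set with 100 observations and 20000 variables is labeled "large p, small n", while one with a million observations and 10 variables is labeled "small p, large n". The label could then be used to look up a family of candidate methods, which is the kind of selection the abstract describes.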

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Neural Networks and Applications