Statistique et Big Data Analytics; Volumétrie, L'Attaque des Clones

Philippe Besse (IMT); Nathalie Villa-Vialaneix (MIAT INRA)

arXiv:1405.6676·stat.OT·October 7, 2014

Open Access

TL;DR

This paper explores the skills statisticians need to handle big data, focusing on how traditional learning algorithms are adapted to the Map-Reduce framework in Hadoop environments.

Contribution

It provides an overview of strategies and algorithm adaptations necessary for statisticians to effectively analyze big data using Hadoop and Map-Reduce.

Findings

01 Learning algorithms are adapted to the strong constraints of Map-Reduce in order to handle big data volumes

02 Overview of the strategies available to statisticians in big data environments

03 Discussion of algorithm performance in the Hadoop context

Abstract

This article assumes that the reader already has a statistician's skills and expertise in unsupervised (NMF, k-means, SVD) and supervised learning (regression, CART, random forest). What skills and knowledge must a statistician acquire to reach the "Volume" scale of big data? After a quick overview of the available strategies, especially those imposed by Hadoop, the algorithms of some available learning methods are outlined in order to understand how they are adapted to the strong constraints of the Map-Reduce functionalities.
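To make the kind of adaptation mentioned in the abstract concrete, here is a minimal sketch of one k-means iteration written in a map/reduce style. It is not taken from the paper: the function names (`map_phase`, `reduce_phase`, `kmeans_step`) and the in-memory "chunks" standing in for Hadoop data blocks are assumptions made for illustration only. The point it shows is that each mapper only needs the current centroids and its own chunk, and that the reducer only needs an associative merge of partial sums and counts.

```python
from collections import defaultdict
from functools import reduce


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_phase(chunk, centroids):
    """Map: emit (cluster index, (point, 1)) for every point of one chunk."""
    return [(nearest(x, centroids), (x, 1)) for x in chunk]


def combine(a, b):
    """Associative merge of two partial (coordinate sums, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(u + v for u, v in zip(sum_a, sum_b)), n_a + n_b


def reduce_phase(pairs):
    """Reduce: group values by key, merge them, and return the new centroids."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    new_centroids = {}
    for key, values in groups.items():
        total, n = reduce(combine, values)
        new_centroids[key] = tuple(t / n for t in total)
    return new_centroids


def kmeans_step(chunks, centroids):
    """One k-means iteration: map over every chunk, then reduce by cluster."""
    # Hypothetical stand-in for the shuffle: all emitted pairs are gathered here.
    pairs = [kv for chunk in chunks for kv in map_phase(chunk, centroids)]
    updated = reduce_phase(pairs)
    # A cluster that received no point keeps its previous centroid.
    return [updated.get(k, centroids[k]) for k in range(len(centroids))]


# Two chunks play the role of two data blocks processed by separate mappers.
chunks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_step(chunks, centroids)
```

Because `combine` is associative, the same function could in principle also serve as a combiner run locally on each chunk before the shuffle, which is the usual way to limit the data exchanged between nodes. A full k-means run would repeat `kmeans_step` until the centroids stop moving, with one Map-Reduce job per iteration.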

Taxonomy

Topics: Big Data and Business Intelligence · Data Mining Algorithms and Applications