Statistique et Big Data Analytics; Volumétrie, L'Attaque des Clones
Philippe Besse (IMT), Nathalie Villa-Vialaneix (MIAT INRA)

TL;DR
This paper explores the skills statisticians need to handle big data, focusing on how traditional learning algorithms are adapted to the Map-Reduce framework in Hadoop environments.
Contribution
It provides an overview of strategies and algorithm adaptations necessary for statisticians to effectively analyze big data using Hadoop and Map-Reduce.
Findings
Learning algorithms are adapted to Map-Reduce to cope with the constraints that big data volumes impose
Overview of strategies for statisticians in big data environments
Discussion of algorithm performance in Hadoop context
Abstract
This article assumes that the reader already has the skills and expertise of a statistician in unsupervised learning (NMF, k-means, SVD) and supervised learning (regression, CART, random forest). What additional skills and knowledge must a statistician acquire to reach the "Volume" scale of big data? After a quick overview of the available strategies, especially those imposed by Hadoop, the algorithms of some available learning methods are outlined in order to understand how they are adapted to the strong constraints of the Map-Reduce framework.
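To give a rough idea of what adapting a learning method to Map-Reduce can look like, here is a minimal sketch of one k-means iteration written in a map/shuffle/reduce style. It is an illustration in plain Python, not the paper's own algorithm and not Hadoop code: the function names (`nearest`, `map_phase`, `combine`, `kmeans_step`) and the toy data are assumptions introduced for this example. The key constraint it shows is that each data chunk is processed independently, and only small (sum, count) pairs are merged afterwards.

```python
from collections import defaultdict
from functools import reduce

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))

def map_phase(chunk, centroids):
    """Map: for each point of one chunk, emit (cluster index, (point, 1))."""
    for point in chunk:
        yield nearest(point, centroids), (tuple(point), 1)

def combine(a, b):
    """Reduce: merge two partial (coordinate sums, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b

def kmeans_step(chunks, centroids):
    """One k-means iteration: map over chunks, group by key, reduce per cluster."""
    grouped = defaultdict(list)
    for chunk in chunks:  # hypothetical: each chunk could live on a separate node
        for key, value in map_phase(chunk, centroids):
            grouped[key].append(value)  # stands in for the shuffle step
    new_centroids = list(centroids)  # clusters with no points keep their centroid
    for k, values in grouped.items():
        total, n = reduce(combine, values)
        new_centroids[k] = tuple(s / n for s in total)
    return new_centroids

# Toy data split into two chunks (assumed for illustration only).
chunks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
print(kmeans_step(chunks, centroids))
```

Because `combine` is associative and commutative, the partial sums can be merged in any order, which is what makes this step fit the Map-Reduce pattern. A full k-means run would repeat `kmeans_step` until the centroids stop moving, with each iteration being a separate Map-Reduce job.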
Taxonomy
Topics: Big Data and Business Intelligence · Data Mining Algorithms and Applications
