Large-Scale Learning from Data Streams with Apache SAMOA

Nicolas Kourtellis, Gianmarco De Francisci Morales, and Albert Bifet

arXiv:1805.11477 · cs.DC · May 30, 2018

TL;DR

Apache SAMOA is an open-source platform for scalable, distributed mining of big data streams. It provides distributed algorithms for classification, clustering, and regression, and runs on several stream processing engines, including Apache Flink, Apache Storm, and Apache Samza.

Contribution

The paper introduces a flexible, pluggable architecture for distributed streaming algorithms that lets the same algorithm run on different stream processing engines, enabling large-scale mining of data streams.
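To make the idea of a pluggable architecture concrete, here is a minimal Java sketch, assuming a design in which algorithm logic is written once against engine-independent abstractions and each stream processing engine supplies its own adapter. All names (`EventProcessor`, `Topology`, `EngineAdapter`, `LocalEngine`) are hypothetical and are not Apache SAMOA's actual API; the single-threaded `LocalEngine` only stands in for an adapter to a real engine such as Flink, Storm, or Samza.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, engine-agnostic abstractions illustrating a pluggable design.
// These names are illustrative only; they are NOT Apache SAMOA's API.

/** A unit of user logic that consumes one event and may emit zero or more events. */
interface EventProcessor {
    void process(String event, Consumer<String> emit);
}

/** An engine-independent description of a linear pipeline of processors. */
final class Topology {
    final List<EventProcessor> stages = new ArrayList<>();

    Topology then(EventProcessor p) {
        stages.add(p);
        return this;
    }
}

/** The pluggable part: each stream processing engine would supply its own adapter. */
interface EngineAdapter {
    List<String> run(Topology topology, List<String> input);
}

/** A single-threaded adapter standing in for a real distributed engine. */
final class LocalEngine implements EngineAdapter {
    @Override
    public List<String> run(Topology topology, List<String> input) {
        List<String> current = input;
        for (EventProcessor stage : topology.stages) {
            List<String> next = new ArrayList<>();
            for (String event : current) {
                stage.process(event, next::add); // collect whatever the stage emits
            }
            current = next;
        }
        return current;
    }
}

class PluggableDemo {
    public static void main(String[] args) {
        Topology t = new Topology()
            .then((e, emit) -> { for (String w : e.split(" ")) emit.accept(w); })   // tokenize
            .then((e, emit) -> { if (e.length() > 3) emit.accept(e.toUpperCase()); }); // filter, then map
        EngineAdapter engine = new LocalEngine(); // another adapter could be swapped in here
        System.out.println(engine.run(t, List.of("big data streams", "mining at scale")));
        // prints [DATA, STREAMS, MINING, SCALE]
    }
}
```

The point of the sketch is that the topology `t` never references an engine, so changing the execution backend only means passing `t` to a different `EngineAdapter` implementation.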

Findings

1. Supports classification, clustering, and regression tasks.
2. Compatible with Apache Flink, Apache Storm, and Apache Samza.
3. Open-source and extensible platform.
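As an illustration of what a streaming classifier looks like, the sketch below processes one instance at a time in constant memory and is evaluated prequentially: each instance is first used for testing and then for training, a common evaluation protocol for data streams. The online perceptron is a stand-in chosen for brevity and is not one of Apache SAMOA's algorithms; the class names and the toy two-feature stream are hypothetical.

```java
// Illustrative online (streaming) classifier with prequential, test-then-train evaluation.
// The perceptron is a stand-in chosen for brevity; it is not an Apache SAMOA algorithm.
final class OnlinePerceptron {
    private final double[] w;   // one weight per feature; memory does not grow with the stream
    private double bias;
    private final double lr;    // learning rate

    OnlinePerceptron(int numFeatures, double learningRate) {
        this.w = new double[numFeatures];
        this.lr = learningRate;
    }

    /** Predicts +1 or -1 for a feature vector. */
    int predict(double[] x) {
        double s = bias;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s >= 0 ? 1 : -1;
    }

    /** Updates the model from a single labeled instance; the instance is not stored. */
    void train(double[] x, int label) {
        if (predict(x) != label) {
            for (int i = 0; i < w.length; i++) w[i] += lr * label * x[i];
            bias += lr * label;
        }
    }
}

/** Prequential evaluation: each instance is used for testing first, then for training. */
final class Prequential {
    static double accuracy(OnlinePerceptron model, double[][] xs, int[] ys) {
        int correct = 0;
        for (int i = 0; i < xs.length; i++) {
            if (model.predict(xs[i]) == ys[i]) correct++; // test first
            model.train(xs[i], ys[i]);                    // then train
        }
        return (double) correct / xs.length;
    }
}

class PrequentialDemo {
    public static void main(String[] args) {
        // Hypothetical toy stream alternating between two linearly separable points.
        double[][] xs = new double[10][];
        int[] ys = new int[10];
        for (int i = 0; i < 10; i++) {
            xs[i] = (i % 2 == 0) ? new double[] {1, 0} : new double[] {0, 1};
            ys[i] = (i % 2 == 0) ? 1 : -1;
        }
        OnlinePerceptron model = new OnlinePerceptron(2, 1.0);
        System.out.println(Prequential.accuracy(model, xs, ys)); // prints 0.8
    }
}
```

The model makes two early mistakes while it adapts and is correct afterwards, which is why prequential accuracy on this toy stream is 8 out of 10.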

Abstract

Apache SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, due to their time and memory complexity. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Apache Flink, Apache Storm, and Apache Samza. Apache SAMOA is written in Java and is available at https://samoa.incubator.apache.org under the Apache Software License version 2.0.
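Running a streaming algorithm on a distributed engine usually involves deciding how events are routed to the parallel instances of a processor. The following Java sketch contrasts two routing rules that stream processing engines commonly offer: key grouping, which hashes a key so that all events with the same key (for example, the same attribute) reach the same instance, and shuffle grouping, which spreads load regardless of key. `Router`, `byKey`, and `shuffle` are hypothetical names, not part of Apache SAMOA or of any engine's API, and round-robin is used here as a simple stand-in for shuffling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical routing rules for sending events to parallel processor instances. */
final class Router {
    private final int parallelism;
    private int next = 0; // round-robin cursor used for shuffle grouping

    Router(int parallelism) {
        this.parallelism = parallelism;
    }

    /** Key grouping: the same key always maps to the same instance, so per-key state stays local. */
    int byKey(String key) {
        return Math.floorMod(key.hashCode(), parallelism); // floorMod avoids negative indices
    }

    /** Shuffle grouping (here, round-robin): spreads events evenly regardless of key. */
    int shuffle() {
        int target = next;
        next = (next + 1) % parallelism;
        return target;
    }
}

class RoutingDemo {
    public static void main(String[] args) {
        Router router = new Router(3);
        Map<Integer, List<String>> byInstance = new TreeMap<>();
        for (String attr : List.of("age", "income", "age", "zip", "income")) {
            byInstance.computeIfAbsent(router.byKey(attr), k -> new ArrayList<>()).add(attr);
        }
        // Every occurrence of the same attribute lands in the same instance's list.
        System.out.println(byInstance);
    }
}
```

Key grouping is the natural choice when an instance must accumulate state for a key, while shuffle grouping suits stateless work where only load balance matters.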

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Air Quality Monitoring and Forecasting