A Fast, Scalable, Universal Approach For Distributed Data Aggregations

Niranda Perera; Vibhatha Abeykoon; Chathura Widanage; Supun Kamburugamuve; Thejaka Amila Kanewala; Pulasthi Wickramasinghe; Ahmet Uyar; Hasara Maithree; Damitha Lenadora; and Geoffrey Fox

arXiv:2010.14596 · cs.DC · December 16, 2020

TL;DR

This paper introduces Cylon, a fast and scalable distributed data aggregation framework designed to seamlessly integrate with existing data analytics tools, enhancing efficiency in handling large-scale data processing tasks.

Contribution

The paper presents Cylon, a universal, high-performance aggregation system built on distributed in-memory tables, addressing the need for integrated data analytics solutions.

Findings

01. Cylon achieves high scalability on large datasets.

02. It integrates seamlessly with existing frameworks.

03. It improves data aggregation efficiency.

Abstract

In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral functionality in these applications. They are traditionally aimed at generating meaningful information from large datasets, and today they are also used to engineer more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables/arrays and are combined with other operations such as grouping of values. There are frameworks that excel in each of these domains individually. But we believe that there is an essential requirement for a data analytics tool that…
