A Fast, Scalable, Universal Approach For Distributed Data Aggregations
Niranda Perera, Vibhatha Abeykoon, Chathura Widanage, Supun Kamburugamuve, Thejaka Amila Kanewala, Pulasthi Wickramasinghe, Ahmet Uyar, Hasara Maithree, Damitha Lenadora, and Geoffrey Fox

TL;DR
This paper introduces Cylon, a fast and scalable distributed data aggregation framework designed to seamlessly integrate with existing data analytics tools, enhancing efficiency in handling large-scale data processing tasks.
Contribution
The paper presents Cylon, a universal, high-performance aggregation system built on distributed in-memory tables, addressing the need for integrated data analytics solutions.
Findings
Cylon achieves high scalability on large datasets.
It integrates seamlessly with existing frameworks.
It improves data aggregation efficiency.
Abstract
In the current era of Big Data, data engineering has become an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral part of these applications. They have traditionally been aimed at generating meaningful information from large datasets, and today they are also used to engineer more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables/arrays and are combined with other operations such as grouping of values. There are frameworks that excel in these domains individually. But we believe that there is an essential requirement for a data analytics tool that…
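To make the idea of a grouped aggregation over distributed data concrete, here is a minimal sketch in plain Python. It is not Cylon's API and does not reflect how the paper implements aggregations. It assumes that partitions are simulated as in-memory lists of (key, value) pairs, and the function names (local_group_sum, merge_partials, distributed_group_sum) are hypothetical. It shows the common two-phase pattern: each partition computes a partial per-key sum, and the partials are then combined with a reduce.

```python
from collections import defaultdict
from functools import reduce


def local_group_sum(rows):
    """Partial aggregation on one partition: sum the values for each key."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def merge_partials(left, right):
    """Combine two partial results; this is valid because addition is associative."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def distributed_group_sum(partitions):
    """Aggregate each (simulated) partition locally, then reduce the partials."""
    partials = [local_group_sum(p) for p in partitions]
    return reduce(merge_partials, partials, {})


# Two simulated partitions of a table with columns (key, value).
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]
result = distributed_group_sum(partitions)
print(sorted(result.items()))  # [('a', 4), ('b', 6), ('c', 5)]
```

The same structure works for any associative, commutative aggregation (count, min, max). An operation such as mean would need to carry a (sum, count) pair through the merge step rather than a single number.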
