High Performance Data Engineering Everywhere
Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Supun Kamburugamuve, Thejaka Amila Kanewala, Hasara Maithree, Pulasthi Wickramasinghe, Ahmet Uyar, Gurhan Gunduz, and Geoffrey Fox

TL;DR
Cylon is a high-performance, distributed data processing library that integrates seamlessly with existing Big Data and AI frameworks, improving efficiency and scalability across multiple languages and platforms.
Contribution
The paper introduces Cylon, a flexible, high-performance distributed data processing library with a C++ core, supporting multiple languages and enhancing existing data analytics tools.
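To make "distributed data processing" more concrete, here is a minimal single-process sketch of one common pattern for a distributed relational operator: a hash-partitioned (shuffle) inner join. It is a generic illustration of the technique, not Cylon's API or its C++ implementation. The function names `shuffle`, `local_hash_join`, and `distributed_join` are hypothetical. The "workers" are simulated as Python lists rather than processes exchanging data over a network.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical row format for this sketch: (join key, payload).
Row = Tuple[Hashable, object]


def shuffle(partitions: List[List[Row]], num_workers: int) -> List[List[Row]]:
    """Simulated all-to-all exchange: route each row to the worker owning hash(key)."""
    out: List[List[Row]] = [[] for _ in range(num_workers)]
    for local_rows in partitions:  # rows held by one sending worker
        for key, payload in local_rows:
            out[hash(key) % num_workers].append((key, payload))
    return out


def local_hash_join(left: List[Row], right: List[Row]) -> List[Tuple[Hashable, object, object]]:
    """Inner join on a single worker: build a hash table on left, probe it with right."""
    table: Dict[Hashable, List[object]] = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    result = []
    for key, r_payload in right:
        for l_payload in table.get(key, []):
            result.append((key, l_payload, r_payload))
    return result


def distributed_join(left_parts: List[List[Row]], right_parts: List[List[Row]], num_workers: int):
    """Hypothetical driver: shuffle both tables, then join each worker's partitions locally."""
    left_shuffled = shuffle(left_parts, num_workers)
    right_shuffled = shuffle(right_parts, num_workers)
    # After the shuffle, rows with equal keys sit on the same worker,
    # so each local join runs without further communication.
    return [local_hash_join(l, r) for l, r in zip(left_shuffled, right_shuffled)]


# Usage: two simulated workers, each starting with a slice of both tables.
left = [[(1, "a"), (2, "b")], [(3, "c")]]
right = [[(3, "x")], [(1, "y"), (4, "z")]]
per_worker = distributed_join(left, right, num_workers=2)
print(sorted(row for part in per_worker for row in part))  # [(1, 'a', 'y'), (3, 'c', 'x')]
```

The design choice shown here is that all communication happens in the shuffle step. Once matching keys are co-located, every worker can finish its share of the join independently.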
Findings
Cylon significantly improves the performance of key operations in Spark and Dask.
Cylon enables seamless integration with AI tools like PyTorch and TensorFlow.
Cylon operates efficiently across multiple platforms with minimal overhead.
Abstract
The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond what a single node can provide. However, this is just a small part of the issues facing the overall data processing environment, which must also support a raft of data engineering for pre- and post-data processing, communication, and system integration. An important requirement of data analytics tools is the ability to integrate easily with existing frameworks in a multitude of languages, thereby increasing user productivity and efficiency. All this demands an efficient, highly distributed, and integrated approach to data processing, yet many of today's popular data analytics tools are unable to satisfy all these requirements at the same time. In this paper we present Cylon, an…
