HPTMT Parallel Operators for High Performance Data Science & Data Engineering
Vibhatha Abeykoon, Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, and Geoffrey Fox

TL;DR
This paper introduces the HPTMT architecture, a unified framework of data structures and operators designed to enhance the efficiency and interoperability of data science and engineering applications.
Contribution
It proposes a comprehensive architecture that unifies data structures and operators across data engineering and data science, enabling more efficient and integrated applications.
Findings
Demonstrates the architecture with an end-to-end application that combines deep learning and data engineering
Shows improved efficiency in data processing and integration
Provides a clear framework linking data engineering and science
Abstract
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often, the lack of a clear definition of data structures and operators in the field has led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that links all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together.
Taxonomy
Topics: Distributed and Parallel Computing Systems
