HPTMT: Operator-Based Architecture for Scalable High-Performance Data-Intensive Frameworks
Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, and Geoffrey Fox

TL;DR
HPTMT introduces an operator-based architecture for scalable, high-performance data-intensive frameworks, integrating ideas from multiple systems and supporting various data abstractions and programming languages.
Contribution
The paper presents HPTMT, a novel operator-based architecture that unifies data abstractions and supports multiple languages for scalable high-performance data processing.
Findings
Demonstrates HPTMT's effectiveness using the Cylon and Twister2 environments.
Shows improved performance and usability in data-intensive applications.
Highlights the importance of language-agnostic interoperability.
Abstract
Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors, graphs, and tables. Our key concepts are inspired by systems such as MPI, HPF (High-Performance Fortran), NumPy, Pandas, Spark, Modin, PyTorch, TensorFlow, RAPIDS (NVIDIA), and OneAPI (Intel). Further, it is crucial to support the languages in everyday use in the Big Data arena, including Python, R, C++, and Java. We note the importance of Apache Arrow and Parquet for enabling language-agnostic high performance and interoperability. In this paper, we propose High-Performance Tensors, Matrices and Tables (HPTMT), an operator-based…
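To make the idea of "operators on a table abstraction" concrete, here is a minimal, hypothetical sketch in plain Python. It is not HPTMT's or Cylon's API. It assumes a columnar table modeled as a dict of column name to list (loosely echoing an Arrow-style columnar layout) and shows one common way a distributed operator, here an inner join, can be composed from a local operator plus a hash-partitioned all-to-all exchange (a "shuffle"). The exchange is simulated across several "workers" inside a single process. All names (`Table`, `local_join`, `hash_partition`, `concat`, `distributed_join`) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical columnar table: column name -> list of values.
Table = Dict[str, List[Any]]


def num_rows(t: Table) -> int:
    """Row count of a table (all columns are assumed to have equal length)."""
    return len(next(iter(t.values()))) if t else 0


def local_join(left: Table, right: Table, key: str) -> Table:
    """Single-worker inner hash join on `key`.

    Assumes non-key column names of `left` and `right` do not collide.
    """
    # Build phase: index right-table row positions by key value.
    index = defaultdict(list)
    for j, k in enumerate(right[key]):
        index[k].append(j)
    right_cols = [c for c in right if c != key]
    out: Table = {c: [] for c in list(left) + right_cols}
    # Probe phase: emit one output row per matching (left, right) pair.
    for i, k in enumerate(left[key]):
        for j in index.get(k, []):
            for c in left:
                out[c].append(left[c][i])
            for c in right_cols:
                out[c].append(right[c][j])
    return out


def hash_partition(t: Table, key: str, nparts: int) -> List[Table]:
    """Split a table into `nparts` tables by hashing the key column."""
    parts: List[Table] = [{c: [] for c in t} for _ in range(nparts)]
    for i, k in enumerate(t[key]):
        p = hash(k) % nparts
        for c in t:
            parts[p][c].append(t[c][i])
    return parts


def concat(tables: List[Table]) -> Table:
    """Row-wise concatenation of tables that share the same columns."""
    out: Table = {c: [] for c in tables[0]}
    for t in tables:
        for c in t:
            out[c].extend(t[c])
    return out


def distributed_join(lefts: List[Table], rights: List[Table], key: str) -> List[Table]:
    """Simulated distributed join over len(lefts) workers in one process.

    Each worker hash-partitions its local data; the simulated all-to-all
    exchange delivers partition p from every worker to worker p, so equal
    keys meet on the same worker, which then runs local_join.
    """
    w = len(lefts)
    l_parts = [hash_partition(t, key, w) for t in lefts]
    r_parts = [hash_partition(t, key, w) for t in rights]
    results = []
    for p in range(w):
        l_recv = concat([l_parts[src][p] for src in range(w)])
        r_recv = concat([r_parts[src][p] for src in range(w)])
        results.append(local_join(l_recv, r_recv, key))
    return results


if __name__ == "__main__":
    # Two simulated workers, each holding a slice of both input tables.
    lefts = [{"id": [1, 2], "x": ["a", "b"]}, {"id": [3, 4], "x": ["c", "d"]}]
    rights = [{"id": [4, 1], "y": [40, 10]}, {"id": [2, 5], "y": [20, 50]}]
    out = concat(distributed_join(lefts, rights, "id"))
    print(sorted(zip(out["id"], out["x"], out["y"])))
    # → [(1, 'a', 10), (2, 'b', 20), (4, 'd', 40)]
```

The design choice being illustrated is separation of concerns: the local operator knows nothing about distribution, and the communication step only guarantees that rows with equal keys end up on the same worker. In a real framework the simulated exchange would be replaced by actual inter-process communication (for example an MPI-style all-to-all), which this sketch does not attempt.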
