Scalable Relational Query Processing on Big Matrix Data

Yongyang Yu; Mingjie Tang; Walid G. Aref

arXiv:2110.01767 · cs.DB · November 10, 2021 · 1 citation

TL;DR

This paper introduces scalable relational query processing methods for large matrix data in distributed environments, significantly improving performance over existing systems by optimizing query plans and partitioning strategies.

Contribution

The paper develops novel algebraic transformations, a query optimizer, and partitioning schemes that run relational operations efficiently and directly on big matrix data in distributed clusters.
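As a rough illustration of what an algebraic transformation on a matrix query can look like, the sketch below pushes a row selection beneath a matrix multiplication. This particular rule, the helper names (`matmul`, `select_rows`, `naive_plan`, `rewritten_plan`), and the toy data are assumptions chosen for illustration; they are not taken from the paper or from the purduedb/matrel code. The identity itself is standard linear algebra: selecting rows of A·B gives the same result as multiplying the selected rows of A by B.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def select_rows(m, row_ids):
    """Relational-style selection: keep only the rows with the given indices."""
    return [m[i] for i in row_ids]


def naive_plan(a, b, row_ids):
    """Materialize the full product first, then select rows from it."""
    return select_rows(matmul(a, b), row_ids)


def rewritten_plan(a, b, row_ids):
    """Select rows of `a` first, so fewer rows take part in the multiplication."""
    return matmul(select_rows(a, row_ids), b)


# Hypothetical toy inputs: a 3x2 matrix times a 2x3 matrix.
A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2], [0, 1, 3]]
wanted = [0, 2]

# Both plans return the same rows; the rewritten one skips row 1 entirely.
assert naive_plan(A, B, wanted) == rewritten_plan(A, B, wanted)
```

In a distributed setting the rewritten plan would also avoid shuffling the discarded rows, which is the kind of saving a cost-based optimizer looks for; how the paper's optimizer actually chooses plans is not shown here.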

Findings

01. Achieves up to 100x performance improvement over state-of-the-art systems.

02. Demonstrates effectiveness on real and synthetic datasets.

03. Prototypes in Apache Spark validate the approach.

Abstract

The use of large-scale machine learning methods is becoming ubiquitous in many applications ranging from business intelligence to self-driving cars. These methods require a complex computation pipeline consisting of various types of operations, e.g., relational operations for pre-processing or post-processing the dataset, and matrix operations for core model computations. Many existing systems focus on efficiently processing matrix-only operations, and assume that the inputs to the relational operators are already pre-computed and are materialized as intermediate matrices. However, the input to a relational operator may be complex in machine learning pipelines, and may involve various combinations of matrix operators. Hence, it is critical to realize scalable and efficient relational query processors that directly operate on big matrix data. This paper presents new efficient and…

Code & Models

Repositories

purduedb/matrel (official)

Taxonomy

Topics: Graph Theory and Algorithms · Advanced Graph Neural Networks · Data Quality and Management