Query Complexity Based Optimal Processing of Raw Data

Mayank Patel; Minal Bhise

arXiv:2205.05938 · cs.DB · December 22, 2022

TL;DR

This paper introduces QCA, a query complexity-aware partitioning method that optimizes large dataset processing by reducing workload execution time and replication, demonstrated on SDSS data.

Contribution

It presents a novel lightweight partitioning technique that adapts to query complexity, improving efficiency over existing workload-aware methods.

Findings

01

Workload execution time (WET) reduced by 94.6% with only 6.7% of the dataset loaded

02

Multi-node replication reduced by 5.8x compared to state-of-the-art workload-aware (WA) techniques

03

Workload execution time improved by up to 42.66%

Abstract

The paper aims to find an efficient way to process large datasets that serve different types of workload queries while keeping replication minimal. The work first identifies the query complexity best suited to a given data processing tool. The paper proposes QCA, a Query Complexity Aware partitioning technique with a lightweight query identification and partitioning algorithm. Different replication approaches are studied to cover more use cases across application workloads. The technique is demonstrated on a scientific dataset, the Sloan Digital Sky Survey (SDSS). The results show that workload execution time (WET) is reduced by 94.6% while only 6.7% of the original dataset is held in loaded format. The QCA technique also reduces multi-node replication by 5.8x compared to state-of-the-art workload-aware (WA) techniques. The multi-node and multi-core execution of…
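The abstract describes QCA only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a hypothetical complexity score (joins weighted double, plus aggregates), a hypothetical threshold that sends simple queries to in-situ processing over raw files and complex queries to loaded partitions, and a swapped-in Jaccard-overlap rule for merging the attribute sets of complex queries. The names `Query`, `complexity`, `split_by_complexity`, `build_partitions`, and `replicated_attributes`, and the SDSS-style column names in the workload, are all illustrative. The sketch shows why merging overlapping partitions lowers replication (attributes stored in more than one partition).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query descriptor: attributes read plus join/aggregate counts."""
    name: str
    attributes: frozenset
    joins: int = 0
    aggregates: int = 0


def complexity(q: Query) -> int:
    # Hypothetical score, NOT the paper's definition: joins weigh most.
    return 2 * q.joins + q.aggregates


def split_by_complexity(queries, threshold):
    # Assumption: simple queries run in situ on raw files,
    # complex queries run on loaded (partitioned) data.
    simple = [q for q in queries if complexity(q) < threshold]
    complex_ = [q for q in queries if complexity(q) >= threshold]
    return simple, complex_


def jaccard(a, b) -> float:
    # Overlap of two attribute sets; two empty sets count as identical.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def build_partitions(complex_queries, min_overlap=0.5):
    # Greedy merge: a query joins the first partition it overlaps enough with,
    # otherwise it gets its own vertical partition (a set of attributes).
    partitions = []
    for q in complex_queries:
        for p in partitions:
            if jaccard(p, q.attributes) >= min_overlap:
                p |= q.attributes  # in-place union on the stored set
                break
        else:
            partitions.append(set(q.attributes))
    return partitions


def replicated_attributes(partitions) -> int:
    # Attribute copies beyond the first: total stored minus distinct attributes.
    total = sum(len(p) for p in partitions)
    distinct = len(set().union(*partitions))
    return total - distinct


# Hypothetical workload using SDSS-style column names.
workload = [
    Query("q1", frozenset({"ra", "dec"})),
    Query("q2", frozenset({"ra", "dec", "u", "g"}), joins=1, aggregates=1),
    Query("q3", frozenset({"ra", "dec", "g", "r"}), joins=1),
    Query("q4", frozenset({"objid", "z"}), joins=1, aggregates=2),
]
simple, complex_ = split_by_complexity(workload, threshold=2)
parts = build_partitions(complex_)
print([q.name for q in simple])   # ['q1']
print(replicated_attributes(parts))  # 0
# min_overlap above 1.0 disables merging: one partition per complex query.
print(replicated_attributes(build_partitions(complex_, min_overlap=1.01)))  # 3
```

Here `q1` never triggers loading, and merging `q2` with `q3` (overlap 3/5) avoids storing `ra`, `dec`, and `g` twice. A real system would derive the score and threshold from measured tool performance, which this sketch does not attempt.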

Taxonomy

Topics: Cloud Computing and Resource Management · Interconnection Networks and Systems · Quantum Computing Algorithms and Architecture