Query Complexity Based Optimal Processing of Raw Data
Mayank Patel, Minal Bhise

TL;DR
This paper introduces QCA, a query complexity-aware partitioning method that optimizes large dataset processing by reducing workload execution time and replication, demonstrated on SDSS data.
Contribution
It presents a novel lightweight partitioning technique that adapts to query complexity, improving efficiency over existing workload-aware methods.
Findings
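Workload execution time (WET) reduced by 94.6% while loading only 6.7% of the dataset
Multi-node replication decreased by 5.8x compared to state-of-the-art workload-aware (WA) techniques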
Workload execution time improved by up to 42.66%
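The percentage figures above can also be read as speedup factors. A reduction of r% leaves (100 - r)% of the original time, so the speedup is 100 / (100 - r). The helper below is a small illustration and is not from the paper. It assumes that "improved by up to 42.66%" means WET fell by 42.66%.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in execution time into a speedup factor.

    A reduction of r% leaves (100 - r)% of the original time,
    so the speedup is 100 / (100 - r).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


# Headline figures as reported in the summary above.
wet_reduction_pct = 94.6   # WET reduction vs. the original dataset
best_gain_pct = 42.66      # assumed to mean a 42.66% drop in WET

print(f"94.6% less WET  -> {speedup_from_reduction(wet_reduction_pct):.1f}x faster")  # 18.5x
print(f"42.66% less WET -> {speedup_from_reduction(best_gain_pct):.2f}x faster")      # 1.74x
```

A 94.6% reduction therefore corresponds to roughly an 18.5x speedup. The effect grows sharply as the reduction approaches 100%, so the speedup factor can look far larger than the percentage suggests.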
Abstract
The paper aims to find an efficient way of processing large datasets with different types of workload queries while keeping replication minimal. The work first identifies the complexity of queries best suited for the given data processing tool. The paper proposes a Query Complexity Aware partitioning technique (QCA) with a lightweight query identification and partitioning algorithm. Different replication approaches have been studied to cover more use cases for different application workloads. The technique is demonstrated using a scientific dataset, the Sloan Digital Sky Survey (SDSS). The results show that workload execution time (WET) was reduced by 94.6% using only 6.7% of the dataset in loaded format compared to the original dataset. The QCA technique also reduced multi-node replication by 5.8x compared to state-of-the-art workload-aware (WA) techniques. The multi-node and multi-core execution of…
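The abstract describes QCA only at a high level. The sketch below shows one plausible reading of "query identification and partitioning": each query gets a complexity score, and only the attributes read by complex queries are loaded, while simple queries run on the raw data. The scoring rule (joins weighted twice as heavily as aggregates), the `threshold`, the `Query` type, and the column names are all hypothetical. This is not the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    attributes: set[str]   # columns the query reads
    joins: int = 0         # number of join operations
    aggregates: int = 0    # number of aggregate functions


def complexity(q: Query) -> int:
    """Hypothetical complexity score: joins weigh more than aggregates."""
    return 2 * q.joins + q.aggregates


def partition(queries: list[Query], threshold: int):
    """Split a workload by complexity and pick the attributes to load.

    Queries scoring at or above `threshold` are treated as complex; the
    union of their attributes is the partition to load into the engine.
    The remaining (simple) queries are assumed to run on the raw files.
    """
    complex_qs = [q for q in queries if complexity(q) >= threshold]
    simple_qs = [q for q in queries if complexity(q) < threshold]
    loaded = set().union(*(q.attributes for q in complex_qs))
    return loaded, complex_qs, simple_qs


# Toy workload; column names are illustrative, not the paper's schema.
all_attrs = {"objID", "ra", "dec", "u", "g", "r", "specObjID", "class"}
workload = [
    Query("cone_search", {"objID", "ra", "dec"}),
    Query("color_cut", {"objID", "u", "g", "r"}, aggregates=1),
    Query("spec_join", {"objID", "specObjID", "class"}, joins=1, aggregates=1),
]

loaded, complex_qs, simple_qs = partition(workload, threshold=2)
print("complex:", [q.name for q in complex_qs])   # ['spec_join']
print("loaded:", sorted(loaded))                  # ['class', 'objID', 'specObjID']
print(f"loaded fraction: {len(loaded) / len(all_attrs):.1%}")  # 37.5%
```

Raising `threshold` shrinks the loaded partition but sends more queries to raw-file processing. The paper's reported 6.7% loaded fraction presumably reflects its own policy for this trade-off, which may differ from the toy rule used here.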
Taxonomy
Topics: Cloud Computing and Resource Management · Interconnection Networks and Systems · Quantum Computing Algorithms and Architecture
