Resource Utilization Monitoring for Raw Data Query Processing
Mayank Patel, Minal Bhise

TL;DR
This paper enhances raw data query processing by integrating resource monitoring, analyzing resource usage patterns, and proposing adaptive data partitioning techniques to improve efficiency and scalability in scientific data analysis.
Contribution
It introduces a resource monitoring module into raw data query frameworks and proposes QCA and RUA data partitioning techniques based on resource usage analysis.
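As a rough illustration of what a resource monitoring hook around query execution can look like, the sketch below wraps a query callable and records wall-clock time, CPU time, and peak Python-level memory. It is not the paper's module: the names `ResourceUsage`, `monitor`, and `toy_scan` are hypothetical, and the measurements cover only the calling Python process, not a database server's CPU, RAM, or disk IO.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ResourceUsage:
    """Hypothetical record of what one query execution consumed."""
    wall_seconds: float      # elapsed wall-clock time
    cpu_seconds: float       # CPU time of this process (user + system)
    peak_python_bytes: int   # peak memory allocated by Python objects


def monitor(run_query: Callable[[], Any]) -> Tuple[Any, ResourceUsage]:
    """Run `run_query` and return its result with the resources it used.

    Assumption: the query runs inside this process. A real deployment
    would sample the database server's counters instead.
    """
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        result = run_query()
    finally:
        # Collect measurements even if the query raises.
        cpu = time.process_time() - cpu_start
        wall = time.perf_counter() - wall_start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, ResourceUsage(wall, cpu, peak)


def toy_scan() -> int:
    """Stand-in for a raw-data query: sum the first CSV field of each row."""
    rows = ["1,a", "2,b", "3,c"]
    return sum(int(row.split(",")[0]) for row in rows)


if __name__ == "__main__":
    total, usage = monitor(toy_scan)
    print(total, usage)
```

Collecting such records per query type (for example sampling, 0-JOIN, and multi-JOIN queries) is one way to obtain the kind of usage patterns that a partitioning decision could then be based on.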
Findings
Sampling queries have the lowest resource utilization.
PostgresRAW outperforms PostgreSQL on simple 0-JOIN queries.
Complex JOIN queries benefit from being executed on PostgreSQL, which reduces the workload.
Abstract
Scientific experiments, simulations, and modern applications generate large amounts of data. The data is stored in raw format to avoid the high loading time of traditional database management systems. Researchers have proposed many techniques to improve query execution time on raw data and to reduce data loading time for traditional systems. At the core of all of these techniques is efficient utilization of resources, achieved by processing only the required data or by reducing operations on the data. Caching processed data in main memory or on disk helps by avoiding repeated processing of the same data. However, resource limitations such as main memory space, storage IO speed, and the additional storage space required on disk must be considered to provide reliable and scalable solutions for cloud or in-house deployments. This paper presents improvements to the raw data query processing framework…
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Scientific Computing and Data Management
