Distributed Caching for Complex Querying of Raw Arrays
Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Anna Y. Q. Ho, and Peter Nugent

TL;DR
This paper presents a distributed caching framework for multi-dimensional raw arrays that optimizes data placement and reduces query response times in large-scale array databases, outperforming existing methods significantly.
Contribution
It introduces a cost-based distributed caching approach with a two-stage plan for array data, including cell selection and placement, tailored for raw multi-dimensional arrays.
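The two stages named above (cell selection, then cell placement) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's algorithm. The function names, the bounding-box representation of cells, the byte-size cost model, and the greedy placement rule are all hypothetical and chosen for brevity.

```python
# Hypothetical illustration only: the names, the cost model, and the greedy
# heuristic below are assumptions, not the algorithms from the paper.

def select_cells(query, cells):
    """Stage 1 (selection): return ids of cells whose boxes overlap the query.

    query: (lo, hi) tuples of coordinates, inclusive on both ends.
    cells: dict mapping cell id -> (lo, hi) bounding box.
    """
    qlo, qhi = query
    return [
        cid
        for cid, (lo, hi) in cells.items()
        if all(l <= qh and ql <= h for l, h, ql, qh in zip(lo, hi, qlo, qhi))
    ]


def place_cells(selected, cell_size, cell_home, capacity):
    """Stage 2 (placement): greedily assign each selected cell to a node.

    A cell stays on the node that already stores its raw file when that
    node has cache budget left (no transfer). Otherwise it moves to the
    node with the most remaining budget, and its size counts as transfer.
    Returns (placement dict, total transferred size).
    """
    remaining = dict(capacity)  # copy so the caller's budgets are untouched
    placement, transfer = {}, 0
    # Largest cells first, so big cells get a chance to stay at home.
    for cid in sorted(selected, key=lambda c: cell_size[c], reverse=True):
        size, home = cell_size[cid], cell_home[cid]
        if remaining.get(home, 0) >= size:
            node = home
        else:
            fits = [n for n in remaining if remaining[n] >= size]
            if not fits:
                continue  # no budget anywhere: leave the cell uncached
            node = max(fits, key=lambda n: remaining[n])
            transfer += size
        remaining[node] -= size
        placement[cid] = node
    return placement, transfer
```

A cost-based planner would replace the greedy rule with an explicit cost comparison across candidate plans; the sketch only shows how selection and placement separate into two steps.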
Findings
Achieves up to a 100x reduction in workload execution time.
Outperforms existing techniques in cache overhead and efficiency.
Validated on real datasets with diverse file formats.
Abstract
As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in…
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Advanced Data Storage Technologies
