Optimizing I/O for Big Array Analytics
Yi Zhang, Jun Yang

TL;DR
This paper presents a framework for optimizing I/O in big array analytics by capturing analysis tasks, representing them declaratively, and exploiting sharing opportunities to reduce data movement and improve efficiency.
Contribution
The paper introduces a declarative framework and optimization techniques for identifying I/O sharing opportunities in big array analytics tasks that can be expressed in nested-loop form.
Findings
The optimizer finds execution plans with significant I/O savings.
These plans exploit nontrivial I/O sharing opportunities.
Reducing I/O improves the efficiency of big array analytics.
Abstract
Big array analytics is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating intermediate results. In the big data setting, I/O optimization is a key to efficient analytics. In this paper, we develop a framework and techniques for capturing a broad range of analysis tasks expressible in nested-loop forms, representing them in a declarative way, and optimizing their I/O by identifying sharing opportunities. Experiment results show that our optimizer is capable of finding execution plans that exploit nontrivial I/O sharing opportunities with significant savings.
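To make the idea of I/O sharing concrete, the following is a minimal sketch and not the paper's optimizer or plan representation. It assumes a toy `BlockStore` class (hypothetical) that stands in for an on-disk array and counts block reads, and it compares two analysis steps run as separate passes against a fused loop in which both steps consume each block while it is in memory.

```python
class BlockStore:
    """Hypothetical stand-in for a disk-resident array split into blocks."""

    def __init__(self, data, block_size):
        self.blocks = [data[i:i + block_size]
                       for i in range(0, len(data), block_size)]
        self.reads = 0  # number of simulated block reads

    def __len__(self):
        return len(self.blocks)

    def read(self, i):
        # Each call models one block fetched from storage.
        self.reads += 1
        return self.blocks[i]


def unshared_passes(store):
    """Two steps, each making its own full pass over the array."""
    total = 0
    for i in range(len(store)):
        total += sum(store.read(i))
    sum_sq = 0
    for i in range(len(store)):
        sum_sq += sum(x * x for x in store.read(i))
    return total, sum_sq


def shared_pass(store):
    """Same two steps, fused so that each block is read only once."""
    total = 0
    sum_sq = 0
    for i in range(len(store)):
        block = store.read(i)
        total += sum(block)
        sum_sq += sum(x * x for x in block)
    return total, sum_sq


if __name__ == "__main__":
    a = BlockStore(list(range(10)), block_size=3)
    b = BlockStore(list(range(10)), block_size=3)
    print(unshared_passes(a), a.reads)  # (45, 285) 8
    print(shared_pass(b), b.reads)      # (45, 285) 4
```

Both versions compute the same results, but the fused loop halves the number of block reads. In this toy case the sharing is obvious by inspection; the abstract describes an optimizer that identifies such opportunities automatically from a declarative representation of the task.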
Taxonomy
Topics: Advanced Data Storage Technologies · Advanced Database Systems and Queries · Scientific Computing and Data Management
