CIAO: An Optimization Framework for Client-Assisted Data Loading
Cong Ding, Dixin Tang, Xi Liang, Aaron J. Elmore, Sanjay Krishnan

TL;DR
CIAO is a system that leverages client-side prefiltering and data skipping to significantly accelerate data loading and query execution in big data applications, optimizing performance within client resource constraints.
Contribution
Introduces CIAO, a tunable framework enabling client-server cooperation for efficient partial data loading and data skipping, with an algorithm for near-optimal predicate selection under resource budgets.
Findings
Data loading speedup up to 21x
Query execution acceleration up to 23x
End-to-end performance improved by up to 19x
Abstract
Data loading has been one of the most common performance bottlenecks for many big data applications, especially when they run on inefficient human-readable formats such as JSON or CSV. Parsing, validation, integrity checking, and data structure maintenance are all computationally expensive steps in loading these formats. Despite these costs, many records may be filtered out later during query evaluation by highly selective predicates, resulting in wasted computation. Meanwhile, the computing power of clients is typically not exploited. Here, we explore investing limited client cycles in prefiltering to accelerate data loading and enable data skipping during query execution. In this paper, we present CIAO, a tunable system in which clients cooperate with the server to enable efficient partial loading and data skipping for a given workload. We proposed an efficient…
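The prefiltering idea in the abstract can be illustrated with a minimal sketch. It is not CIAO's implementation: the predicate set, the substring-matching strategy, the compact JSON serialization, and all function names below are assumptions made for illustration. The client evaluates cheap tests on unparsed text and attaches one bit per predicate; the server parses only records with at least one bit set (partial loading) and uses the bits to skip records at query time.

```python
import json

# Hypothetical workload predicates (not from the paper):
# name -> (substring to look for in the raw text, exact check on the parsed record)
PREDICATES = {
    "status_error": ('"status":"error"', lambda r: r.get("status") == "error"),
    "region_eu": ('"region":"eu"', lambda r: r.get("region") == "eu"),
}


def client_prefilter(raw_lines):
    """Client side: cheap substring tests on unparsed text, one bit per predicate.

    Assumes compact JSON (no spaces after ':' or ','), so the patterns match.
    """
    annotated = []
    for line in raw_lines:
        bits = {name: pattern in line for name, (pattern, _) in PREDICATES.items()}
        annotated.append((line, bits))
    return annotated


def server_load(annotated):
    """Server side: parse only records that may satisfy some workload predicate."""
    loaded = []
    for line, bits in annotated:
        if any(bits.values()):  # partial loading: other records are never parsed
            loaded.append((json.loads(line), bits))
    return loaded


def run_query(loaded, name):
    """Skip records whose bit is unset; re-check the rest exactly after parsing."""
    _, check = PREDICATES[name]
    return [rec for rec, bits in loaded if bits[name] and check(rec)]


records = [
    {"status": "error", "region": "us"},
    {"status": "ok", "region": "eu"},
    {"status": "ok", "region": "us"},
]
raw = [json.dumps(r, separators=(",", ":")) for r in records]
loaded = server_load(client_prefilter(raw))
errors = run_query(loaded, "status_error")
```

In this sketch the third record matches no predicate, so the server never parses it. The exact check in `run_query` is kept because a substring test on raw text is only a cheap approximation of the real predicate.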
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Scientific Computing and Data Management
