Probery: A Probability-based Incomplete Query Optimization for Big Data
Jie Song, Yichuan Zhang, Yubin Bao, Ge Yu

TL;DR
Probery introduces a probability-based approach to optimize big data queries by allowing controlled uncertainty in completeness, significantly improving query performance while maintaining a specified confidence level.
Contribution
It proposes a novel probability of query completeness (PC) model and integrates it into data placement and query processing for big data systems.
Findings
Probery guarantees PC with high accuracy across various cases.
Probery achieves up to 1.8x faster query performance than existing systems.
The approach maintains query completeness confidence while significantly reducing query time.
Abstract
Query optimization has drawn considerable attention in big data management, especially for NoSQL databases. Approximate queries boost query performance at the cost of accuracy; for example, sampling approaches trade query completeness for efficiency. In contrast, we propose an uncertainty of query completeness, called Probability of query Completeness (PC for short). PC is the probability that a query result contains all records that satisfy the query. For example, PC=0.95 guarantees that no more than 5 out of 100 queries are incomplete, but it does not guarantee how incomplete they are. We trade off PC for query performance, and experiments show that a small loss of PC doubles query performance. The proposed Probery (PROBability-based data quERY) adopts the uncertainty of query completeness to accelerate OLTP queries. This paper illustrates the data and probability models, the…
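To make the PC trade-off concrete, here is a minimal sketch under a toy model that is an assumption of this example, not the paper's algorithm. It supposes the single record a query needs lies in exactly one partition, and that the probability of each partition holding it is known. Under that model, the probability that a scan is complete is the sum of the probabilities of the scanned partitions, so the fewest partitions that reach a target PC are the most likely ones. The function name `partitions_to_scan` and the sample probabilities are hypothetical.

```python
def partitions_to_scan(probabilities, target_pc):
    """Pick the fewest partitions whose combined probability reaches target_pc.

    Toy model (an assumption, not Probery's actual placement scheme):
    probabilities[i] is the chance that partition i holds the one record
    the query needs, and the values sum to 1.
    """
    if not 0.0 < target_pc <= 1.0:
        raise ValueError("target_pc must be in (0, 1]")

    # Visit the most likely partitions first.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)

    chosen, covered = [], 0.0
    for i in ranked:
        # Small tolerance guards against floating-point rounding.
        if covered >= target_pc - 1e-12:
            break
        chosen.append(i)
        covered += probabilities[i]
    return chosen, covered


# Hypothetical distribution over four partitions.
probs = [0.5, 0.3, 0.15, 0.05]
chosen, covered = partitions_to_scan(probs, target_pc=0.95)
print(chosen, round(covered, 2))
```

With these sample values, a target of PC=0.95 lets the scan skip the least likely partition, while PC=1.0 forces a scan of all four. This is the kind of saving the abstract describes, shown here only for the simplified model.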
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Graph Theory and Algorithms
