CardOOD: Robust Query-driven Cardinality Estimation under Out-of-Distribution
Rui Li, Kangfei Zhao, Jeffrey Xu Yu, Guoren Wang

TL;DR
CardOOD introduces a robust framework for query-driven cardinality estimation that effectively mitigates out-of-distribution issues, enhancing accuracy and reliability in query optimization tasks.
Contribution
The paper presents a novel learning framework that extends transfer and robust learning techniques, including a new self-supervised algorithm tailored for cardinality estimation, to improve OOD robustness.
Findings
Effective mitigation of OOD problems in cardinality estimation.
Integration of CardOOD into PostgreSQL improves query optimization.
New self-supervised learning algorithm models cardinality constraints.
Abstract
Query-driven learned estimators are accurate, flexible, and lightweight alternatives to traditional estimators in query optimization. However, existing query-driven approaches struggle with the out-of-distribution (OOD) problem, where the test workload distribution differs from the training workload, leading to performance degradation. In this paper, we present CardOOD, a general learning framework designed to construct robust query-driven cardinality estimators that are resilient against the OOD problem. Our framework focuses on offline training algorithms that develop one-off models from a static workload, suitable for model initialization and periodic retraining. In CardOOD, we extend classical transfer/robust learning techniques to train query-driven cardinality estimators, and the algorithms fall into three categories: representation learning, data manipulation, and new learning…
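The OOD problem described in the abstract can be illustrated with a toy experiment. The sketch below is not CardOOD and does not reproduce any algorithm from the paper. It assumes a hypothetical one-column table, range queries, and a deliberately naive query-driven estimator that sees only the predicate width, and it scores estimates with q-error, a metric commonly used for cardinality estimation. All names (`make_workload`, `fit`, `median_q_error`) are made up for the illustration.

```python
import bisect
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical single-column table; cubing skews the values toward 0,
# so the low end of [0, 1) is dense and the high end is sparse.
column = sorted(rng.random() ** 3 for _ in range(10_000))

def true_cardinality(lo, hi):
    """Exact number of rows with lo <= value < hi."""
    return bisect.bisect_left(column, hi) - bisect.bisect_left(column, lo)

def make_workload(center_lo, center_hi, n=300):
    """Range queries centered in [center_lo, center_hi], as (width, cardinality)."""
    workload = []
    for _ in range(n):
        center = rng.uniform(center_lo, center_hi)
        width = rng.uniform(0.02, 0.1)
        lo, hi = center - width / 2, center + width / 2
        workload.append((width, max(1, true_cardinality(lo, hi))))
    return workload

def fit(workload):
    """Least-squares fit of log(cardinality) = slope * log(width) + intercept."""
    xs = [math.log(w) for w, _ in workload]
    ys = [math.log(c) for _, c in workload]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def median_q_error(model, workload):
    """Median of max(est / true, true / est) over the workload."""
    slope, intercept = model
    errors = []
    for width, card in workload:
        est = math.exp(slope * math.log(width) + intercept)
        errors.append(max(est / card, card / est))
    return statistics.median(errors)

# Train on queries over the dense region only.
model = fit(make_workload(0.1, 0.3))
in_q = median_q_error(model, make_workload(0.1, 0.3))   # same distribution
ood_q = median_q_error(model, make_workload(0.6, 0.9))  # shifted workload
print(f"median q-error in-distribution: {in_q:.2f}, shifted: {ood_q:.2f}")
```

Because the estimator learned only from queries over the dense region, it overestimates on queries that land in the sparse region, so the shifted workload shows a higher median q-error than the in-distribution one. This is the kind of degradation under workload shift that the abstract refers to.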
Taxonomy
Topics: Advanced Clustering Algorithms Research · Advanced Database Systems and Queries · Data Management and Algorithms
