Downsizing Diffusion Models for Cardinality Estimation
Xinhe Mu, Zhaoqi Zhou, Zaijiu Shang, Chuan Zhou, Gang Fu, Guiying Yan, Guoliang Li, Zhiming Ma

TL;DR
This paper introduces ADC (Accelerated Diffusion Cardest), a lightweight diffusion model framework that improves both the accuracy and the latency of cardinality estimation in databases, particularly on data with complex dependencies between attributes.
Contribution
The paper presents ADC, the first downsized diffusion model for high-precision cardinality estimation, combining a hybrid architecture with theoretical insights to reduce latency and improve robustness.
Findings
ADC is 10 times more accurate than Naru on datasets with complex dependencies.
ADC achieves half the latency of Naru while maintaining similar accuracy.
ADC requires less than 350KB storage on most datasets.
Abstract
Learned cardinality estimation requires accurate model designs to capture the local characteristics of probability distributions. However, existing models may fail to accurately capture complex, multilateral dependencies between attributes. Diffusion models, meanwhile, can succeed in estimating image distributions with thousands of dimensions, making them promising candidates, but their heavy weight and high latency prohibit effective implementation. We seek to make diffusion models more lightweight by introducing Accelerated Diffusion Cardest (ADC), the first "downsized" diffusion model framework for efficient, high-precision cardinality estimation. ADC utilizes a hybrid architecture that integrates a Gaussian Mixture-Bayesnet selectivity estimator with a score-based density estimator to perform precise Monte Carlo integration. Addressing the issue of prohibitive inference latencies…
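As a rough illustration of the Monte Carlo integration step mentioned in the abstract, the sketch below estimates the selectivity of a range query by averaging a density over points sampled inside the query box, then scales by the table size. It is not ADC's implementation: the two-component Gaussian mixture `MIXTURE` is a hypothetical stand-in for a learned density estimator, the uniform sampler replaces whatever proposal the paper's hybrid architecture uses, and every function name here is invented for the example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    """Cumulative distribution of a 1D normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


# Hypothetical stand-in for a learned density over two attributes:
# (weight, (mean_a, mean_b), (std_a, std_b)) per mixture component.
# The mixture makes the two attributes dependent on each other.
MIXTURE = [
    (0.6, (0.0, 0.0), (1.0, 1.0)),
    (0.4, (3.0, 3.0), (0.5, 0.5)),
]


def density(point):
    """Evaluate the assumed joint density at a 2D point."""
    return sum(
        w * normal_pdf(point[0], m[0], s[0]) * normal_pdf(point[1], m[1], s[1])
        for w, m, s in MIXTURE
    )


def mc_selectivity(box, n_samples, rng):
    """Plain Monte Carlo integration of the density over an axis-aligned box."""
    volume = 1.0
    for lo, hi in box:
        volume *= hi - lo
    total = 0.0
    for _ in range(n_samples):
        point = [rng.uniform(lo, hi) for lo, hi in box]
        total += density(point)
    return volume * total / n_samples


def exact_selectivity(box):
    """Closed-form answer for this toy mixture, used only as a sanity check."""
    total = 0.0
    for w, m, s in MIXTURE:
        mass = w
        for (lo, hi), mean, std in zip(box, m, s):
            mass *= normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
        total += mass
    return total


def estimate_cardinality(box, table_rows, n_samples=20000, seed=0):
    """Estimated row count = table size times estimated selectivity."""
    rng = random.Random(seed)
    return table_rows * mc_selectivity(box, n_samples, rng)


if __name__ == "__main__":
    query_box = [(2.0, 4.0), (2.0, 4.0)]  # predicate: 2 <= a <= 4 AND 2 <= b <= 4
    print("estimated rows:", estimate_cardinality(query_box, 1_000_000))
    print("exact rows:    ", 1_000_000 * exact_selectivity(query_box))
```

Uniform sampling keeps the example short, but its variance grows when the density is concentrated in a small part of the query box. A proposal distribution that already resembles the data can reduce the number of samples needed, which is one plausible reading of why the abstract pairs a selectivity estimator with the density estimator.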
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis · Bayesian Methods and Mixture Models
