AdaNDV: Adaptive Number of Distinct Value Estimation via Learning to Select and Fuse Estimators
Xianghong Xu, Tieying Zhang, Xiao He, Haoyang Li, Rong Kang, Shuai Wang, Linhui Xu, Zhimin Liang, Shangyu Luo, Lei Zhang, Jianjun Chen

TL;DR
AdaNDV is an adaptive method that uses learned models to select and fuse existing estimators, improving the accuracy of Number of Distinct Values (NDV) estimation on large-scale datasets.
Contribution
It introduces a learned approach to selecting and fusing existing NDV estimators, addressing the largely unexplored problem of identifying which estimator suits a given scenario and thereby improving estimation accuracy.
Findings
Outperforms existing methods on real-world datasets
Effectively distinguishes overestimated and underestimated estimators
Achieves higher accuracy in large-scale data scenarios
Abstract
Estimating the Number of Distinct Values (NDV) is fundamental for numerous data management tasks, especially within database applications. However, most existing works primarily focus on introducing new statistical or learned estimators, while identifying the most suitable estimator for a given scenario remains largely unexplored. Therefore, we propose AdaNDV, a learned method designed to adaptively select and fuse existing estimators to address this issue. Specifically, (1) we propose to use learned models to distinguish between overestimated and underestimated estimators and then select appropriate estimators from each category. This strategy provides a complementary perspective by integrating overestimations and underestimations for error correction, thereby improving the accuracy of NDV estimation. (2) To further integrate the estimation results, we introduce a novel fusion approach…
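To make the select-and-fuse idea concrete, the sketch below computes two classical sample-based NDV estimators (GEE and Chao) and combines one underestimate with one overestimate. It is an illustration under stated assumptions, not the paper's method: the over/under labels are supplied by the caller rather than predicted by a learned model, the selection rule (largest underestimate, smallest overestimate) is hypothetical, and a geometric mean stands in for the fusion approach, which is not described in this excerpt. The names `frequency_profile`, `gee`, `chao`, and `select_and_fuse` are introduced here for illustration only.

```python
from collections import Counter
from math import sqrt


def frequency_profile(sample):
    """Map j -> number of distinct values that occur exactly j times in the sample."""
    return Counter(Counter(sample).values())


def gee(sample, population_size):
    """GEE estimator: sqrt(N / n) * f1 + sum of f_j for j >= 2."""
    f = frequency_profile(sample)
    n = len(sample)
    return sqrt(population_size / n) * f[1] + sum(c for j, c in f.items() if j >= 2)


def chao(sample):
    """Chao estimator: d + f1^2 / (2 * f2); falls back to d when f2 is 0."""
    f = frequency_profile(sample)
    d = sum(f.values())  # distinct values observed in the sample
    if f[2] == 0:
        return float(d)
    return d + f[1] ** 2 / (2 * f[2])


def select_and_fuse(estimates, is_over):
    """Hypothetical stand-in for selection and fusion.

    estimates: dict of estimator name -> estimate.
    is_over:   dict of estimator name -> True if labeled an overestimate
               (assumed given here; the paper predicts this with learned models).
    """
    under = max(v for k, v in estimates.items() if not is_over[k])
    over = min(v for k, v in estimates.items() if is_over[k])
    # Geometric mean is an assumption, not the paper's fusion approach.
    return sqrt(under * over)


sample = [1, 1, 2, 3, 3, 3, 4, 5]  # n = 8 rows drawn from a column of N = 32 rows
estimates = {"gee": gee(sample, 32), "chao": chao(sample)}
labels = {"gee": False, "chao": True}  # illustrative labels only
fused = select_and_fuse(estimates, labels)
```

The fused value always lies between the selected underestimate and overestimate, which is the error-correcting intuition the abstract describes: errors in opposite directions partially cancel.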
Taxonomy
Topics: Neural Networks and Applications · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications
