Consistent and Flexible Selectivity Estimation for High-Dimensional Data

Yaoshu Wang; Chuan Xiao; Jianbin Qin; Rui Mao; Onizuka Makoto; Wei Wang; Rui Zhang; Yoshiharu Ishikawa

arXiv:2005.09908·cs.DB·May 28, 2021

TL;DR

This paper introduces a deep learning-based selectivity estimation model for high-dimensional data that guarantees consistency and improves accuracy by partitioning data and learning query-dependent functions.

Contribution

It presents a novel deep learning model that ensures consistent, flexible selectivity estimation for high-dimensional data, outperforming existing methods.

Findings

01. Outperforms state-of-the-art models in accuracy

02. Efficiently handles large-scale high-dimensional data

03. Useful for real-world database applications

Abstract

Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection, query optimization, and data integration. The estimation problem is especially challenging for large-scale high-dimensional data due to the curse of dimensionality, the large variance of selectivity across different queries, and the need to make the estimator consistent (i.e., the selectivity is non-decreasing in the threshold). We propose a new deep learning-based model that learns a query-dependent piecewise linear function as the selectivity estimator, which is flexible enough to fit the selectivity curve of any distance function and query object, while guaranteeing that the output is non-decreasing in the threshold. To improve the accuracy for large…
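
The consistency requirement in the abstract (an estimate that never decreases as the threshold grows) can be illustrated with a small sketch. This is not the paper's architecture: the knot positions and raw increments below are hypothetical placeholders for values that a learned, query-dependent model would produce. The sketch only shows one common way to make a piecewise linear function monotone by construction, namely forcing every increment to be non-negative and accumulating them.

```python
import numpy as np

def piecewise_linear_estimator(knots, raw_increments):
    """Build a non-decreasing piecewise linear function of the threshold.

    knots: increasing threshold values (hypothetical control points).
    raw_increments: unconstrained real numbers, one per knot. In a learned
        model these would depend on the query; here they are fixed inputs.
    """
    knots = np.asarray(knots, dtype=float)
    raw = np.asarray(raw_increments, dtype=float)
    # Softplus, log(1 + exp(x)), maps any real number to a positive one.
    deltas = np.logaddexp(0.0, raw)
    # A running sum of positive increments is increasing at the knots.
    values = np.cumsum(deltas)

    def estimate(threshold):
        # Linear interpolation between knots; np.interp holds the end
        # values constant outside the knot range, so monotonicity is kept.
        return np.interp(threshold, knots, values)

    return estimate

# Raw increments may be negative; the construction still yields a
# non-decreasing curve.
est = piecewise_linear_estimator([0.0, 0.5, 1.0, 2.0], [-3.0, 0.2, 1.5, -0.7])
thresholds = np.linspace(-1.0, 3.0, 50)
estimates = est(thresholds)
```

Because the monotonicity comes from the parameterization rather than from a penalty in the loss, it holds for every input, including thresholds outside the range seen in training.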

Code & Models

Repositories

yaoshuwang/SelNet-Estimation
Official