TL;DR
SSCard estimates substring cardinality with a suffix-tree-guided learned FM-Index, improving estimation accuracy and construction time for database query optimization.
Contribution
SSCard extends the FM-Index with a pruned suffix tree structure and error-bounded spline interpolation, yielding a space-efficient, accurate, and update-friendly cardinality estimator.
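For readers unfamiliar with the underlying structure, here is a minimal sketch of how a plain FM-Index counts substring occurrences over several strings by backward search. It is not SSCard's learned or pruned variant: the separator-based concatenation, the naive suffix-array construction, and the names `build_fm_index` and `count_occurrences` are illustrative assumptions, not taken from the paper. It also counts occurrences, which is only an upper bound on the number of rows matching a `LIKE '%pattern%'` predicate.

```python
def build_fm_index(strings, sep="\x01", end="\x00"):
    """Build a toy FM-Index over several strings.

    Assumption: rows never contain the separator or end-sentinel characters.
    """
    # Concatenate rows; the unique end sentinel sorts below every other symbol.
    text = sep.join(strings) + end
    # Naive suffix array (fine for a demo, far too slow for real data).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of `pattern` using backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index(["banana", "bandana"])
ana_count = count_occurrences(index, "ana")
```

In this toy index, the full `occ` table takes space proportional to the text length times the alphabet size. Reducing that cost is the kind of problem the learned, error-bounded approximation described above is meant to address.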
Findings
Reduces average q-error by 20%
Reduces maximum q-error by 80%
Cuts construction time by 50%
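The q-error figures above use the standard cardinality-estimation metric: the larger of the over- and under-estimation ratios, so a perfect estimate scores 1. A minimal sketch follows. Clamping both values to at least 1 to avoid division by zero is a common convention that is assumed here, and the helper name `q_error` is illustrative; the paper's exact handling of zero cardinalities is not stated in this summary.

```python
def q_error(estimated, actual):
    """Return max(est/act, act/est) after clamping both values to >= 1."""
    est = max(float(estimated), 1.0)  # assumed convention: avoid division by zero
    act = max(float(actual), 1.0)
    return max(est / act, act / est)


# Over- and under-estimating by the same factor give the same q-error.
over = q_error(200, 100)
under = q_error(50, 100)
```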
Abstract
Accurate cardinality estimation of substring queries, which are commonly expressed using the SQL LIKE predicate, is crucial for query optimization in database systems. While both rule-based and machine learning-based methods have been developed to optimize various aspects of cardinality estimation, their lack of error bounds may result in substantial estimation errors, leading to suboptimal execution plans. In this paper, we propose SSCard, a novel SubString Cardinality estimator that adapts a space-efficient FM-Index to flexible database applications. SSCard first extends the FM-Index to support multiple strings naturally, and then organizes the FM-Index using a pruned suffix tree. The suffix tree structure enables precise cardinality estimation for short patterns and achieves high compression via a pushup operation, especially on a large alphabet with skewed character…
