Algorithmic Complexity Attacks on All Learned Cardinality Estimators: A Data-centric Approach

Yingze Li; Xianglong Liu; Dong Wang; Zixuan Wang; Hongzhi Wang; Kaixing Zhang; Yiming Guan

arXiv:2507.07438 · cs.DB · July 11, 2025

TL;DR

This paper investigates the vulnerability of learned cardinality estimators to data-centric attacks, demonstrating their fragility under minimal data drifts and proposing methods to analyze and mitigate these risks.

Contribution

It provides the first theoretical analysis of data-driven attacks on learned estimators, introduces an approximation algorithm for optimal attacks, and suggests countermeasures for robustness.

Findings

1. Minimal data modifications can drastically degrade estimator accuracy.
2. Computing the optimal attack strategy is NP-Hard.
3. The proposed approximation algorithm effectively finds near-optimal attacks.

Abstract

Learned cardinality estimators show promise in query cardinality prediction, yet they universally exhibit fragility to training data drifts, posing risks for real-world deployment. This work is the first to theoretically investigate how minimal data-level drifts can maximally degrade the accuracy of learned estimators. We propose data-centric algorithmic complexity attacks against learned estimators in a black-box setting and prove that finding the optimal attack strategy is NP-Hard. To address this, we design a polynomial-time approximation algorithm with a $(1 - \kappa)$ approximation ratio. Extensive experiments demonstrate the attack's effectiveness: on the STATS-CEB and IMDB-JOB benchmarks, modifying just 0.8% of training tuples increases the 90th-percentile Q-error by three orders of magnitude and raises end-to-end processing time by up to 20×. Our work not only reveals critical…
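For readers unfamiliar with the metric, Q-error is the usual accuracy measure for cardinality estimators: the factor by which an estimate deviates from the true cardinality in either direction, max(est/true, true/est), so a perfect estimate scores 1. The Python sketch below computes per-query Q-errors and a 90th percentile over a workload. Clamping zero cardinalities to 1 and using the nearest-rank percentile are assumptions made for illustration, not details taken from the paper, and the helper names are hypothetical.

```python
import math


def q_error(estimate: float, actual: float) -> float:
    """Symmetric ratio between estimated and true cardinality (>= 1).

    Assumption: both values are clamped to at least 1 so that empty
    results do not cause division by zero.
    """
    est = max(estimate, 1.0)
    act = max(actual, 1.0)
    return max(est / act, act / est)


def percentile(values, p):
    """Nearest-rank percentile (assumed method), p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical workload: estimated vs. true cardinalities for four queries.
estimates = [100, 50, 10, 1000]
actuals = [100, 100, 1000, 10]

errors = [q_error(e, a) for e, a in zip(estimates, actuals)]
print(errors)                  # per-query Q-errors
print(percentile(errors, 90))  # 90th-percentile Q-error
```

For these four sample queries the Q-errors are 1, 2, 100 and 100, so the 90th percentile is 100. An increase of "three orders of magnitude" in this statistic means it grows by a factor of roughly 1,000.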


Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques