Can Large Language Models be a Cardinality Estimator? An Empirical Study
Liangzu Liu, Yiyan Wang, Yinjun Wu, Runze Su, Zhuo Chang, Peizhi Wu, Jianjun Chen, Fuxin Jiang, Rui Shi, Bin Cui, Tieying Zhang

TL;DR
This paper explores using Large Language Models for cardinality estimation in databases, demonstrating their superior accuracy and generalizability over traditional methods, with manageable inference overhead.
Contribution
It introduces a novel approach that leverages LLMs, combining prompt crafting, fine-tuning, and self-correction, to improve cardinality estimation in DBMSs.
Findings
LLMs outperform state-of-the-art methods in most settings.
LLMs show strong generalizability to unseen data and complex queries.
Inference overhead can be justified by benefits in estimation accuracy.
Abstract
Cardinality estimation (CardEst) remains a challenging problem for DBMSs. Recent years have witnessed the success of ML-based cardinality estimators in outperforming traditional methods. However, these solutions suffer from poor generalizability to new data or query distributions, an inability to handle complex queries, and substantial data preparation overhead, which prevents their wide adoption in real-world DBMSs. Some recent efforts have addressed some, but not all, of these issues. We notice that recently emerging Large Language Models (LLMs) have shown remarkable generalizability to unseen tasks, the capability to understand complex programs, and the power to perform data-efficient fine-tuning. In light of this, we propose to leverage LLMs to mitigate the above issues. Specifically, we carefully craft prompts, and subsequently perform fine-tuning and…
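To make the problem concrete, here is a minimal sketch, not taken from the paper, of what a cardinality estimator must predict and how a traditional estimator can miss badly. It assumes a hypothetical two-column table whose columns are perfectly correlated, uses the textbook attribute-independence assumption as the "traditional" baseline, and scores estimates with q-error, a ratio metric common in the CardEst literature. None of these choices are claimed to match the paper's experimental setup, and the function names are illustrative only.

```python
def true_cardinality(rows, pred_a, pred_b):
    """Exact number of rows satisfying both predicates (the quantity to estimate)."""
    return sum(1 for a, b in rows if pred_a(a) and pred_b(b))


def independence_estimate(rows, pred_a, pred_b):
    """Traditional-style estimate: multiply per-column selectivities,
    assuming the two columns are statistically independent."""
    n = len(rows)
    sel_a = sum(1 for a, _ in rows if pred_a(a)) / n
    sel_b = sum(1 for _, b in rows if pred_b(b)) / n
    return n * sel_a * sel_b


def q_error(estimate, truth):
    """Symmetric ratio error; 1.0 means a perfect estimate.
    Both values are clamped to at least 1 to avoid division by zero."""
    est, tru = max(estimate, 1.0), max(truth, 1.0)
    return max(est / tru, tru / est)


def a_lt_100(a):
    return a < 100


def b_lt_100(b):
    return b < 100


# Hypothetical table: column b always equals column a (perfect correlation).
rows = [(v, v) for v in range(1000)]

truth = true_cardinality(rows, a_lt_100, b_lt_100)
est = independence_estimate(rows, a_lt_100, b_lt_100)
print(truth, est, q_error(est, truth))
```

Here 100 rows actually match, but the independence estimate is 1000 × 0.1 × 0.1 = 10, a q-error of 10, because the two predicates are correlated. Failures of this kind are part of why learned estimators can outperform traditional methods.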
