Can Large Language Models be a Cardinality Estimator? An Empirical study

Liangzu Liu; Yiyan Wang; Yinjun Wu; Runze Su; Zhuo Chang; Peizhi Wu; Jianjun Chen; Fuxin Jiang; Rui Shi; Bin Cui; Tieying Zhang

arXiv:2603.28080 · cs.DB · March 31, 2026

TL;DR

This paper explores using Large Language Models for cardinality estimation in databases, demonstrating their superior accuracy and generalizability over traditional methods, with manageable inference overhead.

Contribution

It introduces a novel approach leveraging LLMs with prompt crafting, fine-tuning, and self-correction for improved cardinality estimation in DBMS.
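
To make the prompt-crafting and self-correction ideas concrete, here is a minimal sketch. It is not the paper's implementation: the prompt template, the helper names (`build_prompt`, `parse_estimate`, `estimate_cardinality`), the `llm` callable, and the particular sanity check (an estimate cannot exceed the product of the listed table sizes) are all assumptions made for illustration. Fine-tuning is not shown.

```python
import math
import re
from typing import Callable, Dict, Optional


def build_prompt(schema: str, table_rows: Dict[str, int], query: str) -> str:
    """Assemble a plain-text prompt (hypothetical template, not the paper's)."""
    sizes = "\n".join(f"- {name}: {rows} rows" for name, rows in table_rows.items())
    return (
        "Estimate the number of rows returned by the SQL query.\n"
        f"Schema:\n{schema}\n"
        f"Table sizes:\n{sizes}\n"
        f"Query:\n{query}\n"
        "Answer with a single non-negative integer."
    )


def parse_estimate(reply: str) -> Optional[int]:
    """Pull the first integer out of a free-text reply, or None if absent."""
    match = re.search(r"\d[\d,]*", reply)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


def estimate_cardinality(
    llm: Callable[[str], str],  # assumed interface: prompt in, text out
    schema: str,
    table_rows: Dict[str, int],
    query: str,
    max_rounds: int = 2,
) -> Optional[int]:
    """Ask for an estimate and retry when the answer fails a sanity check.

    Assumption: the query touches each listed table at most once, so the
    cross product of the table sizes is a valid upper bound on the result.
    """
    upper_bound = math.prod(table_rows.values())
    prompt = build_prompt(schema, table_rows, query)
    for _ in range(max_rounds):
        estimate = parse_estimate(llm(prompt))
        if estimate is not None and estimate <= upper_bound:
            return estimate
        # Self-correction step: feed the violated constraint back to the model.
        prompt += (
            f"\nYour previous answer was missing or larger than {upper_bound}, "
            "the maximum possible result size. Please answer again."
        )
    return None
```

Any function that maps a prompt string to a reply string can be passed as `llm`, so the sketch can be exercised with a stub before wiring in a real model.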

Findings

01. LLMs outperform state-of-the-art methods in most settings.

02. LLMs show strong generalizability to unseen data and complex queries.

03. Inference overhead can be justified by benefits in estimation accuracy.

Abstract

Cardinality estimation (CardEst) remains a challenging problem for DBMSs. Recent years have witnessed the success of ML-based cardinality estimators in outperforming traditional methods. However, these solutions suffer from poor generalizability to new data or query distributions, an inability to handle complex queries, and substantial data preparation overhead, which prevents their wide adoption in real-world DBMSs. Recent efforts have addressed some but not all of these issues. We notice that recently emerging Large Language Models (LLMs) have shown remarkable generalizability to unseen tasks, the capability to understand complex programs, and the power to perform data-efficient fine-tuning. In light of this, we propose to leverage LLMs to mitigate the above issues. Specifically, we carefully craft prompts, and subsequently perform fine-tuning and…
