Revolutionizing Database Q&A with Large Language Models: Comprehensive Benchmark and Evaluation
Yihang Zheng, Bo Li, Zhenghao Lin, Yi Luo, Xuanhe Zhou, Chen Lin, Jinsong Su, Guoliang Li, Shifu Li

TL;DR
This paper introduces DQABench, a comprehensive benchmark and testbed for evaluating large language models in database question answering, including dataset generation, modular evaluation components, and analysis of model capabilities.
Contribution
The paper presents the first comprehensive benchmark and modular testbed for LLM-based database QA, comprising over 200,000 QA pairs and an evaluation of individual service components.
Findings
Identified strengths and limitations of nine LLM-based QA bots.
Assessed the impact of different service components on performance.
Provided insights for future LLM-based database QA development.
Abstract
The development of Large Language Models (LLMs) has revolutionized QA across various industries, including the database domain. However, there is still a lack of a comprehensive benchmark to evaluate the capabilities of different LLMs and their modular components in database QA. To this end, we introduce DQABench, the first comprehensive database QA benchmark for LLMs. DQABench features an innovative LLM-based method to automate the generation, cleaning, and rewriting of the evaluation dataset, resulting in over 200,000 QA pairs in English and Chinese, separately. These QA pairs cover a wide range of database-related knowledge extracted from manuals, online communities, and database instances. This inclusion allows for an additional assessment of LLMs' Retrieval-Augmented Generation (RAG) and Tool Invocation Generation (TIG) capabilities in the database QA task. Furthermore, we propose a…
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Information Retrieval and Search Behavior
Methods: Byte Pair Encoding · Softmax · Layer Normalization · Dropout · Attention Is All You Need · WordPiece · Residual Connection · Attention Dropout
