Revolutionizing Database Q&A with Large Language Models: Comprehensive Benchmark and Evaluation

Yihang Zheng; Bo Li; Zhenghao Lin; Yi Luo; Xuanhe Zhou; Chen Lin; Jinsong Su; Guoliang Li; Shifu Li

arXiv:2409.04475 · cs.DB · December 9, 2024

TL;DR

This paper introduces DQABench, a comprehensive benchmark and testbed for evaluating large language models in database question answering, including dataset generation, modular evaluation components, and analysis of model capabilities.

Contribution

The paper presents the first extensive benchmark and modular testbed for LLM-based database QA, with over 200,000 QA pairs and evaluation of various model components.

Findings

01. Identified strengths and limitations of nine LLM-based QA bots.

02. Assessed the impact of different service components on performance.

03. Provided insights for future LLM-based database QA development.

Abstract

The development of Large Language Models (LLMs) has revolutionized QA across various industries, including the database domain. However, there is still a lack of a comprehensive benchmark to evaluate the capabilities of different LLMs and their modular components in database QA. To this end, we introduce DQABench, the first comprehensive database QA benchmark for LLMs. DQABench features an innovative LLM-based method to automate the generation, cleaning, and rewriting of the evaluation dataset, resulting in over 200,000 QA pairs in English and Chinese, separately. These QA pairs cover a wide range of database-related knowledge extracted from manuals, online communities, and database instances. This inclusion allows for an additional assessment of LLMs' Retrieval-Augmented Generation (RAG) and Tool Invocation Generation (TIG) capabilities in the database QA task. Furthermore, we propose a…

Code & Models

Repositories

xmudm/dqabench (Official)

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Information Retrieval and Search Behavior

Methods: Byte Pair Encoding · Softmax · Layer Normalization · Dropout · Attention Is All You Need · WordPiece · Residual Connection · Attention Dropout