LLM-Enhanced Data Management
Xuanhe Zhou, Xinyang Zhao, Guoliang Li

TL;DR
This paper introduces LLMDB, a novel LLM-enhanced data management framework that improves accuracy, reduces costs, and avoids hallucination by embedding domain knowledge, using vector databases, and deploying multi-round inference, demonstrated in real-world scenarios.
Contribution
The paper presents LLMDB, a new paradigm integrating LLMs with domain knowledge, vector databases, and multi-round inference to enhance data management tasks.
Findings
Effective in query rewrite, database diagnosis, and data analytics.
Reduces LLM costs via vector database caching.
Achieves high accuracy and avoids hallucination.
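The cost reduction attributed to vector database caching can be pictured as a semantic cache: before calling the LLM, look up previously answered requests by embedding similarity and reuse the stored answer when a close match exists. The sketch below is a minimal illustration under stated assumptions, not LLMDB's actual implementation. The names `SemanticCache`, `toy_embed`, `answer_with_cache`, and the 0.95 threshold are hypothetical, the in-memory list stands in for a real vector database, and the letter-count embedding stands in for a real embedding model.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: counts of the letters a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


class SemanticCache:
    """In-memory stand-in for a vector database holding (embedding, answer) pairs."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, request: str) -> Optional[str]:
        """Return the answer of the most similar stored request, or None on a miss."""
        query = self.embed(request)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, request: str, answer: str) -> None:
        self.entries.append((self.embed(request), answer))


def answer_with_cache(
    request: str, cache: SemanticCache, call_llm: Callable[[str], str]
) -> str:
    """Reuse a cached answer when possible; otherwise call the LLM and cache the result."""
    cached = cache.lookup(request)
    if cached is not None:
        return cached  # cache hit: no LLM call, so no LLM cost
    answer = call_llm(request)
    cache.store(request, answer)
    return answer
```

In this sketch the threshold controls the trade-off: a lower value saves more LLM calls but risks returning an answer to a request that only looks similar.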
Abstract
Machine learning (ML) techniques for optimizing data management problems have been extensively studied and widely deployed in the past five years. However, traditional ML methods have limitations in generalizability (adapting to different scenarios) and inference ability (understanding the context). Fortunately, large language models (LLMs) have shown high generalizability and human-competitive abilities in understanding context, which are promising for data management tasks (e.g., database diagnosis, database tuning). However, existing LLMs have several limitations: hallucination, high cost, and low accuracy for complicated tasks. To address these challenges, we design LLMDB, an LLM-enhanced data management paradigm which has generalizability and high inference ability while avoiding hallucination, reducing LLM cost, and achieving high accuracy. LLMDB embeds domain-specific knowledge to…
Taxonomy
Topics: Advanced Database Systems and Queries · Simulation Techniques and Applications
