DBRouting: Routing End User Queries to Databases for Answerability
Priyangshu Mandal, Manasi Patwardhan, Mayur Patidar, Lovekesh Vig

TL;DR
This paper introduces the task of routing user queries to the correct database among distributed data sources, creates datasets for it, and evaluates open-source LLMs and embedding methods, revealing their strengths and limitations on this new task.
Contribution
It defines the novel task of query routing to databases, synthesizes datasets for it, and benchmarks baseline methods using open-source LLMs and embeddings.
Findings
Open-source LLMs outperform embedding-based approaches but face token length limits.
Embedding fine-tuning improves routing accuracy, especially with domain-specific data.
Task difficulty increases with more data sources, domain similarity, lack of external knowledge, and complex queries.
Abstract
Enterprise-level data is often distributed across multiple sources, and identifying the correct set of data sources with relevant information for a knowledge request is a fundamental challenge. In this work, we define the novel task of routing an end-user query to the appropriate data source, where the data sources are databases. We synthesize datasets by extending existing datasets designed for NL-to-SQL semantic parsing. We create baselines on these datasets using open-source LLMs as well as both pre-trained embeddings and task-specific embeddings fine-tuned on the training data. With these baselines we demonstrate that open-source LLMs perform better than embedding-based approaches but suffer from token length limitations. Embedding-based approaches benefit from task-specific fine-tuning, more so when database-specific questions are available for training. We further…
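To make the embedding-based routing idea concrete, here is a minimal sketch that ranks candidate databases by similarity between a query and a textual description of each database. It is not the authors' method: the bag-of-words `embed` function is a toy stand-in for a pre-trained or fine-tuned embedding model, and the database names and schema descriptions (`music_events`, `air_travel`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query, schemas, top_k=1):
    """Return the names of the top_k databases most similar to the query."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(desc)), name) for name, desc in schemas.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical databases, each described by a flattened list of schema terms.
schemas = {
    "music_events": "stadium capacity concert singer name song year",
    "air_travel": "airline airport flight departure arrival city",
}

print(route("Which airline has a flight departing from each city?", schemas))
```

In a realistic setting the toy `embed` would be replaced by a sentence-embedding model, and the ranking would run over many more databases, which is where the findings above suggest the task becomes harder.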
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Access Control and Trust
