Tursio Database Search: How far are we from ChatGPT?
Sulbha Jain, Shivani Tripathi, Shi Qiao, Alekh Jindal

TL;DR
This paper introduces an evaluation framework for natural-language search over enterprise databases and uses it to compare Tursio, a dedicated database search platform, with ChatGPT and Perplexity. The three systems reach comparable answer relevancy despite drawing on different response sources.
Contribution
It presents an end-to-end evaluation method for structured database search in enterprise contexts, addressing a gap left by existing benchmarks, which target open-domain QA or text-to-SQL rather than the overall search experience.
Findings
Tursio achieves answer relevancy comparable to ChatGPT and Perplexity.
Database completeness is identified as the main bottleneck.
The framework enables realistic assessment of enterprise database search systems.
Abstract
Business users need to search enterprise databases using natural language, just as they now search the web using ChatGPT or Perplexity. However, existing benchmarks -- designed for open-domain QA or text-to-SQL -- do not evaluate the end-to-end quality of such a search experience. We present an evaluation framework for structured database search that generates realistic banking queries across varying difficulty levels and assesses answer quality using relevance, safety, and conversational metrics via an LLM-as-judge approach. We apply this framework to compare Tursio, a database search platform, against ChatGPT and Perplexity on a credit union banking schema. Our results show that Tursio achieves answer relevancy statistically comparable to both baselines (97.8% vs. 98.1% on simple, 90.0% vs. 100.0% on medium, 89.5% vs. 100.0% on hard questions), even though Tursio answers from a…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert finding and Q&A systems
