Can AI Agents Answer Your Data Questions? A Benchmark for Data Agents
Ruiying Ma, Shreya Shankar, Ruiqi Chen, Yiming Lin, Sepanta Zeighami, Rajoshi Ghosh, Abhinav Gupta, Anushrut Gupta, Tanmai Gopal, Aditya G. Parameswaran

TL;DR
This paper introduces the Data Agent Benchmark (DAB), an evaluation framework for enterprise data agents that tests their ability to integrate, transform, and analyze data across multiple heterogeneous database systems. The best model evaluated reaches only 38% accuracy.
Contribution
The paper presents DAB, the first benchmark evaluating the full data agent pipeline across diverse industries and database systems, along with an analysis of model performance and failure modes.
Findings
The best model achieves only 38% accuracy on DAB.
The benchmark exposes significant challenges in data integration and transformation.
The failure-mode analysis offers guidance for building more reliable data agents.
Abstract
Users across enterprises increasingly rely on AI agents to query their data through natural language. However, building reliable data agents remains difficult because real-world data is often fragmented across multiple heterogeneous database systems, with inconsistent references and information buried in unstructured text. Existing benchmarks only tackle individual pieces of this problem -- e.g., translating natural-language questions into SQL queries, answering questions over small tables provided in context -- but do not evaluate the full pipeline of integrating, transforming, and analyzing data across multiple database systems. To fill this gap, we present the Data Agent Benchmark (DAB), grounded in a formative study of enterprise data agent workloads across six industries. DAB comprises 54 queries across 12 datasets, 9 domains, and 4 database management systems. On DAB, the best…
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Semantic Web and Ontologies
