DataGovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows
Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, and Wentao Zhang

TL;DR
This paper introduces DataGovBench, a benchmark of 150 tasks for evaluating LLM agents on real-world data governance workflows. It shows that current models struggle with multi-step workflows and error correction, and proposes DataGovAgent, a framework that raises ATS on complex tasks from 39.7 to 54.9.
Contribution
The paper presents DataGovBench, a novel benchmark with realistic tasks and metrics, and introduces DataGovAgent, a new framework that enhances LLM capabilities in data governance workflows.
Findings
DataGovAgent improves ATS from 39.7 to 54.9 on complex tasks.
Current models struggle with multi-step workflows and error correction.
DataGovAgent reduces debugging iterations by over 77.9%.
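To put the ATS figures above in perspective, the short sketch below works out the absolute and relative gain implied by the two reported scores. The variable names are illustrative, and calling the 39.7 figure a "baseline" and the difference "points" are assumptions; the summary gives only the two numbers.

```python
# ATS figures as reported in the Findings list.
# Variable names are illustrative; "baseline" is an assumed label for the 39.7 score.
ats_baseline = 39.7
ats_datagovagent = 54.9

# Absolute gain: difference between the two reported scores.
absolute_gain = ats_datagovagent - ats_baseline

# Relative gain: absolute gain as a fraction of the starting score.
relative_gain = absolute_gain / ats_baseline

print(f"Absolute gain: {absolute_gain:.1f} points")
print(f"Relative gain: {relative_gain:.1%}")
```

Under these assumptions the gain is about 15.2 points, or roughly 38% relative to the starting score.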
Abstract
Data governance ensures data quality, security, and compliance through policies and standards, a critical foundation for scaling modern AI development. Recently, large language models (LLMs) have emerged as a promising solution for automating data governance by translating user intent into executable transformation code. However, existing benchmarks for automated data science often emphasize snippet-level coding or high-level analytics, failing to capture the unique challenge of data governance: ensuring the correctness and quality of the data itself. To bridge this gap, we introduce DataGovBench, a benchmark featuring 150 diverse tasks grounded in real-world scenarios, built on data from actual cases. DataGovBench employs a novel "reversed-objective" methodology to synthesize realistic noise and utilizes rigorous metrics to assess end-to-end pipeline reliability. Our analysis on…
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Ethics and Social Impacts of AI
