DataGovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows

Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, and Wentao Zhang

arXiv:2512.04416 · cs.AI · December 9, 2025

Open Access

TL;DR

This paper introduces DataGovBench, a benchmark for evaluating LLM agents on real-world data governance workflows. It shows that current models struggle with multi-step workflows and error correction, and it proposes DataGovAgent, a framework that improves task performance.

Contribution

The paper presents DataGovBench, a benchmark with realistic tasks and evaluation metrics, and introduces DataGovAgent, a framework that improves LLM agents' performance on data governance workflows.

Findings

01

DataGovAgent improves ATS from 39.7 to 54.9 on complex tasks.

02

Current models struggle with multi-step workflows and error correction.

03

DataGovAgent reduces debugging iterations by over 77.9%.
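The page does not expand the acronym ATS. Assuming both ATS values in the first finding are on the same scale, the gain can be stated in absolute and relative terms:

```latex
\Delta_{\text{abs}} = 54.9 - 39.7 = 15.2,
\qquad
\Delta_{\text{rel}} = \frac{15.2}{39.7} \approx 0.383 \;(\approx 38.3\%)
```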

Abstract

Data governance ensures data quality, security, and compliance through policies and standards, a critical foundation for scaling modern AI development. Recently, large language models (LLMs) have emerged as a promising solution for automating data governance by translating user intent into executable transformation code. However, existing benchmarks for automated data science often emphasize snippet-level coding or high-level analytics, failing to capture the unique challenge of data governance: ensuring the correctness and quality of the data itself. To bridge this gap, we introduce DataGovBench, a benchmark featuring 150 diverse tasks grounded in real-world scenarios, built on data from actual cases. DataGovBench employs a novel "reversed-objective" methodology to synthesize realistic noise and utilizes rigorous metrics to assess end-to-end pipeline reliability. Our analysis on…
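The abstract names a "reversed-objective" methodology for synthesizing realistic noise but does not describe it in detail. The sketch below shows one plausible reading, which is an assumption and not the paper's procedure. It starts from a clean table and applies the inverse of a governance objective to produce a noisy input. The hypothetical objective here is "upper-case, trimmed country codes and no duplicate rows". Because the clean table is kept as ground truth, a candidate pipeline can be scored end to end. The names `reverse_objective`, `governance_pipeline`, and `row_accuracy` are illustrative and do not come from the paper.

```python
import random

Row = dict[str, str]

# Hypothetical clean ("governed") table that serves as ground truth.
CANONICAL: list[Row] = [
    {"id": "1", "name": "Alice", "country": "US"},
    {"id": "2", "name": "Bob", "country": "DE"},
    {"id": "3", "name": "Chen", "country": "CN"},
]


def reverse_objective(clean: list[Row], seed: int = 0) -> list[Row]:
    """Inject noise that the (assumed) objective should undo:
    lower-cased, whitespace-padded country codes plus random duplicate rows."""
    rng = random.Random(seed)
    noisy: list[Row] = []
    for row in clean:
        dirty = dict(row, country="  " + row["country"].lower() + " ")
        noisy.append(dirty)
        if rng.random() < 0.5:
            noisy.append(dict(dirty))  # exact duplicate of the dirty row
    rng.shuffle(noisy)
    return noisy


def governance_pipeline(rows: list[Row]) -> list[Row]:
    """Stand-in for agent-generated transformation code:
    normalize country codes, then keep the first row per id."""
    seen: set[str] = set()
    out: list[Row] = []
    for row in sorted(rows, key=lambda r: r["id"]):
        fixed = dict(row, country=row["country"].strip().upper())
        if fixed["id"] not in seen:
            seen.add(fixed["id"])
            out.append(fixed)
    return out


def row_accuracy(predicted: list[Row], expected: list[Row]) -> float:
    """Fraction of expected rows reproduced exactly; surplus rows lower the score."""
    matches = sum(1 for row in expected if row in predicted)
    return matches / max(len(expected), len(predicted))


if __name__ == "__main__":
    noisy = reverse_objective(CANONICAL, seed=1)
    print(row_accuracy(noisy, CANONICAL))                       # raw noisy input
    print(row_accuracy(governance_pipeline(noisy), CANONICAL))  # after cleaning
```

In this sketch the untouched noisy input scores 0.0 against the clean table, and the hand-written pipeline scores 1.0. Synthesizing noise as the inverse of a known objective is what makes an exact, data-level check like this possible. It judges the data the pipeline produces rather than the code it contains.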

Taxonomy

Topics: Scientific Computing and Data Management · Data Quality and Management · Ethics and Social Impacts of AI