Metadata Management for AI-Augmented Data Workflows
Jinjin Zhao, Sanjay Krishnan

TL;DR
This paper introduces TableVault, a metadata governance framework that enhances transparency, reproducibility, and lineage tracking in complex AI-augmented data workflows involving human and model-driven processes.
Contribution
It presents a novel metadata management system tailored for human-AI collaborative workflows, combining database guarantees with AI-specific features for comprehensive metadata capture.
Findings
TableVault effectively preserves data lineage and operational context.
It supports transparency and reproducibility in complex workflows.
These properties are demonstrated through a document classification case study.
Abstract
AI-augmented data workflows introduce complex governance challenges, as both human and model-driven processes generate, transform, and consume data artifacts. These workflows blend heterogeneous tools, dynamic execution patterns, and opaque model decisions, making comprehensive metadata capture difficult. In this work, we present TableVault, a metadata governance framework designed for human-AI collaborative data creation. TableVault records ingestion events, traces operation status, links execution parameters to their data origins, and exposes a standardized metadata layer. By combining database-inspired guarantees with AI-oriented design, such as declarative operation builders and lineage-aware references, TableVault supports transparency and reproducibility across mixed human-model pipelines. Through a document classification case study, we demonstrate how TableVault preserves…
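The abstract names three kinds of metadata: ingestion events, operation status, and links from execution parameters back to the data they came from. The sketch below illustrates that idea in a few lines of Python. It is not TableVault's API; every name in it (`MetadataLog`, `OperationRecord`, `ingest`, `start`, `finish`, `lineage`, and the `"human"`/`"model"` actor labels) is a hypothetical stand-in chosen for illustration, and the in-memory storage carries none of the database-style guarantees the paper describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class OperationRecord:
    """One human- or model-driven step (hypothetical schema, not TableVault's)."""
    op_id: int
    name: str
    actor: str                     # assumed labels: "human" or "model"
    params: dict[str, Any]         # execution parameters for this step
    inputs: list[str]              # artifact ids the step read from
    outputs: list[str] = field(default_factory=list)
    status: str = "running"        # later set by finish()
    started_at: str = ""


class MetadataLog:
    """In-memory log of ingestion events, operation status, and lineage."""

    def __init__(self) -> None:
        self.ingested: dict[str, dict[str, Any]] = {}
        self.operations: list[OperationRecord] = []
        self.produced_by: dict[str, int] = {}   # artifact id -> op_id

    def ingest(self, artifact_id: str, source: str) -> None:
        # Record an ingestion event for an externally supplied artifact.
        self.ingested[artifact_id] = {
            "source": source,
            "at": datetime.now(timezone.utc).isoformat(),
        }

    def start(self, name: str, actor: str, params: dict[str, Any],
              inputs: list[str]) -> OperationRecord:
        # Refuse inputs with no recorded origin, so lineage stays complete.
        for artifact_id in inputs:
            if artifact_id not in self.ingested and artifact_id not in self.produced_by:
                raise KeyError(f"unknown input artifact: {artifact_id}")
        op = OperationRecord(
            op_id=len(self.operations), name=name, actor=actor,
            params=dict(params), inputs=list(inputs),
            started_at=datetime.now(timezone.utc).isoformat(),
        )
        self.operations.append(op)
        return op

    def finish(self, op: OperationRecord, outputs: list[str],
               status: str = "succeeded") -> None:
        # Close the operation and link each output back to it.
        op.outputs = list(outputs)
        op.status = status
        for artifact_id in outputs:
            self.produced_by[artifact_id] = op.op_id

    def lineage(self, artifact_id: str) -> list[str]:
        # Walk producer links backwards and collect all upstream artifacts.
        seen: set[str] = set()
        stack = [artifact_id]
        while stack:
            op_id = self.produced_by.get(stack.pop())
            if op_id is None:
                continue  # ingested (or unknown) artifact: nothing upstream
            for parent in self.operations[op_id].inputs:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return sorted(seen)
```

In a document classification setting like the paper's case study, one might ingest a document table, log a model-driven `classify` step that reads it, then log a human `review` step over the model's labels; calling `lineage` on the reviewed labels would then return both the model output and the original documents.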
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Business Process Modeling and Analysis
