Data Readiness Report
Shazia Afzal, Rajmohan C, Manish Kesarwani, Sameep Mehta, Hima Patel

TL;DR
The paper proposes a Data Readiness Report as a comprehensive documentation tool to improve transparency, reproducibility, and management of data quality assessments in AI workflows.
Contribution
It introduces the concept of Data Readiness Reports, detailing their role in documenting data quality, transformations, and lineage to enhance data governance and workflow automation.
Findings
Gives data consumers detailed insights into input data quality across multiple quality dimensions.
Documents data transformations and lineage for better governance.
Lays groundwork for automated data readiness workflows.
Abstract
Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current practices of data cleaning and data readiness assessment for machine learning tasks are mostly conducted in an arbitrary manner which limits their reuse and results in loss of productivity. We introduce the concept of a Data Readiness Report as an accompanying documentation to a dataset that allows data consumers to get detailed insights into the quality of input data. Data characteristics and challenges on various quality dimensions are identified and documented keeping in mind the principles of transparency and explainability. The Data Readiness Report also serves as a record of all data assessment operations including applied transformations. This provides a detailed lineage for the purpose of data governance and management. In effect, the report captures and documents the actions…
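The abstract describes the report as a record of quality assessments plus the transformations applied to the data. The following is a minimal sketch of how such a record might be structured in Python. Every name here (`DataReadinessReport`, `QualityCheck`, `Transformation`, `completeness`), the 0-to-1 score scale, and the sample rows are assumptions made for illustration; none of them come from the paper, and this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class QualityCheck:
    """One quality assessment result (hypothetical structure)."""
    dimension: str   # e.g. "completeness"; the dimension name is an assumption
    score: float     # assumed scale: 0.0 (worst) to 1.0 (best)
    details: str = ""


@dataclass
class Transformation:
    """One applied data operation, kept in order for lineage."""
    operation: str
    params: Dict[str, Any]
    applied_at: str  # ISO 8601 timestamp in UTC


@dataclass
class DataReadinessReport:
    """Illustrative container: quality checks plus transformation lineage."""
    dataset_name: str
    checks: List[QualityCheck] = field(default_factory=list)
    lineage: List[Transformation] = field(default_factory=list)

    def record_check(self, dimension: str, score: float, details: str = "") -> None:
        self.checks.append(QualityCheck(dimension, score, details))

    def record_transformation(self, operation: str, **params: Any) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.lineage.append(Transformation(operation, dict(params), stamp))


def completeness(rows: List[Dict[str, Optional[Any]]]) -> float:
    """Fraction of cells that are not None (one simple, assumed metric)."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 1.0
    return sum(value is not None for value in cells) / len(cells)


# Usage: assess, transform, re-assess, with each step recorded in the report.
rows = [{"age": 31, "city": "Pune"}, {"age": None, "city": "Delhi"}]
report = DataReadinessReport("customers")
report.record_check("completeness", completeness(rows), "before cleaning")

cleaned = [row for row in rows if row["age"] is not None]
report.record_transformation("drop_rows_with_missing", column="age")
report.record_check("completeness", completeness(cleaned), "after cleaning")
```

Keeping checks and transformations as append-only lists preserves the order of operations, which is what makes the record usable as lineage in this sketch.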
