Data Readiness for AI: A 360-Degree Survey

Kaveen Hiniduma; Suren Byna; Jean Luca Bez

arXiv:2404.05779 · cs.LG · March 10, 2025 · 3 cites


Open Access

TL;DR

This paper surveys existing metrics for evaluating data readiness in AI, proposing a taxonomy to standardize assessment and improve data quality for AI training and inference.

Contribution

It provides a comprehensive taxonomy of data readiness metrics for structured and unstructured datasets, aiming to establish new standards in the field.

Findings

1. Reviewed more than 140 papers and sources on data readiness metrics.
2. Proposed a taxonomy to categorize and standardize data readiness metrics.
3. Aims to influence future standards for AI data quality assessment.

Abstract

Artificial Intelligence (AI) applications critically depend on data. Poor-quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluating data readiness is therefore a crucial step in improving the quality and appropriateness of the data used for AI. Considerable R&D effort has gone into improving data quality; however, standardized metrics for evaluating data readiness for AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. The survey examines more than 140 papers from the ACM Digital Library, IEEE Xplore, Nature, Springer, and ScienceDirect, as well as online articles by prominent AI experts. It aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this…
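To make concrete what a data readiness metric for a structured dataset can look like, here is a minimal sketch of completeness (the share of non-missing values), a widely used data-quality dimension. It is an illustrative assumption, not the paper's definition or part of its DRAI taxonomy as stated here; the function names `completeness` and `per_column_completeness` and the convention that a missing key or `None` counts as missing are hypothetical choices.

```python
from typing import Any, Dict, Mapping, Sequence


def completeness(rows: Sequence[Mapping[str, Any]], columns: Sequence[str]) -> float:
    """Hypothetical metric: fraction of (row, column) cells holding a value.

    A cell counts as missing if the key is absent or its value is None.
    An empty table returns 1.0, since nothing in it is missing.
    """
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    present = sum(1 for row in rows for col in columns if row.get(col) is not None)
    return present / total


def per_column_completeness(
    rows: Sequence[Mapping[str, Any]], columns: Sequence[str]
) -> Dict[str, float]:
    """Hypothetical helper: completeness computed separately for each column."""
    return {col: completeness(rows, [col]) for col in columns}


if __name__ == "__main__":
    # Toy records: one explicit None, one absent key, one None income.
    records = [
        {"age": 34, "income": 48000},
        {"age": None, "income": 52000},
        {"age": 29},
        {"age": 41, "income": None},
    ]
    cols = ["age", "income"]
    print(completeness(records, cols))             # 5 of 8 cells present
    print(per_column_completeness(records, cols))  # age 3/4, income 2/4
```

A per-column breakdown is included because a single dataset-wide score can hide one badly populated feature that would still make the data unfit for training.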


Taxonomy

Topics: Data Quality and Management · Big Data and Business Intelligence

Methods: Lib