Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection
Oana Inel, Tim Draws, Lora Aroyo

TL;DR
This paper introduces a systematic methodology with metrics to evaluate and improve the reliability and fairness of AI datasets throughout their lifecycle, ensuring responsible data collection practices.
Contribution
It proposes a comprehensive Responsible AI (RAI) methodology with granular metrics for assessing data quality, reliability, and fairness over time, validated across multiple datasets and modalities.
Findings
The approach was validated on nine datasets spanning four content modalities.
Provides systematic metrics for internal reliability and external stability.
Enhances transparency and accountability in AI data collection.
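This summary does not list the paper's specific metrics. As an illustration only of what an internal reliability measure can look like, the sketch below computes Fleiss' kappa, a standard chance-corrected agreement statistic for several raters labelling the same items. The function name `fleiss_kappa`, the input layout, and the choice of this statistic are assumptions made for the example, not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels.

    Hypothetical helper for illustration. Every item must carry the same
    number of labels (one per rater), and at least two raters are required.
    """
    if not ratings:
        raise ValueError("ratings must contain at least one item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Expected agreement if raters labelled at random with the
    # overall category proportions.
    total = n_items * n_raters
    proportions = [sum(c[cat] for c in counts) / total for cat in categories]
    expected = sum(p * p for p in proportions)

    if expected == 1.0:
        # Only one category was ever used; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Three raters, two items, full agreement on each item.
    print(fleiss_kappa([["a", "a", "a"], ["b", "b", "b"]]))
```

Computing such a score separately for each repetition of a data collection task, and then comparing the scores, is one plausible way to probe stability over time. Whether the paper does this is not stated in this summary.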
Abstract
The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understanding their origin, process of development, and ethical considerations. However, data collection for AI is still typically a one-off practice, and oftentimes datasets collected for a certain purpose or application are reused for a different problem. Additionally, dataset annotations may not be representative over time, contain ambiguous or erroneous annotations, or be unable to generalize across issues or domains. Recent research has shown these practices might lead to unfair, biased, or inaccurate outcomes. We argue that data collection for…
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
