On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms
Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner

TL;DR
This paper emphasizes the importance of responsible datasets in AI, proposes a framework to evaluate datasets for fairness, privacy, and regulatory compliance, and analyzes over 60 datasets, revealing widespread issues.
Contribution
It introduces a responsible dataset evaluation framework considering fairness, privacy, and regulations, and provides recommendations for better dataset construction and documentation.
Findings
None of the surveyed datasets fully comply with fairness, privacy, and regulatory standards.
Proposes modifications to datasheets for datasets for improved documentation.
Highlights the need for revised dataset creation methods due to legal regulations.
Abstract
Artificial Intelligence (AI) has made its way into various scientific fields, providing astonishing improvements over existing algorithms for a wide variety of tasks. In recent years, there have been severe concerns over the trustworthiness of AI technologies. The scientific community has focused on the development of trustworthy AI algorithms. However, machine and deep learning algorithms, popular in the AI community today, depend heavily on the data used during their development. These learning algorithms identify patterns in the data, learning the behavioral objective. Any flaws in the data have the potential to translate directly into algorithms. In this study, we discuss the importance of Responsible Machine Learning Datasets and propose a framework to evaluate the datasets through a responsible rubric. While existing work focuses on the post-hoc evaluation of algorithms for their…
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Privacy-Preserving Technologies in Data
Methods: None
