Detecting discriminatory risk through data annotation based on Bayesian inferences
Elena Beretta, Antonio Vetrò, Bruno Lepri, Juan Carlos De Martin

TL;DR
This paper introduces a Bayesian inference-based data annotation method to identify and warn about potential racial discrimination risks in datasets used for machine learning, emphasizing the importance of sampling practices.
Contribution
The paper presents a novel Bayesian inference approach for data annotation that highlights sampling biases and discrimination risks, addressing a gap in ethical data collection practices.
Findings
Effective in detecting racial discrimination risks in datasets
Provides insights into sampling biases affecting model fairness
Applicable to multiple datasets for bias assessment
Abstract
Thanks to the growth of computational power and data availability, research in machine learning has advanced with tremendous rapidity. Nowadays, the majority of automatic decision-making systems are based on data. However, it is well known that machine learning systems can produce problematic results if they are built on partial or incomplete data. In fact, in recent years several studies have found converging issues related to the ethics and transparency of these systems, arising in the process of data collection and in how the data are recorded. Although rigorous data collection and analysis is fundamental to model design, this step is still largely overlooked by the machine learning community. For this reason, we propose a method of data annotation based on Bayesian statistical inference that aims to warn about the risk of discriminatory results of a given…
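The abstract describes annotating a dataset with warnings derived from Bayesian inference about how groups were sampled. This summary does not give the paper's actual model, so what follows is only a minimal sketch under our own assumptions: a Beta-Binomial model of each group's share of the sample, a uniform Beta(1, 1) prior, hypothetical reference shares (for example, census proportions), and an arbitrary 0.95 warning threshold. The names `annotate`, `posterior_under_representation` and `RepresentationWarning` are hypothetical illustrations, not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass
class RepresentationWarning:
    """Hypothetical annotation record for one group (not from the paper)."""
    group: str
    posterior_mean: float  # E[theta | data] under the Beta posterior
    prob_under: float      # Monte Carlo estimate of P(theta < reference share | data)
    flagged: bool          # True when prob_under reaches the warning threshold


def posterior_under_representation(count, total, reference_share,
                                   prior_a=1.0, prior_b=1.0,
                                   draws=20_000, seed=0):
    """Beta-Binomial update for a group's share theta of the sample.

    With a Beta(prior_a, prior_b) prior and `count` group members out of
    `total` records, the posterior is Beta(prior_a + count, prior_b + total - count).
    Returns the posterior mean and a Monte Carlo estimate of
    P(theta < reference_share | data).
    """
    if not 0 <= count <= total:
        raise ValueError("count must lie in [0, total]")
    if prior_a <= 0 or prior_b <= 0:
        raise ValueError("prior parameters must be positive")
    a = prior_a + count
    b = prior_b + (total - count)
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    below = sum(rng.betavariate(a, b) < reference_share for _ in range(draws))
    return a / (a + b), below / draws


def annotate(group_counts, reference_shares, threshold=0.95, **kwargs):
    """Flag groups whose sample share is probably below their reference share.

    `reference_shares` and `threshold` are illustrative assumptions,
    not parameters taken from the paper.
    """
    total = sum(group_counts.values())
    warnings = []
    for group, share in reference_shares.items():
        count = group_counts.get(group, 0)
        mean, p_under = posterior_under_representation(count, total, share, **kwargs)
        warnings.append(RepresentationWarning(group, mean, p_under, p_under >= threshold))
    return warnings


if __name__ == "__main__":
    # Hypothetical sample of 1,000 records and hypothetical reference shares.
    counts = {"A": 820, "B": 90, "C": 90}
    reference = {"A": 0.60, "B": 0.25, "C": 0.15}
    for w in annotate(counts, reference):
        print(f"{w.group}: mean={w.posterior_mean:.3f} "
              f"P(under)={w.prob_under:.3f} flagged={w.flagged}")
```

In this made-up example, groups B and C are flagged because their posterior shares (about 0.09 each) sit far below the assumed reference shares, while A is not. Monte Carlo sampling with the standard library's `random.betavariate` avoids a SciPy dependency; where SciPy is available, `scipy.stats.beta.cdf(reference_share, a, b)` gives the same probability exactly.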
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
