On Learning and Enforcing Latent Assessment Models using Binary Feedback from Human Auditors Regarding Black-Box Classifiers
Mukund Telukunta, Venkata Sriram Siddhardh Nadendla

TL;DR
This paper introduces a latent assessment model (LAM) to interpret binary feedback from human auditors about black-box classifiers. It establishes conditions under which fairness notions are guaranteed, analyzes how much feedback is required, and validates the results empirically.
Contribution
It proposes a novel LAM framework linking human feedback to fairness notions, providing theoretical guarantees and empirical validation on real datasets.
Findings
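Individual and group fairness notions are guaranteed if the auditor's intrinsic judgments satisfy them and are relatively similar to the classifier's evaluations.
The minimum number of feedback samples needed for PAC learning is derived.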
Empirical validation confirms theoretical guarantees.
Abstract
Algorithmic fairness literature presents numerous mathematical notions and metrics, and also points to a tradeoff between them while satisficing some or all of them simultaneously. Furthermore, the contextual nature of fairness notions makes it difficult to automate bias evaluation in diverse algorithmic systems. Therefore, in this paper, we propose a novel model called latent assessment model (LAM) to characterize binary feedback provided by human auditors, by assuming that the auditor compares the classifier's output to his or her own intrinsic judgment for each input. We prove that individual and group fairness notions are guaranteed as long as the auditor's intrinsic judgments inherently satisfy the fairness notion at hand, and are relatively similar to the classifier's evaluations. We also demonstrate this relationship between LAM and traditional fairness notions on three…
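The feedback mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the function name `auditor_feedback`, the `tolerance` parameter, and the rule that the auditor approves whenever the classifier's output lies within `tolerance` of the intrinsic judgment are all assumptions made here for illustration.

```python
from typing import Callable, List, Sequence


def auditor_feedback(
    inputs: Sequence[float],
    classifier: Callable[[float], float],
    intrinsic_judgment: Callable[[float], float],
    tolerance: float = 0.0,
) -> List[int]:
    """Return one binary feedback value per input.

    Assumed rule (hypothetical, not taken from the paper): the auditor
    reports 1 when the classifier's output is within `tolerance` of the
    auditor's own intrinsic judgment for that input, and 0 otherwise.
    """
    return [
        int(abs(classifier(x) - intrinsic_judgment(x)) <= tolerance)
        for x in inputs
    ]


# Toy example: the black-box classifier and the auditor use different cutoffs.
def classifier(x: float) -> int:
    return int(x >= 5)


def judgment(x: float) -> int:
    return int(x >= 3)


feedback = auditor_feedback([1, 3, 4, 6], classifier, judgment)
print(feedback)  # [1, 0, 0, 1]
```

In this toy setting the auditor disagrees only on inputs that fall between the two cutoffs, so the binary feedback carries partial information about where the latent judgment differs from the classifier.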
Taxonomy
Topics: Ethics and Social Impacts of AI
