TL;DR
This paper introduces a new K-nearest neighbors based method to improve the trustworthiness of bias detection in black-box machine learning models, addressing limitations of existing explanation techniques.
Contribution
It proposes a novel bias measurement approach that overcomes shortcomings of current methods, enhancing fairness assessment in ML software.
Findings
KNN-based bias detection outperforms existing explanation methods
The approach provides more trustworthy insights into model bias
A planned future framework combining explanation and planning to build fair ML software
Abstract
Machine learning software is used in many applications (finance, hiring, admissions, criminal justice) that have a huge social impact. Sometimes, however, the behavior of this software is biased and discriminates on the basis of sensitive attributes such as sex and race. Prior work concentrated on finding and mitigating bias in ML models. A recent trend is to use instance-based, model-agnostic explanation methods such as LIME to detect bias in model predictions. Our work concentrates on finding the shortcomings of current bias measures and explanation methods. We show how our proposed method, based on K nearest neighbors, can overcome those shortcomings and find the underlying bias of black-box models. Our results are more trustworthy and helpful for practitioners. Finally, we describe our future framework combining explanation and planning to build fair software.
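The abstract does not spell out how the K-nearest-neighbors measure is computed, so the following is only an illustrative sketch of one way a neighbor-based bias check on a black-box model could look, not the authors' method. It assumes that neighbors are found with Euclidean distance on the non-sensitive features only, and that bias is signalled when a point and a nearby point from a different sensitive group receive different predictions. The names `knn_bias_score`, `features`, `sensitive`, `predict`, and `k` are hypothetical.

```python
import math
from typing import Callable, Sequence


def knn_bias_score(
    features: Sequence[Sequence[float]],
    sensitive: Sequence[int],
    predict: Callable[[Sequence[float], int], int],
    k: int = 3,
) -> float:
    """Illustrative score (an assumption, not the paper's definition).

    For every point, look at its k nearest neighbors in the space of
    non-sensitive features. Among neighbors that belong to a different
    sensitive group, count how often the black-box prediction differs.
    Returns the fraction of such cross-group pairs with differing
    predictions, or 0.0 if no cross-group pair was found.
    """
    # Query the black-box model once per instance.
    preds = [predict(x, s) for x, s in zip(features, sensitive)]

    flagged = 0   # cross-group neighbor pairs with different predictions
    compared = 0  # all cross-group neighbor pairs

    for i, xi in enumerate(features):
        # Distances to every other point, using non-sensitive features only.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            if sensitive[j] != sensitive[i]:
                compared += 1
                if preds[j] != preds[i]:
                    flagged += 1

    return flagged / compared if compared else 0.0


# Toy data: one non-sensitive feature, alternating sensitive groups.
toy_features = [[0.0], [0.1], [0.2], [0.3]]
toy_sensitive = [0, 1, 0, 1]

# A model that decides purely on the sensitive attribute.
biased_score = knn_bias_score(toy_features, toy_sensitive, lambda x, s: s, k=2)

# A model that ignores the sensitive attribute.
fair_score = knn_bias_score(
    toy_features, toy_sensitive, lambda x, s: int(x[0] > 0.5), k=2
)
```

On this toy data, a score near 1 means similar individuals from different groups are routinely treated differently, while a score near 0 means the model treats close neighbors alike regardless of group. The brute-force neighbor search is quadratic in the number of points; a real analysis would likely use an indexed nearest-neighbor search instead.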
