Fair and Diverse DPP-based Data Summarization
L. Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, Nisheeth K. Vishnoi

TL;DR
This paper introduces a framework for fair and diverse data summarization using determinantal point processes (DPPs). It addresses bias by incorporating fairness constraints into the sampling distribution and provides an efficient sampling algorithm with theoretical guarantees.
Contribution
It develops a novel method to integrate fairness constraints into DPP-based data summarization and proposes a fast, provably effective sampling algorithm under certain conditions.
Findings
Fairness constraints do not significantly reduce diversity in samples.
The proposed sampler is efficient and effective for well-conditioned input vectors.
Experimental results validate the theoretical guarantees and practical utility.
Abstract
Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (under- or over-representation of a certain gender or race) in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and corresponding distributions (DPPs) and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Coming up with efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier and we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our experimental results on a…
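To make the idea of a constrained determinantal distribution concrete, here is a minimal Python sketch. It assumes a particular form of fairness constraint (lower and upper bounds on how many items are drawn from each group), which the abstract does not spell out, and it enumerates all subsets by brute force, so it is only practical for tiny inputs and is not the fast sampler the paper proposes. The function names, the `bounds` dictionary, and the toy data are hypothetical and chosen for illustration.

```python
import itertools

import numpy as np


def constrained_dpp_distribution(V, groups, k, bounds):
    """Enumerate size-k subsets that satisfy per-group count bounds.

    Each feasible subset S gets weight det(V_S V_S^T), the squared volume
    spanned by its feature vectors, so more diverse subsets are more likely.
    `bounds` maps a group label to an assumed (lower, upper) count range.
    """
    subsets, weights = [], []
    for S in itertools.combinations(range(V.shape[0]), k):
        counts = {}
        for i in S:
            counts[groups[i]] = counts.get(groups[i], 0) + 1
        feasible = all(lo <= counts.get(g, 0) <= hi
                       for g, (lo, hi) in bounds.items())
        if not feasible:
            continue
        rows = V[list(S)]
        # Clamp tiny negative values caused by floating-point round-off.
        weights.append(max(np.linalg.det(rows @ rows.T), 0.0))
        subsets.append(S)
    weights = np.array(weights)
    if weights.sum() <= 0:
        raise ValueError("no feasible subset with positive determinant")
    return subsets, weights / weights.sum()


def sample_fair_dpp(V, groups, k, bounds, rng):
    """Draw one subset from the constrained distribution (brute force)."""
    subsets, probs = constrained_dpp_distribution(V, groups, k, bounds)
    return subsets[rng.choice(len(subsets), p=probs)]


# Toy data: six items in R^3, four from group "a" and two from group "b".
rng = np.random.default_rng(0)
V = rng.normal(size=(6, 3))
groups = ["a", "a", "a", "a", "b", "b"]
# Assumed constraint: exactly one item from each group in a summary of size 2.
summary = sample_fair_dpp(V, groups, k=2, bounds={"a": (1, 1), "b": (1, 1)}, rng=rng)
print(summary)
```

Without the bounds, a plain size-2 determinantal sample over this data could pick both items from group "a"; with them, the distribution is restricted to balanced pairs and still favors pairs whose feature vectors span a larger volume.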
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare
