Auditing for Diversity using Representative Examples

Vijay Keswani; L. Elisa Celis

arXiv:2107.07393 · cs.CY · July 16, 2021

TL;DR

This paper introduces a cost-effective method for estimating how diverse a dataset is with respect to a protected attribute. It relies on a small labeled control set and a similarity measure, which reduces the need for extensive labeling.

Contribution

The paper proposes an algorithm that uses a small labeled control set and a similarity metric to approximate dataset disparity. It comes with theoretical guarantees and a method for constructing the control set adaptively.

Findings

01. Effective approximation of dataset disparity with small control sets

02. Adaptive control sets outperform random selection in reducing approximation error

03. Demonstrated success on image and Twitter datasets

Abstract

Assessing the diversity of a dataset of information associated with people is crucial before using such data for downstream applications. For a given dataset, this often involves computing the imbalance or disparity in the empirical marginal distribution of a protected attribute (e.g., gender or dialect). However, real-world datasets, such as images from Google Search or collections of Twitter posts, often do not have protected attributes labeled. Consequently, to derive disparity measures for such datasets, the elements need to be hand-labeled or crowd-annotated, both of which are expensive processes. We propose a cost-effective approach to approximate the disparity of a given unlabeled dataset, with respect to a protected attribute, using a control set of labeled representative examples. Our proposed algorithm uses the pairwise similarity between elements in the dataset and elements in the…
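
The abstract is cut off before it states the exact scoring rule, so the following Python sketch is only an illustration of the general idea it describes: comparing each unlabeled element with a small labeled control set through pairwise similarity. Everything specific here is an assumption rather than the authors' method: the use of cosine similarity, a binary protected attribute coded 0/1, the score defined as the mean difference in average similarity to the two control groups, and the helper names (cosine_similarity_matrix, estimate_disparity, synthetic_group).

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def estimate_disparity(dataset, control, control_labels) -> float:
    """Hypothetical disparity proxy (an assumption, not the paper's exact rule).

    For every unlabeled element, take its average similarity to control
    elements labeled 0 minus its average similarity to control elements
    labeled 1, then average that difference over the whole dataset.
    """
    control_labels = np.asarray(control_labels)
    sims = cosine_similarity_matrix(
        np.asarray(dataset, dtype=float), np.asarray(control, dtype=float)
    )
    sim_to_group0 = sims[:, control_labels == 0].mean(axis=1)
    sim_to_group1 = sims[:, control_labels == 1].mean(axis=1)
    return float(np.mean(sim_to_group0 - sim_to_group1))


def synthetic_group(rng: np.random.Generator, n: int, group: int) -> np.ndarray:
    """Toy 2-D features: group 0 clusters near (1, 0), group 1 near (0, 1)."""
    center = np.array([1.0, 0.0]) if group == 0 else np.array([0.0, 1.0])
    return center + 0.1 * rng.standard_normal((n, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Small labeled control set: 10 examples per group.
    control = np.vstack([synthetic_group(rng, 10, 0), synthetic_group(rng, 10, 1)])
    labels = np.array([0] * 10 + [1] * 10)
    # Unlabeled dataset that is skewed toward group 0 (40 versus 10 elements).
    skewed = np.vstack([synthetic_group(rng, 40, 0), synthetic_group(rng, 10, 1)])
    print("disparity proxy:", estimate_disparity(skewed, control, labels))
```

Under these assumptions, a clearly positive score suggests the dataset leans toward group 0, a clearly negative score suggests it leans toward group 1, and a score near zero suggests rough balance. How well such a proxy tracks the true disparity depends on how strongly the similarity measure separates the groups, which is the kind of condition the paper's theoretical guarantees are said to address.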

Code & Models

Repositories

vijaykeswani/Diversity-Audit-Using-Representative-Examples (Official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Machine Learning and Algorithms