Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances

German Martinez Matilla; Jakub Marecek

arXiv:2502.02623·cs.LG·February 6, 2025

Open Access

TL;DR

This paper investigates the sample complexity of bias detection using subsampled point-to-subspace distances, proposing an efficient approach with PAC guarantees that addresses the exponential growth of subgroups in bias testing.

Contribution

The paper reformulates bias detection as a point-to-subspace problem on the space of measures and shows that, for the supremum norm, the problem can be subsampled efficiently with probabilistic (PAC) guarantees.

Findings

01 Efficient subsampling method for bias detection with PAC guarantees.

02 Applicable to bias detection across exponentially many subgroups.

03 Validated on well-known instances with positive results.

Abstract

Sample complexity of bias estimation is a lower bound on the runtime of any bias detection method. Many regulatory frameworks require the bias to be tested for all subgroups, whose number grows exponentially with the number of protected attributes. Unless one wishes to run a bias detection with a doubly-exponential run-time, one should like to have polynomial complexity of bias detection for a single subgroup. At the same time, the reference data may be based on surveys, and thus come with non-trivial uncertainty. Here, we reformulate bias detection as a point-to-subspace problem on the space of measures and show that, for supremum norm, it can be subsampled efficiently. In particular, our probabilistically approximately correct (PAC) results are corroborated by tests on well-known instances.
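
The subsampling idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its bound. It assumes a single categorical attribute, a fixed reference distribution (rather than an uncertain, survey-based one), and every observed outcome being listed in the reference. The sample size comes from the textbook Hoeffding inequality combined with a union bound over categories. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def hoeffding_sample_size(num_categories, epsilon, delta):
    """Sample size such that, with probability at least 1 - delta, each of
    the num_categories subsample frequencies is within epsilon of the
    corresponding full-data frequency (Hoeffding plus a union bound).
    Assumption: a textbook bound, not the bound derived in the paper."""
    return math.ceil(math.log(2 * num_categories / delta) / (2 * epsilon ** 2))


def sup_norm_distance(p, q):
    """Supremum-norm distance between two discrete distributions given as
    dicts mapping outcome -> probability."""
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def subsampled_distance(records, reference, epsilon, delta, seed=0):
    """Estimate the sup-norm distance between the empirical distribution of
    `records` and `reference` from a random subsample instead of a full pass.
    Assumption: every outcome in `records` appears as a key of `reference`."""
    rng = random.Random(seed)
    n = min(len(records), hoeffding_sample_size(len(reference), epsilon, delta))
    sample = rng.sample(records, n)  # uniform, without replacement
    counts = Counter(sample)
    empirical = {k: c / n for k, c in counts.items()}
    return sup_norm_distance(empirical, reference), n


# Hypothetical data: one attribute with four outcomes, 10,000 records.
records = ["a"] * 4000 + ["b"] * 3000 + ["c"] * 2000 + ["d"] * 1000
reference = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
distance, n_used = subsampled_distance(records, reference, epsilon=0.05, delta=0.05)
print(f"used {n_used} of {len(records)} records, sup-norm distance {distance:.4f}")
```

In this sketch the number of records inspected depends only on the accuracy `epsilon`, the confidence `delta`, and the number of categories, not on the size of the data set, which is the kind of behavior a PAC-style guarantee is meant to capture.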

Taxonomy

Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Fault Detection and Control Systems