Learning With Multi-Group Guarantees For Clusterable Subpopulations

Jessica Dai; Nika Haghtalab; Eric Zhao

arXiv:2410.14588 · cs.LG · December 10, 2024

TL;DR

This paper provides performance guarantees for subpopulations defined by the clusters that naturally emerge in the data, using multi-objective algorithms that remain effective when the subpopulation structure is uncertain.

Contribution

The work proposes two formalizations of subpopulation guarantees based on clustering likelihoods and develops an online calibration algorithm whose $O(T^{1/2})$ rate improves on the $O(T^{2/3})$ rate of the cluster-then-predict approach.

Findings

01

A multi-objective algorithm achieves an $O(T^{1/2})$ rate for per-subpopulation guarantees in online calibration.

02

The cluster-then-predict approach achieves only a slower $O(T^{2/3})$ rate and requires the clusters to be separable.

03

Providing guarantees for clusters can be easier than learning the clusters themselves.

Abstract

A canonical desideratum for prediction problems is that performance guarantees should hold not just on average over the population, but also for meaningful subpopulations within the overall population. But what constitutes a meaningful subpopulation? In this work, we take the perspective that relevant subpopulations should be defined with respect to the clusters that naturally emerge from the distribution of individuals for which predictions are being made. In this view, a population refers to a mixture model whose components constitute the relevant subpopulations. We suggest two formalisms for capturing per-subgroup guarantees: first, by attributing each individual to the component from which they were most likely drawn, given their features; and second, by attributing each individual to all components in proportion to their relative likelihood of having been drawn from each component.…
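
The two attribution rules described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm; it only shows how an individual might be attributed to mixture components under each formalism, assuming a hypothetical one-dimensional Gaussian mixture with known parameters. The names `responsibilities`, `hard_assignment`, `WEIGHTS`, `MEANS`, and `STDS` are illustrative assumptions and do not come from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, stds):
    """Second formalism (soft): posterior probability of each component
    given the feature x, used as proportional group membership."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

def hard_assignment(x, weights, means, stds):
    """First formalism (hard): index of the component from which x
    was most likely drawn, given its feature."""
    post = responsibilities(x, weights, means, stds)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical two-component mixture (assumed for illustration only).
WEIGHTS = [0.5, 0.5]
MEANS = [-1.0, 1.0]
STDS = [1.0, 1.0]

soft = responsibilities(0.5, WEIGHTS, MEANS, STDS)
hard = hard_assignment(0.5, WEIGHTS, MEANS, STDS)
print("soft membership:", soft)
print("hard membership:", hard)
```

Under the first formalism the individual at `x = 0.5` counts only toward component 1; under the second, it counts toward both components with the weights in `soft`. When components overlap heavily, the soft weights stay far from 0 and 1, which is the regime where hard attribution is least informative.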

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Text and Document Classification Technologies