Predictive Heterogeneity: Measures and Applications
Jiashuo Liu, Jiayun Wu, Bo Li, Peng Cui

TL;DR
This paper introduces the concept of usable predictive heterogeneity, providing a way to measure and leverage data heterogeneity in machine learning to improve generalization and fairness across diverse applications.
Contribution
It proposes the measurable concept of usable predictive heterogeneity, offers PAC bounds for estimation, and develops a bi-level optimization algorithm to explore heterogeneity from data.
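The idea that a sub-population split can expose extra predictive signal can be illustrated with a toy sketch. This is not the paper's measure or its bi-level algorithm: it only compares a pooled least-squares fit with per-group fits for two given splits, using the drop in mean squared error as a stand-in for the gain in usable predictive information (an assumption that corresponds to a Gaussian predictive family with fixed variance). The names `fit_mse` and `heterogeneity_gap` and the synthetic data are hypothetical.

```python
import random

def fit_mse(xs, ys):
    """Fit a 1-D least-squares line and return its mean squared residual."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n

def heterogeneity_gap(xs, ys, groups):
    """Pooled MSE minus the size-weighted per-group MSE (toy proxy only)."""
    n = len(xs)
    pooled = fit_mse(xs, ys)
    split = 0.0
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        gx = [xs[i] for i in idx]
        gy = [ys[i] for i in idx]
        split += len(idx) / n * fit_mse(gx, gy)
    return pooled - split

# Synthetic data (assumption): two sub-populations with opposite slopes.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(400)]
true_groups = [i % 2 for i in range(400)]
ys = [(2.0 if g == 0 else -2.0) * x + rng.gauss(0, 0.1)
      for x, g in zip(xs, true_groups)]
random_groups = [rng.randint(0, 1) for _ in range(400)]

gap_true = heterogeneity_gap(xs, ys, true_groups)      # split matches the mechanism
gap_random = heterogeneity_gap(xs, ys, random_groups)  # uninformative split
print(gap_true, gap_random)
```

The informative split yields a much larger gap than the random one. In the paper the split itself is what the bi-level optimization searches for, whereas here both splits are supplied by hand.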
Findings
Explores heterogeneity in income, crop yield, and image classification tasks.
Leveraging heterogeneity improves out-of-distribution generalization.
Provides theoretical bounds for estimating heterogeneity from finite data.
Abstract
As an intrinsic and fundamental property of big data, data heterogeneity exists in a variety of real-world applications, such as precision medicine, autonomous driving, and financial applications. For machine learning algorithms, ignoring data heterogeneity can greatly hurt generalization performance and algorithmic fairness, since the prediction mechanisms of different sub-populations are likely to differ from each other. In this work, we focus on the data heterogeneity that affects the prediction of machine learning models, and propose the usable predictive heterogeneity, which takes into account the model capacity and computational constraints. We prove that it can be reliably estimated from finite data with probably approximately correct (PAC) bounds. Additionally, we design a bi-level optimization algorithm to explore the usable predictive…
Taxonomy
Topics: Forecasting Techniques and Applications
