Data Heterogeneity Modeling for Trustworthy Machine Learning

Jiashuo Liu; Peng Cui

arXiv:2506.00969 · cs.LG · June 3, 2025

TL;DR

This paper surveys heterogeneity-aware machine learning, emphasizing the importance of modeling data diversity throughout the ML pipeline to improve robustness, fairness, and reliability across various critical fields.

Contribution

It provides a comprehensive overview of heterogeneity-aware ML, highlighting its benefits, applications, and future research directions in addressing data diversity challenges.

Findings

1. Heterogeneity-aware ML enhances model robustness and fairness.

2. Applying heterogeneity considerations improves generalization across domains.

3. The survey identifies key future research opportunities in heterogeneity modeling.

Abstract

Data heterogeneity plays a pivotal role in determining the performance of machine learning (ML) systems. Traditional algorithms, which are typically designed to optimize average performance, often overlook the intrinsic diversity within datasets. This oversight can lead to a myriad of issues, including unreliable decision-making, inadequate generalization across different domains, unfair outcomes, and false scientific inferences. Hence, a nuanced approach to modeling data heterogeneity is essential for the development of dependable, data-driven systems. In this survey paper, we present a thorough exploration of heterogeneity-aware machine learning, a paradigm that systematically integrates considerations of data heterogeneity throughout the entire ML pipeline -- from data collection and model training to model evaluation and deployment. By applying this approach to a variety of critical…
