Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data
Nabeel Seedat, Jonathan Crabbé, Ioana Bica, Mihaela van der Schaar

TL;DR
Data-IQ is a framework that stratifies tabular data into subgroups according to outcome heterogeneity. It uses per-example model behavior during training, together with aleatoric uncertainty, to make predictions easier to understand and more reliable, especially in sensitive domains like healthcare.
Contribution
The paper introduces Data-IQ, a method that characterizes subgroups in tabular datasets by analyzing how individual examples behave during training and by estimating their aleatoric uncertainty. The method is applicable across various models.
Findings
Data-IQ effectively stratifies data into Easy, Ambiguous, and Hard subgroups.
It demonstrates robustness across different models and datasets.
Subgroups inform feature acquisition, dataset selection, and model reliability.
Abstract
High model performance, on average, can hide that models may systematically underperform on subgroups of the data. We consider the tabular setting, which surfaces the unique issue of outcome heterogeneity - this is prevalent in areas such as healthcare, where patients with similar features can have different outcomes, thus making reliable predictions challenging. To tackle this, we propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes. We do this by analyzing the behavior of individual examples during training, based on their predictive confidence and, importantly, the aleatoric (data) uncertainty. Capturing the aleatoric uncertainty permits a principled characterization and then subsequent stratification of data examples into three distinct subgroups (Easy, Ambiguous, Hard). We experimentally demonstrate the benefits of Data-IQ…
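The stratification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that confidence is the mean probability a model assigns to an example's true label across training checkpoints, and that aleatoric uncertainty is the mean of p(1 - p) over the same checkpoints. The function name `stratify_examples` and the thresholds `conf_upper`, `conf_lower`, and `aleatoric_max` are hypothetical choices made for illustration only.

```python
import numpy as np

def stratify_examples(probs_true, conf_upper=0.75, conf_lower=0.25, aleatoric_max=0.15):
    """Assign each example to Easy, Ambiguous, or Hard (illustrative sketch).

    probs_true: array of shape (n_checkpoints, n_examples) holding the
    probability assigned to each example's true label at each checkpoint.
    All threshold values are assumptions, not taken from the paper.
    """
    probs_true = np.asarray(probs_true, dtype=float)

    # Assumed definition: average true-label probability over checkpoints.
    confidence = probs_true.mean(axis=0)

    # Assumed definition: average of p * (1 - p) over checkpoints.
    aleatoric = (probs_true * (1.0 - probs_true)).mean(axis=0)

    low_uncertainty = aleatoric <= aleatoric_max

    # Default to Ambiguous, then carve out the two low-uncertainty groups.
    groups = np.full(confidence.shape, "Ambiguous", dtype=object)
    groups[(confidence >= conf_upper) & low_uncertainty] = "Easy"
    groups[(confidence <= conf_lower) & low_uncertainty] = "Hard"
    return confidence, aleatoric, groups


# Three examples tracked over three checkpoints (made-up numbers).
probs = np.array([
    [0.90, 0.50, 0.05],
    [0.95, 0.50, 0.02],
    [0.99, 0.50, 0.10],
])
confidence, aleatoric, groups = stratify_examples(probs)
print(list(groups))
```

In this sketch, an example that the model predicts correctly and consistently lands in Easy, one that it predicts incorrectly and consistently lands in Hard, and everything else, including any example with high aleatoric uncertainty, is treated as Ambiguous.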
Taxonomy
Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
