A hierarchical decomposition for explaining ML performance discrepancies

Jean Feng; Harvineet Singh; Fan Xia; Adarsh Subbaswamy; Alexej Gossmann

arXiv:2402.14254 · cs.LG · February 23, 2024 · 1 citation

TL;DR

This paper introduces a nonparametric hierarchical framework that provides detailed explanations for why machine learning performance varies across domains, enabling targeted interventions without needing causal knowledge.

Contribution

It develops a novel hierarchical decomposition method that offers both aggregate and variable-level insights into performance gaps, without relying on causal assumptions.

Findings

01. Provides debiased, efficient estimators for the decompositions

02. Enables statistical inference with valid confidence intervals

03. Improves understanding of domain-specific performance differences

Abstract

Machine learning (ML) algorithms can often differ in performance across domains. Understanding why their performance differs is crucial for determining what types of interventions (e.g., algorithmic or operational) are most effective at closing the performance gaps. Existing methods focus on aggregate decompositions of the total performance gap into the impact of a shift in the distribution of features $p(X)$ versus the impact of a shift in the conditional distribution of the outcome $p(Y \mid X)$; however, such coarse explanations offer only a few options for how one can close the performance gap. Detailed variable-level decompositions that quantify the importance of each variable to each term in the aggregate decomposition can provide a much deeper understanding and suggest much more targeted interventions. However, existing methods assume knowledge of the…
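To make the aggregate decomposition mentioned in the abstract concrete, here is a minimal sketch on a toy discrete problem. It is not the paper's estimator: it computes exact population quantities rather than debiased estimates, and it assumes one particular ordering (shift $p(X)$ first, then $p(Y \mid X)$). All names (`expected_loss`, `aggregate_decomposition`) and the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def expected_loss(p_x, p_y1_given_x, loss_y0, loss_y1):
    """Expected loss when X ~ p_x and P(Y=1 | X=x) = p_y1_given_x[x].

    loss_y0[x] / loss_y1[x] is the model's loss at X=x when Y=0 / Y=1.
    """
    p_x = np.asarray(p_x, dtype=float)
    p_y1 = np.asarray(p_y1_given_x, dtype=float)
    per_x = (1.0 - p_y1) * np.asarray(loss_y0) + p_y1 * np.asarray(loss_y1)
    return float(np.sum(p_x * per_x))

def aggregate_decomposition(p_x_src, p_y_src, p_x_tgt, p_y_tgt, loss_y0, loss_y1):
    """Split the target-minus-source loss gap into two terms.

    Assumed ordering (an illustrative choice): first swap p(X),
    holding p(Y|X) at the source; then swap p(Y|X).
    """
    base = expected_loss(p_x_src, p_y_src, loss_y0, loss_y1)   # source X, source Y|X
    mixed = expected_loss(p_x_tgt, p_y_src, loss_y0, loss_y1)  # target X, source Y|X
    full = expected_loss(p_x_tgt, p_y_tgt, loss_y0, loss_y1)   # target X, target Y|X
    return {
        "covariate_shift": mixed - base,
        "outcome_shift": full - mixed,
        "total_gap": full - base,
    }

# Toy setup: X takes values {0, 1, 2}; a fixed classifier predicts [0, 0, 1].
# With 0-1 loss, the loss at x is 1 exactly when the prediction is wrong.
pred = np.array([0, 0, 1])
loss_y0 = (pred == 1).astype(float)  # wrong when Y=0 but predicted 1
loss_y1 = (pred == 0).astype(float)  # wrong when Y=1 but predicted 0

result = aggregate_decomposition(
    p_x_src=[0.5, 0.3, 0.2], p_y_src=[0.1, 0.4, 0.9],
    p_x_tgt=[0.2, 0.5, 0.3], p_y_tgt=[0.2, 0.6, 0.8],
    loss_y0=loss_y0, loss_y1=loss_y1,
)
print(result)
```

By construction the two terms sum to the total gap. The variable-level decompositions described in the abstract would further attribute each term to individual features, which this sketch does not attempt.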

Videos

A hierarchical decomposition for explaining ML performance discrepancies · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification

Methods: Focus