"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift

Harvineet Singh; Fan Xia; Alexej Gossmann; Andrew Chuang; Julian C. Hong; Jean Feng

arXiv:2506.00756·cs.LG·June 3, 2025

TL;DR

This paper introduces SHIFT, a hierarchical framework that diagnoses and explains performance decay in machine learning models across subgroups, enabling targeted mitigation strategies.

Contribution

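The paper presents SHIFT, a hierarchical inference framework that identifies subgroups with large performance decay and explains how that decay arises. Existing methods do only one of the two: they either explain shifts in average performance or identify adversely affected subgroups without explaining the cause.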

Findings

01 SHIFT identifies interpretable subgroups with large performance decay

02 SHIFT suggests targeted mitigation actions that effectively reduce decay

03 The framework provides detailed insights into variable-specific shifts

Abstract

Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to…
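To make the abstract's point about non-uniform degradation concrete, the following is a minimal sketch that compares per-subgroup error rates between a source and a target dataset. It is not the SHIFT procedure: it performs no subgroup scanning, no statistical inference, and no attribution of decay to covariate or outcome shifts. The function name `subgroup_decay`, the tuple layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def subgroup_decay(source, target):
    """Return {subgroup: target error rate - source error rate}.

    Hypothetical helper: each dataset is a list of
    (subgroup, y_true, y_pred) tuples. Positive values mean the
    model's error rate rose for that subgroup in the target domain.
    """
    def error_rates(rows):
        errors, counts = defaultdict(int), defaultdict(int)
        for group, y_true, y_pred in rows:
            counts[group] += 1
            errors[group] += int(y_true != y_pred)
        return {g: errors[g] / counts[g] for g in counts}

    src, tgt = error_rates(source), error_rates(target)
    # Only compare subgroups that appear in both domains.
    return {g: tgt[g] - src[g] for g in src if g in tgt}


def overall_error(rows):
    """Fraction of misclassified rows, ignoring subgroup membership."""
    return sum(int(y_true != y_pred) for _, y_true, y_pred in rows) / len(rows)


# Toy data (invented): the model is perfect on the source domain,
# but misclassifies half of the "old" subgroup on the target domain.
source = [("young", 1, 1), ("young", 0, 0), ("old", 1, 1), ("old", 0, 0)]
target = [("young", 1, 1), ("young", 0, 0), ("old", 1, 0), ("old", 0, 0)]

decay = subgroup_decay(source, target)
average_decay = overall_error(target) - overall_error(source)
print(decay)          # per-subgroup change in error rate
print(average_decay)  # change in error rate over all rows
```

In this toy example the average error rate rises by 0.25, while the per-subgroup view shows that the entire increase falls on one subgroup. An average-only analysis would hide that concentration, which is the gap the abstract describes.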

Videos

"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift· slideslive

Taxonomy

Topics: Integrated Circuits and Semiconductor Failure Analysis · Semiconductor materials and devices