"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift
Harvineet Singh, Fan Xia, Alexej Gossmann, Andrew Chuang, Julian C. Hong, Jean Feng

TL;DR
This paper introduces SHIFT, a hierarchical framework that diagnoses and explains performance decay in machine learning models across subgroups, enabling targeted mitigation strategies.
Contribution
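The paper presents SHIFT, a hierarchical inference framework that identifies subgroups with large performance decay and explains how that decay arose. Existing methods do only one of the two: they either explain shifts in average performance or identify affected subgroups without explaining the cause.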
Findings
SHIFT identifies interpretable subgroups with large performance decay.
SHIFT suggests targeted mitigation actions that effectively reduce decay.
The framework provides detailed insight into variable-specific shifts.
Abstract
Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to…
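To make the idea of non-uniform decay concrete, here is a minimal sketch that compares per-subgroup accuracy between a source and a target dataset and flags subgroups whose drop exceeds a tolerance. It is not the SHIFT procedure: it assumes the subgroups are predefined, performs no statistical testing, and does not separate covariate shift from outcome shift. The function names, the record layout `(group, y_true, y_pred)`, the `tolerance` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for records of the form (group, y_true, y_pred)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}


def subgroup_decay(source, target, tolerance=0.05):
    """Compute the accuracy drop (source minus target) for each shared subgroup.

    Returns the per-group decay and a sorted list of groups whose decay
    exceeds `tolerance` (an illustrative threshold, not a value from the paper).
    """
    acc_src = accuracy_by_group(source)
    acc_tgt = accuracy_by_group(target)
    decay = {g: acc_src[g] - acc_tgt[g] for g in acc_src if g in acc_tgt}
    flagged = sorted(g for g, d in decay.items() if d > tolerance)
    return decay, flagged


# Toy data (made up): the model stays accurate for "young" but degrades for "old".
source = [("young", 1, 1), ("young", 0, 0), ("old", 1, 1), ("old", 0, 0)]
target = [("young", 1, 1), ("young", 0, 0), ("old", 1, 0), ("old", 0, 0)]

decay, flagged = subgroup_decay(source, target)
print(decay)
print(flagged)
```

In this toy example the average accuracy drop hides the fact that only one subgroup is affected, which is the kind of heterogeneity the abstract describes. A raw per-group comparison like this cannot say whether a drop is statistically meaningful or what caused it.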
Taxonomy
Topics: Integrated Circuits and Semiconductor Failure Analysis · Semiconductor materials and devices
