Test for non-negligible adverse shifts

Vathy M. Kamulete

arXiv:2107.02990·stat.ML·August 10, 2022

TL;DR

This paper introduces D-SOS, a framework that detects adverse dataset shifts by comparing outlier contamination rates between a reference sample and a new sample. It is intended for model monitoring and data validation, where tests of equal distribution tend to raise false alarms.

Contribution

The paper proposes D-SOS, an outlier-score-based method for detecting adverse dataset shifts. It tests whether the new sample is substantively worse than the reference sample, rather than whether the two distributions are equal.

Findings

1. D-SOS effectively detects adverse shifts in various datasets.
2. It provides a flexible way to define what constitutes 'worse' in data shifts.
3. The method is practical for real-world model monitoring and validation.

Abstract

Statistical tests for dataset shift are susceptible to false alarms: they are sensitive to minor differences when there is in fact adequate sample coverage and predictive performance. We propose instead a framework to detect adverse dataset shifts based on outlier scores, D-SOS for short. D-SOS holds that the new (test) sample is not substantively worse than the reference (training) sample, and not that the two are equal. The key idea is to reduce observations to outlier scores and compare contamination rates at varying weighted thresholds. Users can define what 'worse' means in terms of relevant notions of outlyingness, including proxies for predictive performance. Compared to tests of equal distribution, our approach is uniquely tailored to serve as a robust metric for model monitoring and data validation. We show how versatile and practical D-SOS …
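
The key idea in the abstract (reduce observations to outlier scores, then compare contamination rates at weighted thresholds) can be illustrated with a minimal Python sketch. This is not the author's implementation and does not reproduce the paper's test statistic. The function names, the distance-to-mean outlier score, the quantile-squared weights, and the permutation scheme are all assumptions chosen for illustration.

```python
import numpy as np

def outlier_scores(reference, sample):
    """Hypothetical outlier score: Euclidean distance to the reference mean."""
    center = reference.mean(axis=0)
    return np.linalg.norm(sample - center, axis=1)

def weighted_excess_contamination(ref_scores, new_scores, n_thresholds=20):
    """Weighted average of (new rate - reference rate) across thresholds."""
    pooled = np.concatenate([ref_scores, new_scores])
    quantiles = np.linspace(0.5, 0.99, n_thresholds)
    thresholds = np.quantile(pooled, quantiles)
    # Contamination rate: fraction of scores above each threshold.
    ref_rate = (ref_scores[:, None] > thresholds).mean(axis=0)
    new_rate = (new_scores[:, None] > thresholds).mean(axis=0)
    # Assumed weighting: higher (more extreme) thresholds count more.
    weights = quantiles ** 2
    return float(np.sum(weights * (new_rate - ref_rate)) / np.sum(weights))

def adverse_shift_pvalue(ref_scores, new_scores, n_permutations=500, seed=0):
    """One-sided permutation p-value: is the new sample worse than the reference?"""
    rng = np.random.default_rng(seed)
    observed = weighted_excess_contamination(ref_scores, new_scores)
    pooled = np.concatenate([ref_scores, new_scores])
    n_ref = len(ref_scores)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)  # shuffle the sample labels
        stat = weighted_excess_contamination(perm[:n_ref], perm[n_ref:])
        exceed += stat >= observed
    return (exceed + 1) / (n_permutations + 1)

# Usage with synthetic data: a new sample with heavier spread than the reference.
rng = np.random.default_rng(42)
reference = rng.normal(size=(300, 2))
new_sample = rng.normal(scale=3.0, size=(300, 2))
p_value = adverse_shift_pvalue(
    outlier_scores(reference, reference),
    outlier_scores(reference, new_sample),
)
print(f"one-sided p-value: {p_value:.3f}")
```

Because the comparison is one-sided, a new sample that is more concentrated than the reference yields a negative statistic and a large p-value: it differs from the reference but is not flagged as worse. Swapping `outlier_scores` for another notion of outlyingness, such as a proxy for predictive error, changes what 'worse' means without touching the rest of the sketch.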

Code & Models

Repositories

vathymut/dsos (official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Data Stream Mining Techniques