A unified framework for dataset shift diagnostics

Felipe Maia Polo; Rafael Izbicki; Evanildo Gomes Lacerda Jr; Juan Pablo Ibieta-Jimenez; Renato Vicente

arXiv:2205.08340 · stat.ML · September 14, 2023 · 1 citation

TL;DR

DetectShift is a flexible framework that quantifies and tests for various dataset shifts across different data types, aiding practitioners in model adaptation especially when target labels are scarce.

Contribution

Introduces DetectShift, a novel framework for diagnosing multiple types of dataset shifts in diverse data modalities, enhancing model robustness.

Findings

01. Effectively detects dataset shifts in high-dimensional data

02. Quantifies magnitude of shifts for better interpretability

03. Applicable to regression, classification, and various data forms

Abstract

Supervised learning techniques typically assume training data originates from the target population. Yet, in reality, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their predictors. In this work, we propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts, encompassing shifts in the distributions of $(X, Y)$, $X$, $Y$, $X \mid Y$, and $Y \mid X$. DetectShift equips practitioners with insights into data shifts, facilitating the adaptation or retraining of predictors using both source and target data. This proves extremely valuable when labeled samples in the target domain are limited. The framework utilizes test statistics with the same nature to quantify the magnitude of the various shifts, making results more interpretable. It is versatile, suitable for regression and…
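To make "quantifies and tests for" a shift concrete, here is a minimal sketch for the simplest case in the list, a shift in the distribution of $Y$ with discrete labels. It is an illustration under stated assumptions, not the DetectShift implementation: the choice of a smoothed plug-in KL divergence as the statistic, the permutation test used for the p-value, and the function names `kl_label_shift` and `permutation_test` are all assumptions made for this example, since the excerpt above does not specify them.

```python
import math
import random
from collections import Counter


def kl_label_shift(source_labels, target_labels, smoothing=1.0):
    """Plug-in estimate of KL(target || source) between label distributions.

    Additive smoothing (an assumption of this sketch) keeps the estimate
    finite when a class is missing from one of the two samples.
    """
    classes = sorted(set(source_labels) | set(target_labels))
    src, tgt = Counter(source_labels), Counter(target_labels)
    n_src = len(source_labels) + smoothing * len(classes)
    n_tgt = len(target_labels) + smoothing * len(classes)
    kl = 0.0
    for c in classes:
        p_src = (src[c] + smoothing) / n_src
        p_tgt = (tgt[c] + smoothing) / n_tgt
        kl += p_tgt * math.log(p_tgt / p_src)
    return kl


def permutation_test(source_labels, target_labels, n_permutations=500, seed=0):
    """Return (observed statistic, permutation p-value) for a shift in Y.

    Under the null hypothesis of no shift, domain membership is exchangeable,
    so the pooled labels are reshuffled and the statistic is recomputed.
    """
    rng = random.Random(seed)
    observed = kl_label_shift(source_labels, target_labels)
    pooled = list(source_labels) + list(target_labels)
    n_src = len(source_labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if kl_label_shift(pooled[:n_src], pooled[n_src:]) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, so p is never 0.
    return observed, (1 + exceed) / (1 + n_permutations)
```

Calling `permutation_test(source_labels, target_labels)` returns one number for the magnitude of the shift and one for its significance. Reporting a statistic of the same kind for each shift type is what would make the magnitudes comparable, which is the interpretability point the abstract makes.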

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Data Stream Mining Techniques