When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

Ren Fujiwara; Yasuko Matsubara; Yasushi Sakurai

arXiv:2603.09024·cs.LG·May 21, 2026

TL;DR

CALIPER is a novel, data-only test that determines the sufficient post-drift data size for stable retraining in streaming learning, improving adaptation after concept drift.

Contribution

It introduces CALIPER, a detector- and model-agnostic method that estimates the necessary data size for effective retraining after drift, with theoretical guarantees and broad applicability.

Findings

01

CALIPER consistently matches or exceeds fixed data size methods across multiple datasets.

02

It has low per-update time and memory overhead.

03

CALIPER often outperforms incremental updates in drift scenarios.

Abstract

Sudden concept drift makes previously trained predictors unreliable, yet deciding when to retrain and what post-drift data size is sufficient is rarely addressed. We propose CALIPER, a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter $\theta$. When an effective sample size gate is satisfied, a monotonically non-increasing trend in this error as the locality parameter increases indicates that the post-drift data are sufficiently informative for retraining. We also provide a theoretical analysis of our method, and we show that the algorithm has low per-update time and memory cost. Across datasets from four…
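The test described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Gaussian kernel weight exp(-θ·d²), a leave-one-out locally weighted average as the local regressor, the Kish formula for effective sample size, and a batch O(n²) computation instead of the single-pass procedure. The names `proxy_error_curve`, `is_sufficient`, and `logistic_series`, and the parameters `min_ess` and `tol`, are hypothetical.

```python
import numpy as np

def proxy_error_curve(x, thetas):
    """One-step proxy error and mean effective sample size for each theta.

    Assumption: weights are exp(-theta * squared distance), so a larger
    theta means a more local fit. This is a batch simplification.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    states, targets = x[:-1], x[1:]          # pairs (x_t, x_{t+1})
    # Pairwise squared distances between states.
    d2 = ((states[:, None, :] - states[None, :, :]) ** 2).sum(axis=-1)
    tiny = np.finfo(float).tiny              # guards against division by zero
    errors, ess = [], []
    for theta in thetas:
        w = np.exp(-theta * d2)
        np.fill_diagonal(w, 0.0)             # leave-one-out: ignore the point itself
        wsum = np.maximum(w.sum(axis=1), tiny)
        pred = (w @ targets) / wsum[:, None] # locally weighted average of next states
        errors.append(float(np.mean((pred - targets) ** 2)))
        w2sum = np.maximum((w ** 2).sum(axis=1), tiny)
        ess.append(float(np.mean(wsum ** 2 / w2sum)))  # Kish effective sample size
    return np.array(errors), np.array(ess)

def is_sufficient(x, thetas, min_ess=5.0, tol=1e-3):
    """True if, among thetas passing the ESS gate, the error never rises
    by more than a relative tolerance as theta increases."""
    errors, ess = proxy_error_curve(x, thetas)
    gated = ess >= min_ess
    if gated.sum() < 2:                      # too little data to judge a trend
        return False
    e = errors[gated]
    return bool(np.all(np.diff(e) <= tol * e[:-1]))

def logistic_series(n, r=3.9, x0=0.3):
    """Hypothetical demo stream: a deterministic, state-dependent logistic map."""
    out = np.empty(n)
    out[0] = x0
    for t in range(1, n):
        out[t] = r * out[t - 1] * (1.0 - out[t - 1])
    return out
```

Calling `is_sufficient(logistic_series(300), np.array([1.0, 10.0, 100.0]))` applies the test to a strongly state-dependent window, while a window of only a few points fails the gate because too few neighbours carry weight.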

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 1

Strengths

1. The question of "when to retrain after drift" is interesting; previous work usually retrains immediately once drift is detected. This work focuses on identifying the right time to retrain the model, which helps improve learning stability.
2. The proposed method is well designed and comes with a detailed theoretical analysis, and the experiments are sufficient and reflect the learning performance of the proposed method.

Weaknesses

1. In the experiments, models such as MLP and Transformer are chosen for comparison. Tree-based models should also be included, since they are commonly used in traditional concept drift learning.
2. The authors only compare the proposed method with ADWIN, a traditional drift detection method. More comparisons with recently proposed drift detection methods are required.
3. A parameter analysis is needed to show the robustness of the proposed method.

Reviewer 02 · Rating 2 · Confidence 3

Strengths

- Clear contribution with potentially high impact, well-grounded in the literature
- Formal analysis of the proposed algorithm
- Overall, the paper is well structured. Mostly easy to read, with some exceptions (see "Weaknesses" for suggestions on how to improve it)

Weaknesses

The state dependence looks like a pretty strong assumption, as it assumes continuity of the state transition function. I am not sure this is safe to assume, particularly in high-dimensional settings. This is also reflected in Section 2.3. A critical discussion of these assumptions is missing, apart from Appendix C. Running experiments on datasets that differ in their dimensionality might give a better feeling for how the method performs in such cases. Problem 1: I do not understan

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper formalizes the problem of "post-drift data sufficiency," skillfully identifying the gap between drift detection and effective model adaptation. Focusing on when to retrain, rather than just if a drift occurred, is a profound and highly practical contribution to the streaming learning community.
2. The core idea of leveraging state dependence, an intrinsic data property, to infer learnability is interesting. It reframes a complex model-dependent question into a simple data-driven test

Weaknesses

1. This method hinges on the assumption that the data stream is generated by a dynamical system of the form x(t+1) = f(x(t)) + noise. This assumption may not hold in many complex dynamical systems, where the next state x(t+1) depends on an extended history of past states (x(t-k), ..., x(t)) [1, 2] or is affected by significant external latent factors. The paper does not provide an analysis of its robustness when this core assumption is violated, thus restricting the method's general applicability

Videos

When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency · slideslive

Taxonomy

Topics: Data Stream Mining Techniques · Domain Adaptation and Few-Shot Learning · Time Series Analysis and Forecasting