Resilient Contrastive Pre-training under Non-Stationary Drift

Xiaoyu Yang; Jie Lu; En Yu; Wei Duan

arXiv:2502.07620·cs.LG·November 25, 2025

TL;DR

This paper introduces RCP, a causal intervention-based contrastive pre-training method that enhances robustness and stability of learned representations in non-stationary, drifting data environments.

Contribution

The paper develops a causal model of how concept drift affects contrastive pre-training and proposes RCP, a scalable method that mitigates the resulting bias and improves robustness on dynamic data streams.

Findings

01. RCP reduces bias caused by concept drift.

02. RCP improves stability of representations.

03. RCP enhances downstream task performance.

Abstract

The remarkable success of large-scale contrastive pre-training has been largely driven by vast yet static datasets. However, as the scaling paradigm evolves, it encounters a fundamental challenge when applied to dynamic data streams characterized by concept drift: unpredictable changes in the underlying data distribution. This paper aims to advance robust pre-training in such non-stationary environments. We begin by revealing that conventional contrastive pre-training methods are highly susceptible to concept drift, resulting in substantial bias and instability within the learned feature representations. To systematically analyze these effects, we develop a structural causal model that elucidates how drift acts as a confounder, distorting the learned representations. Based on these causal insights, we propose Resilient Contrastive Pre-training (RCP), a…
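
For orientation, the "conventional contrastive pre-training" objective mentioned in the abstract is commonly a symmetric InfoNCE loss over paired embeddings. The sketch below is a minimal NumPy illustration of that standard loss only; it is not RCP, and the function name `info_nce_loss`, the temperature value, and the synthetic data are assumptions made here for illustration.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    Row i of z_a and row i of z_b are treated as a positive pair;
    every other row in the batch serves as a negative.
    (Hypothetical helper, not taken from the paper.)
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # shape (n, n)

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The correct "class" for row i is column i.
        return -np.mean(np.diag(log_probs))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

# Synthetic embeddings (assumed data, for illustration only).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce_loss(z, np.roll(z, 1, axis=0))
```

Correctly paired embeddings yield a lower loss than deliberately mismatched ones. The loss assumes the pairing statistics of the batch are representative of the data, which is the assumption a drifting stream violates.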

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Machine Learning and Data Classification