Resilient Contrastive Pre-training under Non-Stationary Drift
Xiaoyu Yang, Jie Lu, En Yu, Wei Duan

TL;DR
This paper introduces RCP, a contrastive pre-training method based on causal intervention that improves the robustness and stability of learned representations in non-stationary environments with drifting data.
Contribution
It develops a causal model of concept drift effects and proposes RCP, a scalable method that mitigates bias and improves robustness in dynamic data streams.
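This summary does not state RCP's exact estimator. As a generic illustration only, suppose drift is modelled as a confounder $D$ that influences both an input $X$ and an outcome $Y$. The standard backdoor adjustment then expresses the interventional distribution by averaging over drift states rather than conditioning on the drift state observed alongside $X$. The symbols $D$, $X$ and $Y$ are assumptions introduced here for illustration:

```latex
P\big(Y \mid \mathrm{do}(X = x)\big) \;=\; \sum_{d} P\big(Y \mid X = x,\, D = d\big)\, P(D = d)
```

By contrast, ordinary conditioning computes $\sum_{d} P(Y \mid X = x, D = d)\, P(D = d \mid X = x)$. That weighting lets the drift states that happen to co-occur with $x$ bias the estimate.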
Findings
RCP reduces bias caused by concept drift.
RCP improves stability of representations.
RCP enhances downstream task performance.
Abstract
The remarkable success of large-scale contrastive pre-training has been driven largely by vast but static datasets. However, as the scaling paradigm evolves, contrastive pre-training faces a fundamental challenge when applied to dynamic data streams characterized by concept drift: unpredictable changes in the underlying data distribution. This paper aims to advance robust pre-training in such non-stationary environments. We begin by showing that conventional contrastive pre-training methods are highly susceptible to concept drift, which introduces substantial bias and instability into the learned feature representations. To analyze these effects systematically, we develop a structural causal model that elucidates how drift acts as a confounder, distorting the learned representations. Based on these causal insights, we propose Resilient Contrastive Pre-training (RCP), a…
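The abstract places the instability in conventional contrastive pre-training. For reference, the following is a minimal NumPy sketch of a common InfoNCE form of that baseline objective. It is not RCP, whose details are not given in this summary. The function name `info_nce_loss` and the default temperature of 0.1 are illustrative assumptions. Row i of `z_a` is treated as the positive partner of row i of `z_b`, and every other row in the batch serves as a negative.

```python
import numpy as np


def info_nce_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE loss (illustrative baseline, not RCP).

    z_a, z_b: (N, dim) embeddings of two views; row i of each forms a positive pair.
    """
    # L2-normalise so that dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    # (N, N) similarity matrix scaled by the temperature
    logits = z_a @ z_b.T / temperature
    # Numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; average their negative log-probabilities
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z)                     # each view paired with itself
rolled = info_nce_loss(z, np.roll(z, 1, axis=0))  # every positive pair broken
print(aligned, rolled)
```

`np.roll` only permutes the columns of the similarity matrix, so each row's normaliser stays the same. The rolled pairing loses its perfect diagonal match and therefore scores strictly worse, which is exactly the alignment the objective rewards.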
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Machine Learning and Data Classification
