CODA: Temporal Domain Generalization via Concept Drift Simulator
Chia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang, Kwei-Herng Lai, Anxiao Jiang, Na Zou

TL;DR
CODA introduces a data-centric, model-agnostic framework that simulates future data using feature correlations to improve temporal domain generalization amidst concept drift in real-world datasets.
Contribution
The paper proposes CODA, a novel framework that uses feature correlation matrices to simulate future data, enabling model-agnostic temporal domain generalization.
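As a rough illustration of the idea of simulating future data from feature correlations, the sketch below estimates one correlation matrix per past temporal domain, extrapolates it one step ahead, and samples synthetic data that matches the extrapolated matrix. This is not the authors' implementation: the per-entry linear trend and the Gaussian sampler are simple stand-ins chosen for brevity, and every function name here is hypothetical.

```python
import numpy as np


def correlation_matrices(domains):
    """Return one feature correlation matrix per temporal domain.

    Each element of `domains` is an array of shape (n_samples, n_features).
    """
    return np.stack([np.corrcoef(X, rowvar=False) for X in domains])


def extrapolate_correlation(corrs):
    """Hypothetical predictor: per-entry linear trend, stepped one domain ahead.

    This is an assumed stand-in for a learned correlation predictor.
    """
    T = corrs.shape[0]
    t = np.arange(T)
    flat = corrs.reshape(T, -1)
    # np.polyfit accepts a 2-D y and fits each column independently.
    slope, intercept = np.polyfit(t, flat, deg=1)
    pred = (slope * T + intercept).reshape(corrs.shape[1:])
    # Repair the extrapolated matrix so it is a valid correlation matrix:
    # symmetrize, clip eigenvalues to be positive, rescale to unit diagonal.
    pred = (pred + pred.T) / 2
    w, V = np.linalg.eigh(pred)
    w = np.clip(w, 1e-6, None)
    pred = (V * w) @ V.T
    d = np.sqrt(np.diag(pred))
    return pred / np.outer(d, d)


def simulate_future(domains, n_samples, seed=0):
    """Sample synthetic 'future' features from a Gaussian (an assumption).

    Means and standard deviations are taken from the most recent domain;
    only the correlation structure is extrapolated.
    """
    rng = np.random.default_rng(seed)
    corr = extrapolate_correlation(correlation_matrices(domains))
    last = domains[-1]
    mean, std = last.mean(axis=0), last.std(axis=0)
    cov = corr * np.outer(std, std)
    X_future = rng.multivariate_normal(mean, cov, size=n_samples)
    return X_future, corr
```

Because the output is plain data, any downstream model can be trained on it, which is the sense in which such an approach is model-agnostic. The sketch covers features only; labels would need to be treated as an additional column or handled separately.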
Findings
CODA effectively simulates future data with feature correlations.
Models trained with CODA data generalize better across time.
CODA outperforms existing methods in temporal domain generalization.
Abstract
In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective, which avoids having to consider the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data,…
Peer Reviews
Decision · Submitted to ICLR 2024
1. Overall, the paper is well presented and easy to follow (though some points are still not clear, please see my comments below).
2. This work studies a challenging but still under-studied problem in the literature.
3. The proposed method demonstrates superior performance over several state-of-the-art methods across multiple datasets.
1. I have concerns about the motivation of this paper. In particular, the authors have emphasized that existing TDG methods are model-centric, which are *unnecessarily comprehensive*, and therefore, TDG should be addressed via a data-centric approach. I doubt this point, as generating samples, in principle, is more challenging than discriminating them. With that said, I am not against the approach itself, but the paper presents it in a way that the data-centric itself is superior to model-centri
1. The motivation is quite interesting, and it's meaningful to decompose the concept drift into the data component and model component.
2. The proposed generative method is sound, and the theoretical analysis is also valid.
3. The experiments are well aligned with the three raised research questions.
The overall paper suggests a novel way to generate out-of-domain temporal data via generative methods. Even though the motivation is great, the major claim of the paper is to solve the temporal domain generalization, and I am not sure how generating new temporal data can help solve the domain generalization. The provided solution still goes back to train a model to get familiar with the data, and leveraging the generated data to fine-tune existing model-centric methods might have a better result
S1. Great approach in combining feature correlation prediction with data generation.
S2. Effective experimental design demonstrating CODA's strengths in certain datasets.
S3. Clear explanations and logical presentation of the methodology.
W1. Limited dynamic network adaptability compared to some existing methods.
W2. Constrained application in model-agnostic learning scenarios.
W3. Potential performance decline in handling high-dimensional datasets.
W4. Exploration of CODA's effectiveness in diverse concept drift scenarios is insufficient.
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Machine Learning and Data Classification
