CODA: Temporal Domain Generalization via Concept Drift Simulator

Chia-Yuan Chang; Yu-Neng Chuang; Zhimeng Jiang; Kwei-Herng Lai; Anxiao Jiang; Na Zou

arXiv:2310.01508·cs.LG·October 4, 2023·1 cites

Open Access · 3 Reviews

TL;DR

CODA introduces a data-centric, model-agnostic framework that simulates future data using feature correlations to improve temporal domain generalization amidst concept drift in real-world datasets.

Contribution

The paper proposes CODA, a novel framework that uses feature correlation matrices to simulate future data, enabling model-agnostic temporal domain generalization.
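The idea of tracking feature correlations over time and projecting them forward can be illustrated with a minimal sketch. This is not the authors' implementation: the summary above does not say how CODA predicts future correlation matrices or how it generates data from them, so the per-entry linear trend below is an assumed stand-in, and the function names `correlation_matrix` and `extrapolate_next` are hypothetical. The sketch assumes at least two past domains and no constant-valued features.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of one temporal domain (a list of feature rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    std = [math.sqrt(cov[j][j]) for j in range(d)]  # assumes non-constant features
    return [[cov[i][j] / (std[i] * std[j]) for j in range(d)] for i in range(d)]


def extrapolate_next(matrices):
    """Assumed stand-in predictor: fit a least-squares line to each matrix
    entry over the time index and evaluate it one step ahead."""
    T, d = len(matrices), len(matrices[0])  # requires T >= 2
    t_mean = (T - 1) / 2
    t_var = sum((t - t_mean) ** 2 for t in range(T))
    nxt = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            ys = [m[i][j] for m in matrices]
            y_mean = sum(ys) / T
            slope = sum((t - t_mean) * (y - y_mean)
                        for t, y in enumerate(ys)) / t_var
            pred = y_mean + slope * (T - t_mean)
            # Keep the result a plausible correlation: unit diagonal, entries in [-1, 1].
            nxt[i][j] = 1.0 if i == j else max(-1.0, min(1.0, pred))
    return nxt


# Toy usage: three past domains with two features each (made-up numbers).
domains = [
    [[1.0, 4.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]],
    [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]],
    [[1.0, 1.0], [2.0, 2.0], [3.0, 4.0], [4.0, 3.0]],
]
history = [correlation_matrix(rows) for rows in domains]
predicted = extrapolate_next(history)
print(predicted)
```

A predicted matrix of this kind would then act as a target for a data generator. Note that entrywise extrapolation does not guarantee a valid (positive semi-definite) correlation matrix when there are more than two features, so a real pipeline would need a learned predictor or a projection step.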

Findings

01. CODA effectively simulates future data with feature correlations.

02. Models trained with CODA data generalize better across time.

03. CODA outperforms existing methods in temporal domain generalization.

Abstract

In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data,…

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 5

Strengths

1. Overall, the paper is well presented and easy to follow (though some points are still not clear, please see my comments below).
2. This work studies a challenging but still under-studied problem in the literature.
3. The proposed method demonstrates superior performance over several state-of-the-art methods across multiple datasets.

Weaknesses

1. I have concerns about the motivation of this paper. In particular, the authors have emphasized that existing TDG methods are model-centric, which are *unnecessarily comprehensive*, and therefore, TDG should be addressed via a data-centric approach. I doubt this point, as generating samples, in principle, is more challenging than discriminating them. With that said, I am not against the approach itself, but the paper presents it in a way that the data-centric itself is superior to model-centri

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

1. The motivation is quite interesting, and it's meaningful to decompose the concept drift into the data component and model component.
2. The proposed generative method is sound, and the theoretical analysis is also valid.
3. The experiments are well aligned with the three raised research questions.

Weaknesses

The overall paper suggests a novel way to generate out-of-domain temporal data via generative methods. Even though the motivation is great, the major claim of the paper is to solve the temporal domain generalization, and I am not sure how generating new temporal data can help solve the domain generalization. The provided solution still goes back to train a model to get familiar with the data, and leveraging the generated data to fine-tune existing model-centric methods might have a better result

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

S1. Great approach in combining feature correlation prediction with data generation.
S2. Effective experimental design demonstrating CODA's strengths in certain datasets.
S3. Clear explanations and logical presentation of the methodology.

Weaknesses

W1. Limited dynamic network adaptability compared to some existing methods.
W2. Constrained application in model-agnostic learning scenarios.
W3. Potential performance decline in handling high-dimensional datasets.
W4. Exploration of CODA's effectiveness in diverse concept drift scenarios is insufficient.

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Machine Learning and Data Classification