Discovering Mixtures of Structural Causal Models from Time Series Data
Sumanth Varambally, Yi-An Ma, Rose Yu

TL;DR
This paper introduces a variational inference framework called MCD for discovering multiple causal models from heterogeneous time series data, extending causal discovery to more realistic, diverse scenarios.
Contribution
It proposes a novel method to identify and separate different causal models within mixed time series data, including linear and nonlinear cases, with theoretical guarantees.
Findings
Outperforms existing methods on synthetic datasets.
Effective on real-world datasets with heterogeneous causal structures.
Proven identifiability under mild assumptions.
Abstract
Discovering causal relationships from time series data is significant in fields such as finance, climate science, and neuroscience. However, contemporary techniques rely on the simplifying assumption that data originates from the same causal model, while in practice, data is heterogeneous and can stem from different causal models. In this work, we relax this assumption and perform causal discovery from time series data originating from a mixture of causal models. We propose a general variational inference-based framework called MCD to infer the underlying causal models as well as the mixing probability of each sample. Our approach employs an end-to-end training process that maximizes an evidence lower bound (ELBO) for the data likelihood. We present two variants: MCD-Linear for linear relationships and independent noise, and MCD-Nonlinear for nonlinear causal relationships and…
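To make the idea of a per-sample evidence lower bound over mixture components more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes lag-1 linear Gaussian SCMs with fixed, known weight matrices and unit noise variance, a fixed mixing prior, and one vector of membership logits per sample. All function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(series, W, noise_std=1.0):
    """Gaussian log-likelihood of a series under x_t = W x_{t-1} + noise.

    Assumed toy model: W[i][j] is the lagged effect of variable j on variable i.
    The first time step is conditioned on, not modelled.
    """
    total = 0.0
    for prev, curr in zip(series, series[1:]):
        for i, row in enumerate(W):
            mean = sum(w * p for w, p in zip(row, prev))
            resid = curr[i] - mean
            total += (-0.5 * math.log(2 * math.pi * noise_std ** 2)
                      - resid ** 2 / (2 * noise_std ** 2))
    return total

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def joint_log_probs(series, models, prior):
    """log p(x | model k) + log p(k) for every component k."""
    return [log_likelihood(series, W) + math.log(p) for W, p in zip(models, prior)]

def sample_elbo(series, models, prior, membership_logits):
    """ELBO for one sample: sum_k q(k) * [log p(x|k) + log p(k) - log q(k)]."""
    q = softmax(membership_logits)
    joint = joint_log_probs(series, models, prior)
    # Terms with q(k) == 0 contribute nothing (limit of q log q is 0).
    return sum(qk * (jk - math.log(qk)) for qk, jk in zip(q, joint) if qk > 0.0)

def log_marginal(series, models, prior):
    """Exact log p(x) = log sum_k p(k) p(x | k), tractable here because K is small."""
    return logsumexp(joint_log_probs(series, models, prior))

# Toy data: two variables, four time steps (made-up numbers).
series = [[1.0, 0.0], [0.5, 0.9], [0.2, 0.5], [0.1, 0.2]]
models = [
    [[0.5, 0.0], [0.9, 0.0]],  # component 0: variable 0 drives variable 1
    [[0.5, 0.0], [0.0, 0.5]],  # component 1: no cross-variable edge
]
prior = [0.5, 0.5]

uniform_elbo = sample_elbo(series, models, prior, [0.0, 0.0])
tight_elbo = sample_elbo(series, models, prior, joint_log_probs(series, models, prior))
print(uniform_elbo, tight_elbo, log_marginal(series, models, prior))
```

In this sketch the bound never exceeds the exact log marginal likelihood, and it becomes tight when the membership logits equal the joint log-probabilities, which makes the softmax the exact posterior over components. In a learned setting the weight matrices and logits would be optimized jointly rather than fixed as they are here.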
Peer Reviews
Decision: ICML 2024 Poster
- In general, the exposition is good (although there is room for more clarity and explanation in the formal parts).
- Addressing causal discovery for a mixture of multiple causal graphs is an interesting and relevant research direction that hasn't been much investigated (although I have my reservations).
- Experimental results are coupled with a theoretical structural identifiability result.
- My main skepticism is that all the results are reported on the training set, which surprised me (I had a stronger positive impression until that point). Trivially enough, for an unquestionably positive score I would need results on test data, on unseen graphs.
- How different the causal graphs in these mixture models are (and also the data), i.e., the average SHD within the cluster, is missing. This is really important for understanding the behaviour of the method. (
The paper addresses a relevant and interesting aspect of causal discovery in time-series data. The proposed loss function is a simple extension of the standard variational inference objective, enabling efficient end-to-end training with a mixture of core causal discovery models. The method is flexible in terms of the choice of core causal structure learning algorithms, inheriting the structural identifiability properties of these algorithms. Competitive empirical results (although on training
The proposed objective function can be seen as a straightforward extension of the standard variational inference optimisation framework. Therefore, the overall novelty and significance of the work may be somewhat limited in this regard. A major drawback of the work is that the reported results are based on training data, as the proposed method depends on learnable sample-specific parameters. It raises questions about why an encoder producing a K-way categorical random variable given a sample wa
The problem of identifying the different SCMs in a mixture model where each mixed dataset uses a different graph and parameterization is extremely important, so I’m glad it’s being addressed.
I do have some issues. 1. If we turn to the experimental section of the paper, we get two examples: one for NetSim (which I’m very familiar with) and another for the DREAM3 gene network. Both of these have problems with respect to the goal of this paper, suggesting that the choice of experimental datasets could be improved. 2. The problem with the NetSim data is that it’s not an extremely convincing time series, as the records in the simulation are spaced far enough apart in time to render the
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)
