Standardizing Structural Causal Models
Weronika Ormaniec, Scott Sussex, Lars Lorch, Bernhard Schölkopf, Andreas Krause

TL;DR
This paper introduces internally-standardized structural causal models (iSCMs) to address artifacts in synthetic data used for benchmarking causal algorithms, offering a more realistic and less identifiable alternative.
Contribution
The paper proposes iSCMs, a new class of SCMs with standardization at each step, reducing certain artifacts and improving causal inference robustness.
Findings
iSCMs are not $\text{Var}$-sortable.
Empirically, iSCMs are mostly not $R^2$-sortable.
Linear iSCMs are less identifiable and avoid collapse to deterministic relationships.
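The sortability findings above can be made concrete with a small sketch. The function below is a simplified, edge-based stand-in for $\text{Var}$-sortability, not the exact published metric: it only reports the fraction of edges whose child has a larger marginal variance than its parent, counting ties as one half. The name `edge_var_sortability`, the tolerance `tol`, and the three-node chain with edge weight 2 are all assumptions made for illustration.

```python
import numpy as np

def edge_var_sortability(x, adjacency, tol=1e-8):
    """Fraction of edges i -> j with Var(x_j) > Var(x_i); ties count 0.5.

    Simplified illustration only: it looks at direct edges, not at all
    directed paths. adjacency[i, j] != 0 encodes an edge i -> j.
    """
    variances = x.var(axis=0)
    parents, children = np.nonzero(adjacency)
    diff = variances[children] - variances[parents]
    increasing = np.sum(diff > tol)
    tied = np.sum(np.abs(diff) <= tol)
    return (increasing + 0.5 * tied) / len(parents)

# Hypothetical chain x0 -> x1 -> x2 with weight 2 and unit Gaussian noise.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.standard_normal(n)
x1 = 2.0 * x0 + rng.standard_normal(n)
x2 = 2.0 * x1 + rng.standard_normal(n)
x = np.column_stack([x0, x1, x2])
adjacency = np.array([[0, 1, 0],
                      [0, 0, 1],
                      [0, 0, 0]])

# Raw SCM data: variance grows along the causal order.
score_raw = edge_var_sortability(x, adjacency)

# Data with unit marginal variances: every edge is a tie.
x_std = (x - x.mean(axis=0)) / x.std(axis=0)
score_std = edge_var_sortability(x_std, adjacency)

print(score_raw, score_std)
```

A score near 1 means the causal order can be read off the variances alone, while a score of 0.5 means the variances carry no ordering information, which is the situation when all marginal variances equal one.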
Abstract
Synthetic datasets generated by structural causal models (SCMs) are commonly used for benchmarking causal structure learning algorithms. However, the variances and pairwise correlations in SCM data tend to increase along the causal ordering. Several popular algorithms exploit these artifacts, possibly leading to conclusions that do not generalize to real-world settings. Existing metrics like $\text{Var}$-sortability and $R^2$-sortability quantify these patterns, but they do not provide tools to remedy them. To address this, we propose internally-standardized structural causal models (iSCMs), a modification of SCMs that introduces a standardization operation at each variable during the generative process. By construction, iSCMs are not $\text{Var}$-sortable. We also find empirical evidence that they are mostly not $R^2$-sortable for…
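The generative process described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear mechanism with additive standard Gaussian noise, nodes indexed in topological order, and standardization by sample mean and sample standard deviation. The function name `sample_linear`, the five-node chain, and the edge weight 2.0 are hypothetical choices.

```python
import numpy as np

def sample_linear(weights, n_samples, standardize_internally, rng):
    """Sample from a linear SCM, optionally standardizing each variable
    right after it is generated (the iSCM idea).

    weights[i, j] is the weight of edge i -> j. Nodes are assumed to be
    in topological order, so weights is strictly upper triangular.
    """
    d = weights.shape[0]
    x = np.zeros((n_samples, d))
    for j in range(d):
        noise = rng.standard_normal(n_samples)
        # Columns after j are still zero, so only parents contribute.
        x[:, j] = x @ weights[:, j] + noise
        if standardize_internally:
            # Children of j will see the standardized values of x_j.
            x[:, j] = (x[:, j] - x[:, j].mean()) / x[:, j].std()
    return x

# Hypothetical chain graph 0 -> 1 -> 2 -> 3 -> 4 with weight 2.0.
d = 5
weights = np.zeros((d, d))
for i in range(d - 1):
    weights[i, i + 1] = 2.0

rng = np.random.default_rng(0)
x_scm = sample_linear(weights, 10_000, False, rng)
x_iscm = sample_linear(weights, 10_000, True, rng)

print("SCM variances: ", x_scm.var(axis=0))
print("iSCM variances:", x_iscm.var(axis=0))
```

In the plain SCM the variance accumulates along the chain, whereas in the internally standardized version every variable has unit variance by construction. Because standardization happens before a variable is passed to its children, the result differs from standardizing the SCM data after sampling.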
Peer Reviews
Decision·ICLR 2025 Poster
Causal structure learning methods require synthetic datasets to benchmark against. This work addresses artifacts that vanilla generating methods have. This makes it an interesting and relevant direction of research. The main contribution is the proposed method that is simple and seems to address these artifacts. The experimental evaluation is thorough and the theorems are relevant to the issues with standardized SCMs. The paper is also written well.
A few concerns follow: 1) Most of the theory is for the linear Gaussian case, where the analysis is tractable, but most practical situations are non-Gaussian. Therefore, while the partial identifiability result is still of importance, I am not that convinced of the impact of the nonidentifiability result. 2) Even in the experimental evaluation, the nonlinear systems are evaluated on samples from a Gaussian process. Is it possible to introduce nonlinearity in a different way? 3) For a proposal t
1. The paper is very well written, explaining the standardization process clearly. It is interesting to note that an iSCM is also an SCM. 2. The partial identifiability result of Theorem 3, which argues that for standardized SCMs one can recover almost the entire graph given the Markov equivalence class, together with the result that iSCMs are robust to such identification (Theorem 4), is important. This provides a grounding that iSCMs can be used for benchmarking. 3. The empirical validation of the results
1. The authors' approach works for sparse graphs, such as directed acyclic forests. It is not entirely clear if the process of normalization (i.e., topological ordering) is the reason why it cannot be extended to graphs beyond forests. 2. It seems that generating each sample is computationally expensive, as the cost depends linearly on the depth of the topological ordering. It is also not clear if it extends to nonlinear causal dependencies. 3. Generalizability beyond linear models is not cl
1. The paper proposes a simple yet effective approach to avoid $R^2$ -sortability, operating under the assumption that the increase of $R^2$ along the causal order is an artifact. 2. The paper is well-organized, providing both theoretical justifications and experimental results to support its claims.
My main concern is the paper’s treatment of increasing correlation along the causal order as an “artifact.” In reality, this pattern could exist in real-world data. Standardizing variables in the data generation process may, in fact, introduce more “artificiality,” as fewer studies have shown this pattern in real data. As reported in [1], the R²-sortability for a real dataset in Section 4.2 was 0.82. Thus, while iSCMs may serve as a useful supplementary benchmark, they may not be the definitive
Taxonomy
Topics: Qualitative Comparative Analysis Research
Methods: Causal inference
