Efficient Causal Structure Learning via Modular Subgraph Integration
Haixiang Sun, Pengchao Tian, Zihan Zhou, Jielei Zhang, Peiyi Li, Andrew L. Liu

TL;DR
VISTA is a modular, voting-based framework for causal structure learning that improves accuracy and efficiency by decomposing the problem, integrating local subgraphs, and ensuring acyclicity, applicable to various models and data types.
Contribution
We introduce VISTA, a novel modular framework for causal discovery that combines subgraph decomposition, voting integration, and theoretical guarantees, enhancing scalability and robustness.
Findings
VISTA outperforms existing methods in accuracy on synthetic datasets.
VISTA significantly reduces computational time in high-dimensional settings.
VISTA demonstrates strong performance on real-world datasets.
Abstract
Learning causal structures from observational data remains a fundamental yet computationally intensive task, particularly in high-dimensional settings where existing methods face challenges such as the super-exponential growth of the search space and increasing computational demands. To address this, we introduce VISTA (Voting-based Integration of Subgraph Topologies for Acyclicity), a modular framework that decomposes the global causal structure learning problem into local subgraphs based on Markov Blankets. The global integration is achieved through a weighted voting mechanism that penalizes low-support edges via exponential decay, filters unreliable ones with an adaptive threshold, and ensures acyclicity using a Feedback Arc Set (FAS) algorithm. The framework is model-agnostic, imposing no assumptions on the inductive biases of base learners, is compatible with arbitrary data…
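The integration step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the score formula, the threshold rule, or the specific FAS algorithm. The exponential damping term, the mean-score threshold, the greedy cycle-avoiding insertion, and the names `vote_scores`, `filter_edges`, and `greedy_acyclic` are all hypothetical stand-ins.

```python
import math
from collections import defaultdict

def vote_scores(subgraphs, decay=0.5):
    """Score each directed edge proposed by the local subgraphs.

    subgraphs: list of (nodes, edges); nodes is a set, edges a set of (u, v)
    whose endpoints are assumed to lie in nodes. The formula is an assumption.
    """
    votes = defaultdict(int)
    for _, edges in subgraphs:
        for edge in edges:
            votes[edge] += 1
    scores = {}
    for (u, v), k in votes.items():
        # Support: number of subgraphs that contain both endpoints.
        n = sum(1 for nodes, _ in subgraphs if u in nodes and v in nodes)
        # Vote share, damped exponentially when support is low.
        scores[(u, v)] = (k / n) * (1.0 - math.exp(-decay * n))
    return scores

def filter_edges(scores):
    """Keep edges scoring at least the mean score (stand-in adaptive threshold)."""
    if not scores:
        return {}
    threshold = sum(scores.values()) / len(scores)
    return {e: s for e, s in scores.items() if s >= threshold}

def reaches(adj, start, target):
    """Return True if target is reachable from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        x = stack.pop()
        if x == target:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(adj[x])
    return False

def greedy_acyclic(scores):
    """Greedy feedback-arc-set heuristic: add edges by descending score,
    skipping any edge that would close a directed cycle."""
    adj = defaultdict(set)
    dag = []
    for u, v in sorted(scores, key=lambda e: (-scores[e], e)):
        if not reaches(adj, v, u):  # u -> v closes a cycle iff v already reaches u
            adj[u].add(v)
            dag.append((u, v))
    return dag

# Toy input: five local subgraphs over nodes A, B, C, D.
subgraphs = [
    ({"A", "B", "C"}, {("A", "B"), ("B", "C")}),
    ({"A", "B", "C"}, {("A", "B"), ("B", "C")}),
    ({"A", "B", "C"}, {("B", "C"), ("C", "A")}),
    ({"A", "B", "C"}, {("A", "B"), ("C", "A")}),
    ({"B", "D"}, {("D", "B")}),
]
scores = vote_scores(subgraphs)
dag = greedy_acyclic(filter_edges(scores))
```

In this toy run, the edge D→B is proposed by a single subgraph and is damped most strongly, and both it and C→A fall below the mean-score threshold. Calling `greedy_acyclic` directly on scores for the triangle A→B, B→C, C→A drops the weakest of the three edges, since adding it would close the cycle.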
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
**Originality and significance:** As the authors acknowledge, the idea of modular causal discovery is relatively well-studied, but existing algorithms are typically non-modular (i.e., they cannot use arbitrary base learners) or have other downsides (e.g. using a non-scalable algorithm for stitching together the subgraphs). The modularity of the proposed framework is nice, and the stitching algorithm is lightweight (though, by necessity, heuristic). Overall, the direction is a promising one for m
**Details of causal structure learning:** Some details about causal discovery seem to be neglected or underemphasized. In particular, I noted two main issues: 1. **Introduction of unobserved confounding:** Taking a subset of nodes from a causal graph introduces unobserved confounding, even for Markov blankets. For example, take the graph with edges $X_1 \to X_2$, $X_1 \to X_3$, and for $k \in \{1,2,\ldots,K\}$, the edges $X_2 \to Y_k$ and $X_3 \to Y_k$. Then, for each $k$, the Markov blanket i
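The review text is cut off at this point. As a small sketch of the graph the reviewer describes, the code below computes Markov blankets using the standard definition for a DAG (parents, children, and the children's other parents). The helper name `markov_blanket` and the choice $K = 3$ are assumptions made for illustration.

```python
def markov_blanket(edges, node):
    """Markov blanket in a DAG: parents, children, and co-parents of children."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    spouses = {u for u, v in edges if v in children and u != node}
    return parents | children | spouses

K = 3  # illustrative value; the review leaves K general
edges = [("X1", "X2"), ("X1", "X3")]
for k in range(1, K + 1):
    edges += [("X2", f"Y{k}"), ("X3", f"Y{k}")]

mb_y1 = markov_blanket(edges, "Y1")

# Common causes of X2 and X3 that fall outside the local node set of Y1.
common_parents = {u for u, v in edges if v == "X2"} & {u for u, v in edges if v == "X3"}
hidden = common_parents - mb_y1 - {"Y1"}
```

Here `mb_y1` contains only `X2` and `X3`, while `hidden` contains `X1`: a subgraph restricted to $Y_1$ and its blanket omits the common cause of $X_2$ and $X_3$.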
1. The theoretical analysis is helpful, which supports the proposed idea. 2. The experimental comparisons are sufficient, verifying the effectiveness of VISTA.
1. The details of learning each subgraph are insufficient. Did the authors use the raw data, or the data consisting of the target variable and its Markov blanket? 2. In part 3 of Section 2, the authors state that “they either assume correct inputs at merging time, …, or perform essentially uncalibrated frequency-based stitching”. Could the authors provide specific examples of existing methods that fall into each of these two categories? In addition, how does the proposed method differ from the
1. Decomposes global DAG learning into node-centered MB subgraphs; plug-and-play with any MB finder and local learner, without adding identifiability/distributional assumptions on bases. 2. The aggregation strategy is efficient and edge-level, performing a one-pass weighted voting instead of relying on expensive global searches or solver-based optimization. It also comes with theoretical guarantees. 3. Experiments demonstrate that VISTA remedies the typical performance drop of base learners, con
1. The goal of many causal discovery tasks is to learn the Markov blanket, yet this approach requires first learning the Markov blanket for each node, which seems to put the cart before the horse. In general, divide-and-conquer causal discovery techniques should try to avoid overly strong `divide` tasks, for example by dividing based on a learned rough skeleton or structure. For instance, we can first run the PC algorithm to construct a causal graph, then partition th
1. The proposed Weighted Voting aggregation mechanism is novel and elegantly designed, surpassing simple heuristic rules. 2. The experimental validation is comprehensive, systematically demonstrating the superiority of the method across multiple settings. 3. The method is supported by a solid theoretical foundation in addition to its outstanding empirical results.
1. The performance ceiling of the VISTA framework is largely constrained by the accuracy of the Markov blanket identification in the initial step. If the MB estimation algorithm performs poorly under conditions of data sparsity or extremely high dimensionality, the associated true causal edges can never be recovered in subsequent steps. The paper's discussion of this limitation is somewhat insufficient. Although the framework is modular, a sensitivity analysis concerning its "weakest link" would
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
