Linear Causal Discovery with Interventional Constraints

Zhigao Guo; Feng Dong

PMC · DOI: 10.1007/s10994-026-06998-z · February 17, 2026

Open Access

TL;DR

This paper introduces interventional constraints to improve causal discovery by encoding known causal relationships as inequality constraints, leading to more accurate and explainable models.

Contribution

The novel concept of interventional constraints allows encoding high-level causal knowledge as inequality constraints on causal effects.

Findings

01

Integrating interventional constraints improves model accuracy and consistency with established findings.

02

The method facilitates the discovery of new causal relationships at lower cost.

03

The approach is evaluated on real-world datasets and shows promising results.

Abstract

Incorporating causal knowledge and mechanisms is essential for refining causal models and improving downstream tasks, such as designing new treatments. In this paper, we introduce a novel concept in causal discovery, termed interventional constraints, which differs fundamentally from interventional data. While interventional data require direct perturbations of variables, interventional constraints encode high-level causal knowledge in the form of inequality constraints on causal effects. For instance, in the Sachs dataset, Akt has been shown to be activated by PIP3, meaning PIP3 exerts a positive causal effect on Akt. Existing causal discovery methods allow enforcing structural constraints (e.g., requiring a causal path from PIP3 to Akt), but they may still produce incorrect causal conclusions, such as learning that “PIP3 inhibits Akt.” Interventional constraints bridge this gap by…

Figures

From left to right: (a) true causal model, (b) causal model learned without interventional constraints, (c) causal model learned with interventional constraints

Sachs causal models learned by NOTEARS (without constraints) and Lin-CD-Path (with path constraints)

Fig. 4 Sachs causal models learned by Lin-CDIC (with interventional constraints) under different $\epsilon$ values

Funding

Engineering and Physical Sciences Research Council (https://doi.org/10.13039/501100000266)

Keywords

Causal discovery · Causal inference · Causal effect · Prior knowledge · Continuous optimization

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Causal Inference Techniques · Explainable Artificial Intelligence (XAI)

Full text

Introduction

Understanding causality is crucial for developing explainable, safe, fair, and robust machine learning models that generalize well to new environments (Kaddour et al., 2025; Pearl, 2018; Sanchez et al., 2022). Causal discovery, the task of recovering the underlying causal graph from data, is an essential component of causal inference because it aims to uncover true causal mechanisms rather than mere statistical associations (Glymour et al., 2019; Kitson et al., 2023). Methods for causal discovery may in principle operate on interventional data, where variables are experimentally manipulated (Pearl, 2009); on observational data, where variables are passively collected, as in PC (Spirtes & Glymour, 1991), GES (Chickering & Meek, 2002), LiNGAM (Shimizu et al., 2006), and NOTEARS (Zheng et al., 2018); or on a mixture of both, as in GIES (Hauser & Bühlmann, 2012), JCI (Mooij et al., 2020), DCDI (Brouillard et al., 2020), and DiffAN (Sanchez et al., 2023). Interventional data, while very powerful, are often expensive, time-consuming, or ethically restricted, particularly in biomedical domains (Feuerriegel et al., 2024). For this reason, many modern approaches, including NOTEARS (Zheng et al., 2018) and GraN-DAG (Lachapelle et al., 2020), explicitly assume access to only observational data, reflecting common real-world constraints. However, purely observational, data-driven approaches struggle in practice and can lead to spurious edges or missed causal relationships in the learned graph. These limitations motivate the need for additional sources of information beyond observational data alone.

In many scientific fields, domain experts possess valuable high-level knowledge about causal influences, which has been shown to improve both accuracy and interpretability when incorporated into causal discovery (Constantinou et al., 2023). Importantly, this type of knowledge does not require fine-grained interventional datasets; instead, it consists of coarse but powerful statements such as “A activates B” or “C inhibits D”, knowledge that remains underutilised in existing frameworks. Motivated by this, our work investigates whether causal discovery can be enhanced by combining observational data with high-level qualitative causal knowledge.

Existing work has primarily incorporated prior knowledge through structural constraints, such as forbidding or enforcing edges, paths, or variable orderings (Constantinou et al., 2023). These approaches operate solely at the level of graph topology and do not constrain how strongly, or in what direction, variables influence one another. However, expert knowledge in many scientific domains is often expressed directly in terms of the signs or magnitudes of causal effects. To capture this richer and more practically meaningful form of knowledge, we introduce interventional constraints, a previously unexplored class of qualitative constraints that restrict both the admissible causal pathways and their associated total causal effects. Such constraints improve interpretability and enhance downstream causal inference tasks by ensuring that the learned model is consistent with established causal influences.

To illustrate the concept of interventional constraints, consider the widely used Sachs dataset (Sachs et al., 2005) describing a signalling pathway in human immune cells. Biological experiments establish that PIP3 activates Akt, meaning that PIP3 exerts a positive causal effect on Akt. Such knowledge can serve as a testable constraint (Jewell et al., 2016, p. 64) and be formulated as an interventional constraint. Hence, if a causal model predicts that PIP3 inhibits Akt, it would violate the interventional constraint and contradict established evidence, even if the model includes a causal path from PIP3 to Akt.

Many scientific domains already possess such qualitative causal knowledge accumulated through decades of empirical studies, mechanistic investigations, and curated knowledge bases. For example, smoking is known to increase the risk of lung cancer, and tax reductions often exert a positive causal effect on consumer spending—more details are provided in Sect. 3.1. Unlike interventional datasets that require actively perturbing variables, interventional constraints provide a practical and scalable approach for enhancing causal discovery in observational-only settings. Such high-level information is naturally more abundant in some areas (e.g., biology, medicine, economics) than in others. Our aim is to offer a principled framework for incorporating this knowledge when it is available. In this sense, interventional constraints should be viewed as a complementary source of information that enhances causal discovery precisely in domains where expert causal knowledge exists but interventional datasets are limited, costly, or impractical to obtain.

The main contributions of this paper are as follows:

We introduce causal discovery with a new type of constraint, termed interventional constraints, to incorporate qualitative knowledge of causal effects into the learning process. Unlike existing constraints that mainly affect a model’s structure, interventional constraints regulate both the causal pathways (structure) and the causal effects (parameters) of the model.
We adopt the standard notion of total causal effects in linear SEMs to express interventional constraints directly on effect strengths between variables. This use of total effects enables the formulation of interventional constraints as inequality conditions on total causal effects, allowing human knowledge to influence both the structure and parameters of the learned causal model.
We present a tailored two-stage mixed optimization approach to solve the problem of causal discovery with interventional constraints under the linear assumption.
We validate the proposed method on both synthetic and real-world data. Experiments on synthetic data demonstrate that interventional constraints are more effective than traditional path constraints. Real-world experiments further show that partial interventional constraints enable the identification of additional causal interactions (e.g., “PKA inhibits P38”) and causal paths (e.g., Mek $\rightarrow \dots \rightarrow$ Erk).

Remark: In this paper, we focus on demonstrating causal discovery with interventional constraints in the linear setting. The underlying concept of interventional constraints is general, however, and can in principle be extended to nonlinear settings, a direction we identify as promising for future research. Hence this work serves as a preliminary step toward more general integrations of such knowledge. This is similar in spirit to the development of LiNGAM (Shimizu et al., 2006) and NOTEARS (Zheng et al., 2018), which began with linear models and later inspired extensions to nonlinear frameworks. Our goal is to lay a foundation for future research extending interventional constraints to more complex, nonlinear scenarios.

Related Work

Various approaches have been developed to integrate human or prior knowledge through structural constraints, including node ordering (e.g., $X_1 \prec X_3 \prec X_2$), edge constraints (e.g., $X_1 \rightarrow X_2$), path constraints (e.g., $X_1 \rightarrow \dots \rightarrow X_2$) and expert-provided structure information. Early methods, such as the K2 algorithm (Cooper & Herskovits, 1992), relied on predefined node ordering for Bayesian network structure learning. Subsequent works expanded on this by integrating multiple prior constraints, as seen in Inazumi et al. (2010), which enhanced LiNGAM-based causal discovery by incorporating path constraints. More interactive approaches, such as those by Meek (1995), Cano et al. (2011) and Masegosa and Moral (2013), allowed for the incorporation of edges, path constraints and certain required edge orientations, enabling more flexible structure learning.

Recent advancements have focused on refining structural priors and integrating domain knowledge in a more systematic manner. Perković et al. (2017) proposed a method for incorporating edge orientations and partial ordering constraints into the learning of maximally oriented partially directed acyclic graphs (maximal PDAGs), while Andrews et al. (2020) introduced tiered causal ordering into the FCI algorithm. Hasan and Gani (2022) utilized reinforcement learning to penalize edge constraint violations, thereby enforcing known causal relationships. Other works have leveraged approximate causal structures as priors. For instance, Geffner et al. (2024) utilized the completed partially directed acyclic graph (CPDAG) from the PC algorithm, while Choo et al. (2023) employed approximate DAGs obtained from expert input. In a more general framework, Constantinou et al. (2023) proposed integrating various structural priors into Bayesian network structure learning, demonstrating the impact of domain knowledge on causal structure learning. Their work aligns with efforts such as Rittel and Tschiatschek (2023), who developed differentiable Bayesian models incorporating expert-specified edges and node ordering constraints.

Several recent approaches incorporate edge constraints into continuous optimization frameworks. Sun et al. (2023) framed dynamic Bayesian network (DBN) structure learning as a continuous optimization problem incorporating edge constraints from one-dimensional convolutional neural networks (1D CNNs). Similarly, Maeda and Shimizu (2024) integrated exclusion and temporal ordering constraints to improve causal additive model identification. Wang et al. (2024) further extended this paradigm by integrating edge, path, and ordering constraints into differential causal discovery. Existing research on incorporating prior knowledge into causal discovery is summarized in Table 1.

Table 1. Related work on incorporating prior knowledge in causal discovery

| Reference | Prior type | Comments |
| --- | --- | --- |
| Cooper and Herskovits (1992) | Node ordering | Pioneered predefined variable ordering for discrete Bayesian networks structure learning |
| Meek (1995) | Edge orientations | Identifies causal relations shared by all DAGs consistent with data and background knowledge |
| Inazumi et al. (2010) | Path constraints | Enhances LiNGAM with path constraints for improved linear causal structure identification |
| Cano et al. (2011), Masegosa and Moral (2013) | Edge and path constraints | Enables interactive prior knowledge integration for structure learning |
| Perković et al. (2017) | Edge orientations, Markov equivalence, partial ordering | Integrates prior to learn maximal PDAG |
| Andrews et al. (2020) | Tiered causal ordering | Integrates tiered causal ordering into FCI |
| Hasan and Gani (2022) | Edge constraints | Uses prior knowledge in reinforcement learning to penalize constraint-violating causal structures |
| Geffner et al. (2024) | CPDAG learned by the PC algorithm | Leverages CPDAG and domain knowledge to enhance causal recovery |
| Rittel and Tschiatschek (2023) | Edge and ordering constraints | Refines DAG priors in a differentiable Bayesian framework to integrate expert-provided edges or node ordering constraints |
| Constantinou et al. (2023) | Various structural priors | Integrates comprehensive structural priors into Bayesian network structure learning |
| Choo et al. (2023) | Approximate DAG from experts | Utilizes an approximate DAG as prior knowledge for robust causal structure recovery |
| Sun et al. (2023) | Edge constraints | Frames DBN structure learning as continuous optimization with edge constraints from 1D CNNs |
| Maeda and Shimizu (2024) | Exclusion and temporal ordering | Integrates prior knowledge to enhance causal additive model identification |
| Wang et al. (2024) | Edge, path and ordering constraints | Incorporates edge, path, and ordering priors into differential causal discovery |

Interventional Constraints

This section introduces the novel concept of interventional constraints, a new form of high-level causal knowledge that expresses the expected direction and strength of causal effects between variable pairs. We formally define these constraints and demonstrate how they can be incorporated into linear causal discovery, where causal effects are explicitly represented by edge weights and total effects along causal paths.

Definition

Definition 3.1

(Interventional Constraints) Let $T_{i,j}$ be the total causal effect of variable $X_i$ on variable $X_j$. Interventional constraints specify whether this effect is positive or negative, such that $T_{i,j} > 0$ indicates a positive effect, and $T_{i,j} < 0$ indicates a negative effect.

Remark: Note that our interventional constraints are qualitative and expressed as inequalities (e.g., $T_{i,j} > 0$), differing from fine-grained quantitative interventional data. Unlike methods assuming direct experimental interventions (Brouillard et al., 2020; Hauser & Bühlmann, 2012; Ke et al., 2023; Lippe et al., 2022), our approach uses qualitative expert knowledge. Such constraints may originate not only from randomized controlled trials but also from broader domain evidence. For example, as Judea Pearl noted: “Consider the century-old debate concerning the effect of smoking on lung cancer. In 1964, the Surgeon General issued a report linking cigarette smoking to death, cancer, and most particularly lung cancer. The report was based on nonexperimental studies in which a strong correlation was found between smoking and lung cancer, and the claim was that the correlation found is causal: If we ban smoking, then the rate of cancer cases will be roughly the same as the one we find today among nonsmokers in the population.” (Pearl, 2009, p. 423). This assertion can be represented as an interventional constraint in our framework, expressed as $T(\text{Smoking}, \text{Lung cancer}) > 0$. These constraints are significantly easier to specify than the detailed numerical values typically required in interventional datasets. Similarly, in the Sachs dataset (Sachs et al., 2005), prior biological knowledge indicates that PIP3 activates Akt (Reactome: R-HSA-1257604), that is, PIP3 has a positive causal effect on Akt, $T(\text{PIP3}, \text{Akt}) > 0$. Traditional causal discovery might reveal a causal path from PIP3 to Akt but not guarantee its sign. In contrast, our method enforces consistency with such known effects without requiring detailed numerical interventional data.
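To make the encoding concrete, the following is a minimal sketch of how qualitative statements such as "PIP3 activates Akt" might be turned into sign constraints and checked against a learned model. The helper names (`SIGN`, `encode`, `violations`), the verb vocabulary, and the numeric effects are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical vocabulary: each verb maps to the required sign of the total effect.
SIGN = {"activates": +1, "increases": +1, "inhibits": -1}

def encode(statements):
    """Map (cause, verb, effect) triples to {(cause, effect): required sign}."""
    return {(cause, effect): SIGN[verb] for cause, verb, effect in statements}

def violations(total_effect, constraints):
    """Return constrained pairs whose total effect is zero, missing, or wrongly signed."""
    return [pair for pair, sign in constraints.items()
            if sign * total_effect.get(pair, 0.0) <= 0]

constraints = encode([
    ("PIP3", "activates", "Akt"),   # T(PIP3, Akt) > 0
    ("C", "inhibits", "D"),         # T(C, D) < 0
])

# Illustrative total effects from some learned model (made-up numbers).
learned = {("PIP3", "Akt"): -0.4, ("C", "D"): -0.2}
print(violations(learned, constraints))  # [('PIP3', 'Akt')]
```

Here the learned model contains an effect of PIP3 on Akt, so a path constraint would be satisfied, yet the sign contradicts the stated knowledge and the pair is flagged.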

Linear Causal Discovery with Interventional Constraints

We consider causal discovery under the standard assumptions used in linear structural equation models:

Causal Sufficiency: All common causes of observed variables are included in the model, so there are no unmeasured confounders.
Causal Markov Condition: Each variable is conditionally independent of its non-descendants given its parents, allowing the joint distribution to factorize according to the DAG.
Faithfulness: All conditional independencies in the observed data correspond to d-separation relations in the true causal DAG.
Linearity and Additive Gaussian Noise: Each variable is generated as a linear function of its parents, with an independent additive Gaussian noise term. The noise variances are assumed to be unequal or unknown.

In a linear causal model, each variable $X_i$ is a linear function of its direct causes $\text{Pa}(X_i)$ plus an independent additive noise term $z_i$:

$$X_i = \sum_{X_j \in \text{Pa}(X_i)} w_{ij} X_j + z_i, \quad i = 1, 2, \dots, d, \tag{1}$$

where $w_{ij}$ denotes the direct causal effect of $X_j$ on $X_i$, and $z_i$ are mutually independent Gaussian noise terms with unequal (or unknown) variances. These weights form a weighted adjacency matrix $W \in \mathbb{R}^{d \times d}$, and the overall objective of causal discovery is to recover $W$ from observed data $X \in \mathbb{R}^{n \times d}$. We adopt the continuous optimization framework of NOTEARS (Zheng et al., 2018), where the estimation of $W$ is formulated as the following optimization problem:

$$\min_{W \in \mathbb{R}^{d \times d}} F(W) \tag{2}$$

subject to

$$\delta_{ij} (T_{ij} - \delta_{ij}) > 0, \quad i \in \mathcal{C}, j \in \mathcal{T}, \tag{3}$$

$$\text{h}(W) = 0, \tag{4}$$

where the objective function is defined as

$$F(W) = \frac{1}{2n} \Vert X - XW\Vert_F^2 + \lambda \Vert W\Vert_1, \tag{5}$$

and the acyclicity constraint is imposed via

$$\text{h}(W) = \text{tr}\left( e^{W \circ W}\right) - d. \tag{6}$$

Here, the Frobenius norm penalizes prediction error, the $\ell_1$ norm encourages sparsity, and the exponential trace constraint enforces DAG-ness. The main addition beyond traditional causal discovery is the new interventional constraint in Eqn. (3), which encodes prior knowledge about causal effects through a lower-bound inequality on the total effect matrix $T$. To encode expert knowledge, we impose:

$$\delta_{ij}(T_{ij} - \delta_{ij}) > 0,$$

which ensures that the total causal effect $T_{ij}$ exceeds threshold $\delta_{ij}$ in magnitude and matches its sign. For instance, if $\delta_{ij} = 0.1$, then $T_{ij} > 0.1$; if $\delta_{ij} = -0.1$, then $T_{ij} < -0.1$. The above constrained formulation is novel in jointly enforcing both acyclicity (via nonlinear equality) and interventional knowledge (via nonlinear inequality). Together, these constraints regulate both structure and parameters, distinguishing our method from prior work which only considers structural constraints.
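As a rough illustration of the two NOTEARS-style ingredients in this formulation, the sketch below evaluates the penalized least-squares objective $F(W)$ and the acyclicity function $\text{h}(W)$ for candidate matrices. It assumes the data convention $X \approx XW$, so that `W[i, j]` is the weight of edge $i \rightarrow j$, and it approximates the matrix exponential by a truncated power series to stay NumPy-only; a library matrix exponential would normally be used instead. The function names and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def objective(X, W, lam):
    """F(W) = (1/2n) * ||X - XW||_F^2 + lam * ||W||_1."""
    n = X.shape[0]
    residual = X - X @ W
    return 0.5 / n * np.sum(residual ** 2) + lam * np.abs(W).sum()

def acyclicity(W, n_terms=40):
    """h(W) = tr(exp(W∘W)) - d, with exp approximated by a truncated power series."""
    d = W.shape[0]
    A = W * W                    # Hadamard square, entrywise non-negative
    term = np.eye(d)
    trace = float(d)             # the k = 0 term contributes tr(I) = d
    for k in range(1, n_terms):
        term = term @ A / k      # term now holds A^k / k!
        trace += np.trace(term)
    return trace - d

W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])   # single edge 0 -> 1
W_cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # 0 -> 1 and 1 -> 0 (a 2-cycle)

print(acyclicity(W_dag))   # 0.0 for the DAG
print(acyclicity(W_cyc))   # positive for the cyclic matrix

# Synthetic data generated from the DAG above (illustrative only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 1.5 * x0 + 0.5 * rng.normal(size=500)
X = np.column_stack([x0, x1])
print(objective(X, W_dag, lam=0.1), objective(X, np.zeros((2, 2)), lam=0.1))
```

On this toy data the generating matrix attains a lower objective than the empty graph, while `acyclicity` separates the DAG (value zero) from the cyclic candidate (strictly positive).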

Remark: For linear-Gaussian models with unequal (or unknown) noise variances, causal discovery is limited to identifying the Markov equivalence class (Glymour et al., 2019; Peters & Bühlmann, 2014; Shimizu et al., 2006; Verma & Pearl, 1990). Introducing qualitative interventional constraints, expressed as inequality conditions on total causal effects, can help resolve causal directions by penalizing models that contradict known effect signs. However, we emphasize that the key novelty of our work does not lie in altering identifiability assumptions, but in proposing interventional constraints as a new form of knowledge-driven guidance, which directly imposes inequality constraints on total causal effects between variables.
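The point of this remark can be illustrated with a two-variable toy case. The sketch below builds two linear-Gaussian models with opposite edge directions that imply the same covariance matrix, so observational data alone cannot separate them, and then checks the constraint $\delta(T_{01} - \delta) > 0$ on each. It assumes the convention $X = XW + Z$ with `W[i, j]` the weight of edge $i \rightarrow j$ and uses the closed form $T = (I - W)^{-1} - I$ for total effects; the weights, noise variances, and function names are chosen purely for illustration.

```python
import numpy as np

def implied_covariance(W, noise_var):
    """Covariance of X when X = XW + Z and Z has diagonal covariance noise_var."""
    d = W.shape[0]
    M = np.linalg.inv(np.eye(d) - W)        # X = Z (I - W)^{-1}
    return M.T @ np.diag(noise_var) @ M

def total_effects(W):
    """T = (I - W)^{-1} - I for an acyclic weighted adjacency matrix W."""
    d = W.shape[0]
    return np.linalg.inv(np.eye(d) - W) - np.eye(d)

def satisfies(W, i, j, delta):
    """Check the interventional constraint delta * (T_ij - delta) > 0."""
    return bool(delta * (total_effects(W)[i, j] - delta) > 0)

W_fwd = np.array([[0.0, 0.8], [0.0, 0.0]])  # X0 -> X1
W_rev = np.array([[0.0, 0.0], [0.8, 0.0]])  # X1 -> X0

cov_fwd = implied_covariance(W_fwd, [1.0, 0.36])
cov_rev = implied_covariance(W_rev, [0.36, 1.0])
print(np.allclose(cov_fwd, cov_rev))        # True: same observational distribution

# Knowledge "X0 has a positive effect on X1", with threshold delta = 0.1.
print(satisfies(W_fwd, 0, 1, 0.1), satisfies(W_rev, 0, 1, 0.1))  # True False
```

In the reversed model the total effect of X0 on X1 is zero, so the constraint is violated even though both models fit the covariance equally well.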

For linear causal models, the following proposition gives the total causal effect matrix, which captures both direct and indirect causal effects between variables.

Proposition 3.1

(Total Causal Effects in Linear Models) In a linear causal model, the matrix T encapsulates total causal effects (both direct and indirect) between variable pairs:

$$T = (I - W)^{-1} - I. \tag{7}$$

Proof

Under the linear structural equations given in Eqn. (1), each entry $w_{ij}$ represents the direct causal effect of $X_j$ on $X_i$, while $(W^{k})_{ij}$ represents the cumulative effect of all directed paths of length $k$ from $j$ to $i$. Thus, the total causal effect from $j$ to $i$ is naturally defined as the sum over all possible path lengths:

$$T = W + W^{2} + \cdots + W^{d-1}$$

where d is the number of variables. Because the graph is acyclic, the adjacency matrix W can be topologically ordered to become strictly upper triangular, which implies $W^{d} = 0$. Hence, the Neumann (geometric) series terminates, yielding

$$(I - W)^{-1} = I + W + W^{2} + \cdots + W^{d-1}$$

Subtracting the identity matrix removes trivial self-effects, giving

$$T = (I - W)^{-1} - I$$

$\square$

Remark: The expression $(I - W)^{-1}$ is well known in linear causal models and has been used in prior literature, including (Gische & Voelkle, 2022; Guo & Perković, 2022; Ni et al., 2025; Tian, 2004). While total causal effects are conceptually defined through the geometric series $W + W^{2} + \cdots$, the compact expression $(I - W)^{-1} - I$ offers practical advantages. For acyclic graphs, the Neumann series terminates after at most $d-1$ terms, and the inverse form provides an efficient and numerically stable way to compute total effects while also yielding convenient analytic gradients for optimization.

Remark: While interventional constraints are introduced here in the context of linear models, they are conceptually general and can be adapted to nonlinear settings. In such cases, total causal effects would be estimated through path-specific derivatives or interventional distributions, though practical implementation would require further research.

To illustrate the total causal effect matrix T, consider a causal model with three variables $X_1$, $X_2$, and $X_3$, where $X_1$ influences $X_2$, and $X_2$ influences $X_3$. The matrix W is represented as follows:

$$W = \begin{pmatrix} 0 & w_{12} & 0 \\ 0 & 0 & w_{23} \\ 0 & 0 & 0 \end{pmatrix}.$$

Here, $w_{12}$ is the direct causal effect of $X_1$ on $X_2$, and $w_{23}$ is the direct causal effect of $X_2$ on $X_3$. The total effect matrix T includes not just these direct causal effects but also the indirect causal effect of $X_1$ on $X_3$ through $X_2$. Visually, the model can be represented as:

$X_1 \rightarrow X_2 \rightarrow X_3$.

In this case, $T_{13}$ captures the indirect causal effect of $X_1$ on $X_3$ through $X_2$, which is not captured by the matrix W alone. To compute the total causal effect matrix T, we follow Eqn. (7) and proceed step by step: first, we calculate $I - W$:

$$I - W = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 0 & w_{12} & 0 \\ 0 & 0 & w_{23} \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & -w_{12} & 0 \\ 0 & 1 & -w_{23} \\ 0 & 0 & 1 \end{pmatrix}.$$

Next, we compute $(I - W)^{-1}$:

$$(I - W)^{-1} = I + W + W^2 = \begin{pmatrix} 1 & w_{12} & w_{12}w_{23} \\ 0 & 1 & w_{23} \\ 0 & 0 & 1 \end{pmatrix}.$$

Finally, we subtract the identity matrix I from $(I - W)^{-1}$ to obtain T:

$$T = \begin{pmatrix} 1 & w_{12} & w_{12}w_{23} \\ 0 & 1 & w_{23} \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & w_{12} & w_{12}w_{23} \\ 0 & 0 & w_{23} \\ 0 & 0 & 0 \end{pmatrix}.$$

Thus, the matrix T captures both the direct causal effects $w_{12}$ and $w_{23}$, as well as the indirect causal effect of $X_1$ on $X_3$, which is $w_{12}w_{23}$.
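The three-variable example can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the numeric weights and the helper name `total_effects` are illustrative assumptions. It builds W for the chain, computes T from the closed form, and compares the result with the truncated power series.

```python
import numpy as np


def total_effects(W: np.ndarray) -> np.ndarray:
    """Total causal effect matrix T = (I - W)^{-1} - I for an acyclic W."""
    d = W.shape[0]
    identity = np.eye(d)
    # Solve (I - W) Y = I rather than forming the inverse explicitly.
    return np.linalg.solve(identity - W, identity) - identity


# Illustrative weights (assumed values, not from the paper).
w12, w23 = 0.8, -0.5
W = np.array([[0.0, w12, 0.0],
              [0.0, 0.0, w23],
              [0.0, 0.0, 0.0]])

T = total_effects(W)

# Truncated Neumann series W + W^2; W^3 = 0 for this three-node chain.
T_series = W + W @ W

print(T)
print(np.allclose(T, T_series))
```

With these assumed weights, the entry `T[0, 2]` equals the product `w12 * w23`, so a positive and a negative direct effect combine into a negative total effect of $X_1$ on $X_3$.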

Two-stage Constrained Optimization

We solve the causal discovery problem under both acyclicity and interventional constraints with a two-stage optimization strategy. The optimization problem is highly non-convex due to the interplay between structural and parametric constraints. To address this, we combine L-BFGS with sequential least squares programming (SLSQP) in a practical two-stage constrained optimization approach.

Overview of the Optimization Problem

In our problem, the Frobenius norm term $\frac{1}{2n} \Vert X - XW\Vert _F^2$ is a convex quadratic function of W. The $\ell _1$ penalty $\lambda \Vert W\Vert _1$ is also convex. Therefore, the objective function F(W) is convex, as it is a sum of convex functions. However, the causal effect constraints $\delta _{ij}\,\bigl (T_{i,j} - \delta _{ij}\bigr )> 0$ involve the inverse $(I - W)^{-1}$, which is a nonlinear, non-convex function of W. Therefore, these causal effect constraints are non-convex. Additionally, the acyclicity constraint $\text {tr}(e^{W \circ W}) - d = 0$ involves the matrix exponential of the Hadamard product $W \circ W$, and a nonlinear equality constraint of this kind typically defines a non-convex feasible set. As a result, although the objective function F(W) is convex, the constraints involving the matrix T and the acyclicity condition make the overall optimization problem defined by Eqns. (2–6) non-convex.

Furthermore, there are intrinsic tensions between the acyclicity constraint and the interventional constraints, manifested in three key ways. First, the signs of the elements of the weight matrix W do not affect h(W), because the Hadamard product $W \circ W$ squares each element of W. Consequently, $W \circ W$ is always non-negative, so the matrix exponential $e^{W \circ W}$ and its trace are non-negative as well, and the value of h(W) does not depend on whether the elements of W are negative or positive. The sign of an element of W can, however, change the sign of the causal effect between variables and thereby determine whether an interventional constraint is violated. Second, the magnitude of the elements of the weight matrix affects the acyclicity constraint h(W) and the interventional constraints differently: the acyclicity constraint encourages smaller weights, while interventional constraints can push the relevant weights to be larger. Third, the acyclicity constraint encourages a sparse graph, while interventional constraints promote a less sparse graph, depending on the number of interventional constraints and the magnitude of the relevant thresholds $\delta$.

For all these reasons, the overall optimization problem defined by Eqns. (2–6) is highly non-convex, making standard optimizers such as L-BFGS insufficient and unreliable for handling the full set of constraints. Therefore, we adopt the sequential least squares programming (SLSQP) method (Kraft, 1988), which supports general nonlinear constraints and provides a practical and effective solution for our setting. Because the SLSQP method is gradient-based, it requires the gradients of both the objective function F(W) and the constraints. The gradient of the squared Frobenius norm term is:

$$\nabla _W \left( \frac{1}{2n}\Vert X - XW\Vert _F^2 \right) = \frac{1}{n} X^T(XW - X)$$

and a (sub)gradient of the $\ell _1$ norm is:

$$\nabla _W \Vert W\Vert _1 = \text {sign}(W),$$

where $\text {sign}(W)$ is applied element-wise. The full gradient of the objective function F(W) is then:

$$\nabla F(W) = \frac{1}{n} X^T(XW - X) + \lambda \, \text {sign}(W).$$
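The objective gradient can be sanity-checked against finite differences. The sketch below is only an illustration under stated assumptions: the synthetic data, the value of `lam`, and all function names are hypothetical, and the test point keeps every entry of W away from zero, where the $\ell _1$ term is not differentiable.

```python
import numpy as np


def objective(W, X, lam):
    """F(W) = (1/2n) ||X - XW||_F^2 + lam * ||W||_1."""
    n = X.shape[0]
    residual = X - X @ W
    return 0.5 / n * np.sum(residual ** 2) + lam * np.abs(W).sum()


def objective_grad(W, X, lam):
    """(1/n) X^T (XW - X) + lam * sign(W); sign(0) = 0 is one valid subgradient."""
    n = X.shape[0]
    return X.T @ (X @ W - X) / n + lam * np.sign(W)


def finite_difference_grad(f, W, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        grad[idx] = (f(W + step) - f(W - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # synthetic data, assumed for this check only
# Entries have magnitude in [0.2, 1.0] and random signs, so none is near 0.
W = rng.uniform(0.2, 1.0, size=(4, 4)) * rng.choice([-1.0, 1.0], size=(4, 4))
lam = 0.1

analytic = objective_grad(W, X, lam)
numeric = finite_difference_grad(lambda M: objective(M, X, lam), W)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```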

The gradient of the causal effect measure T follows from differentiating the matrix inverse. Since $dT = (I - W)^{-1}\, dW\, (I - W)^{-1}$, in vectorized (column-stacking) form it is:

$$\nabla _W T = (I - W)^{-T} \otimes (I - W)^{-1}.$$

The gradient of the acyclicity measure $h(W) = \text {tr}(e^{W \circ W}) - d$ is (Zheng et al., 2018):

$$\nabla _W h(W) = \left( e^{W \circ W}\right) ^{T} \circ 2W.$$
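As a hedged illustration of the two constraint gradients, the sketch below uses the standard NOTEARS form $(e^{W \circ W})^{T} \circ 2W$ for the acyclicity measure and, for a single total effect $T_{kl}$, the entry-wise derivative obtained from $dT = M\, dW\, M$ with $M = (I - W)^{-1}$. Both are compared with finite differences. The function names and the random test matrix are assumptions made for this example.

```python
import numpy as np
from scipy.linalg import expm


def h(W):
    """Acyclicity measure h(W) = tr(exp(W o W)) - d, with o the Hadamard product."""
    return np.trace(expm(W * W)) - W.shape[0]


def h_grad(W):
    """Gradient in the standard NOTEARS form: exp(W o W)^T o 2W."""
    return expm(W * W).T * (2.0 * W)


def total_effect(W, k, l):
    """Single entry T[k, l] of T = (I - W)^{-1} - I."""
    d = W.shape[0]
    return (np.linalg.inv(np.eye(d) - W) - np.eye(d))[k, l]


def total_effect_grad(W, k, l):
    """Gradient of T[k, l] with respect to W.

    With M = (I - W)^{-1}, dT = M dW M, so dT[k, l]/dW[i, j] = M[k, i] * M[j, l].
    """
    M = np.linalg.inv(np.eye(W.shape[0]) - W)
    return np.outer(M[k, :], M[:, l])


def finite_difference_grad(f, W, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        grad[idx] = (f(W + step) - f(W - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(1)
W = 0.1 * rng.normal(size=(4, 4))  # small weights keep I - W well conditioned
np.fill_diagonal(W, 0.0)

err_h = np.max(np.abs(h_grad(W) - finite_difference_grad(h, W)))
err_T = np.max(np.abs(
    total_effect_grad(W, 0, 2)
    - finite_difference_grad(lambda M: total_effect(M, 0, 2), W)))
h_ignores_signs = np.isclose(h(W), h(-W))
print(err_h, err_T, h_ignores_signs)
```

The last check reflects the first tension discussed above: flipping the sign of every weight leaves h(W) unchanged, whereas the total effects generally change.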

The SLSQP method approximates the problem locally by a quadratic model of the objective function and a linear model of the constraints:

$$\min _{\Delta W} \left( \nabla F(W)^T \Delta W + \frac{1}{2} \Delta W^T H \Delta W \right)$$

subject to

$$A \Delta W = b - c,$$

where H is an approximation to the Hessian of F(W). A represents the Jacobians of the interventional and acyclicity constraints from Eqns. (11, 12). $b - c$ represents the amount by which the current constraint values deviate from their desired target values, helping to define the feasible region. $\Delta W$ is the step direction, representing the change in W that minimizes the objective function (Eqn. 13) while satisfying the constraints (Eqn. 14). Using the step direction $\Delta W$ found from solving the quadratic subproblem defined by Eqns. (13, 14), the weights are updated as:

$$W \leftarrow W + \alpha \Delta W,$$

where $\alpha$ is the step size determined by a line search.
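To make the quadratic subproblem concrete, the sketch below solves an equality-constrained instance through its KKT system. This is a simplification rather than Kraft's SLSQP routine: it ignores inequality constraints, treats W as a flattened vector, and uses a full step ($\alpha = 1$) in place of a line search. The function name `sqp_step` and the toy problem are hypothetical.

```python
import numpy as np


def sqp_step(grad, hessian, A, residual):
    """Solve  min_d  grad^T d + 0.5 d^T H d   subject to  A d = residual.

    Uses the KKT system [[H, A^T], [A, 0]] [d; mu] = [-grad; residual],
    where residual plays the role of b - c in the text.
    """
    p, m = hessian.shape[0], A.shape[0]
    kkt = np.block([[hessian, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([-grad, residual])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:p], solution[p:]  # step direction, Lagrange multipliers


# Toy problem (assumed): minimize 0.5 * ||w - t||^2 subject to w[0] + w[1] = 1.
t = np.array([2.0, 0.0])
w = np.zeros(2)

grad = w - t                       # gradient of the toy objective at w
hessian = np.eye(2)                # exact Hessian of the toy objective
A = np.array([[1.0, 1.0]])         # Jacobian of the constraint
residual = np.array([1.0]) - A @ w # target value minus current constraint value

delta_w, multipliers = sqp_step(grad, hessian, A, residual)
alpha = 1.0                        # a full step is exact for this quadratic toy case
w_new = w + alpha * delta_w
print(w_new)
```

Because the toy objective is quadratic and the constraint is linear, a single step lands on the constrained minimizer; for the causal discovery problem the subproblem is only a local model and must be re-solved at each iterate.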

The SLSQP algorithm starts with an initial weight matrix $W^{(1)}$ and computes the objective function and Jacobians. In the main loop, it iteratively solves a quadratic subproblem to find the step direction $\Delta W$, updating the weight matrix to minimize the objective function while meeting constraints. After each iteration, the algorithm updates W, checks for convergence based on the tolerance tol, and stops if the change in W is small enough or if $max\_iter$ is reached. The matrix $W_{\text {est}}$ is returned as the output. The detailed procedure of SLSQP optimization is outlined in Algorithm 3. In this paper, the maximum number of iterations, $max\_iter$, is set to 10,000, and the tolerance, tol, is set to $1 \times 10^{-6}$. The bounds ${\mathcal {B}}$ on the entries of the weight matrix are defined as follows:

$${\mathcal {B}} = \begin{cases} (0, 0) & \text {for } i = j, \\ (-\infty , \infty ) & \text {for } i \ne j, \end{cases} \quad i, j \in \{1, 2, \dots , d\}.$$

In other words, the diagonal entries (where $i = j$) are constrained to be 0, while the off-diagonal entries (where $i \ne j$) are unbounded.
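One way to set up such a constrained problem is with SciPy's `minimize(method="SLSQP")`, sketched below on synthetic data. This is an assumption-laden illustration, not the authors' code: the data, weights, `lam`, and starting point are invented for the example; the strict inequality is relaxed to the non-strict form that SLSQP accepts; the starting point is a small random matrix rather than a stage-one estimate; and SciPy's `ftol` is a tolerance on the objective value, which only approximates a tolerance on the change in W.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

d = 3
rng = np.random.default_rng(0)

# Synthetic data from the chain X1 -> X2 -> X3 (assumed weights 0.8 and 0.6).
W_true = np.array([[0.0, 0.8, 0.0],
                   [0.0, 0.0, 0.6],
                   [0.0, 0.0, 0.0]])
noise = rng.normal(size=(500, d))
X = noise @ np.linalg.inv(np.eye(d) - W_true)  # solves X = XW + noise
n = X.shape[0]
lam = 0.01
# One interventional constraint: X1 has a positive total effect on X3.
effect_constraints = {(0, 2): 0.01}            # (i, j) -> delta_ij


def objective(w):
    W = w.reshape(d, d)
    return 0.5 / n * np.sum((X - X @ W) ** 2) + lam * np.abs(W).sum()


def objective_grad(w):
    W = w.reshape(d, d)
    return (X.T @ (X @ W - X) / n + lam * np.sign(W)).ravel()


def acyclicity(w):
    W = w.reshape(d, d)
    return np.trace(expm(W * W)) - d


def acyclicity_grad(w):
    W = w.reshape(d, d)
    return (expm(W * W).T * (2.0 * W)).ravel()


def effect_margins(w):
    """delta_ij * (T_ij - delta_ij) for every interventional constraint."""
    M = np.linalg.inv(np.eye(d) - w.reshape(d, d))
    T = M - np.eye(d)
    return np.array([delta * (T[i, j] - delta)
                     for (i, j), delta in effect_constraints.items()])


def effect_margins_jac(w):
    """One row per constraint, using dT[i, j]/dW[a, b] = M[i, a] * M[b, j]."""
    M = np.linalg.inv(np.eye(d) - w.reshape(d, d))
    return np.array([delta * np.outer(M[i, :], M[:, j]).ravel()
                     for (i, j), delta in effect_constraints.items()])


# Bounds B: diagonal entries fixed at 0, off-diagonal entries unbounded.
bounds = [(0.0, 0.0) if i == j else (None, None)
          for i in range(d) for j in range(d)]
constraints = [
    {"type": "eq", "fun": acyclicity, "jac": acyclicity_grad},
    # SLSQP expects non-strict inequalities of the form fun(w) >= 0.
    {"type": "ineq", "fun": effect_margins, "jac": effect_margins_jac},
]

W0 = 0.1 * rng.normal(size=(d, d))             # assumed starting point
np.fill_diagonal(W0, 0.0)
result = minimize(objective, W0.ravel(), jac=objective_grad, method="SLSQP",
                  bounds=bounds, constraints=constraints,
                  options={"maxiter": 10000, "ftol": 1e-6})
W_est = result.x.reshape(d, d)
print(result.success, result.message)
print(np.round(W_est, 3))
```

Whether the solver reports success depends on the starting point, which is the sensitivity that motivates a two-stage scheme; `result.success` and `result.message` should be inspected before `W_est` is used.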

Once SLSQP produces an estimated weight matrix $W_{\text{est}}$, entries whose absolute values are below $\omega$ are set to zero, making the matrix sparse. However, an estimated weight matrix that satisfies both the acyclicity and interventional constraints before thresholding may fail to satisfy them afterwards, particularly the interventional constraints. This occurs because thresholding removes weak edges, which can sever causal paths between cause and target variables or weaken their causal strength, leading to violations of some interventional constraints. To address this, one can increase the thresholds $\delta_{ij}$ in the constrained optimization step for any interventional constraints found to be violated post-thresholding.
For instance, if variable i is known to have a positive causal effect on variable j, the corresponding constraint is $\delta_{ij}\,\bigl(T_{i,j} - \delta_{ij}\bigr) > 0$ with $\delta_{ij}$ initially set to a small positive value (e.g., $\delta_{ij}=0.01$). If this constraint is satisfied before thresholding but violated after thresholding, we re-optimize with modified deltas $\delta_{ij} \leftarrow \delta_{ij} + \epsilon$, $\epsilon > 0$.
See Appendix D for details on how to choose $\epsilon$. Note that a larger $\delta_{ij}$ can substantially change the learned model, as a larger $\delta_{ij}$ imposes stricter constraints that force the model to retain or strengthen more connections. In high-dimensional settings, interventional constraints are also more likely to be violated by thresholding, since longer and more complex causal paths mean that removing any edge can disrupt global causal paths and causal effects between variables.
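The post-thresholding check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the total causal effect matrix of a linear model is $T = (I - W)^{-1} - I$ with $W_{ij}$ the weight of edge i → j, and the helper names (`total_effects`, `threshold`, `violated_after_threshold`, `bump_deltas`) and the default `eps` are hypothetical. The sign-aware update for negative $\delta_{ij}$ is also an assumption; the text only states the positive case.

```python
import numpy as np

def total_effects(W):
    """Total causal effects T = (I - W)^{-1} - I = W + W^2 + ... (assumed definition).
    W[i, j] is taken to be the weight of edge i -> j, and W is assumed to encode a DAG."""
    d = W.shape[0]
    return np.linalg.inv(np.eye(d) - W) - np.eye(d)

def threshold(W, omega=0.3):
    """Set entries with absolute value below omega to zero."""
    W_thr = W.copy()
    W_thr[np.abs(W_thr) < omega] = 0.0
    return W_thr

def violated_after_threshold(W_est, deltas, omega=0.3):
    """Return the constraints (i, j) that hold for W_est but fail after thresholding.
    `deltas` maps (i, j) -> delta_ij; the constraint is delta_ij * (T_ij - delta_ij) > 0."""
    T_before = total_effects(W_est)
    T_after = total_effects(threshold(W_est, omega))
    violated = []
    for (i, j), delta in deltas.items():
        ok_before = delta * (T_before[i, j] - delta) > 0
        ok_after = delta * (T_after[i, j] - delta) > 0
        if ok_before and not ok_after:
            violated.append((i, j))
    return violated

def bump_deltas(deltas, violated, eps=0.05):
    """Move delta_ij away from zero by eps for each violated constraint.
    For positive delta this is delta <- delta + eps, as in the text."""
    updated = dict(deltas)
    for key in violated:
        updated[key] = updated[key] + np.sign(updated[key]) * eps
    return updated
```

For a chain 0 → 1 → 2 with weights 0.25 and 2.0, the total effect of variable 0 on variable 2 is 0.5, so the constraint with $\delta_{02}=0.01$ holds. Thresholding at $\omega=0.3$ removes the first edge and the constraint is then reported as violated, which would trigger re-optimization with the enlarged $\delta_{02}$.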

Two-stage Constrained Optimization

The SLSQP method is sensitive to the initial guess, specifically the starting weight matrix $W^{(1)}$, which is particularly problematic in non-convex spaces. Thus, a robust approach is required to ensure convergence to a feasible solution. To address this, we propose a straightforward two-stage constrained optimization approach:

Stage One (Optimization without interventional constraints): Initially, the efficient gradient-based L-BFGS algorithm (Zheng et al., 2018), in its bound-constrained form L-BFGS-B, is employed to learn a weight matrix $W^{(1)}$ that satisfies the acyclicity constraint, providing an initial approximation for the subsequent continuous optimization that further incorporates interventional constraints. L-BFGS-B is a limited-memory quasi-Newton method designed for large-scale optimisation with simple bound constraints (Byrd et al., 1995; Zhu et al., 1997). It approximates the inverse Hessian using only a small number of correction pairs, enabling efficient second-order updates even in high-dimensional problems. Compared with first-order methods, L-BFGS typically converges faster and yields more stable solutions, making it particularly suitable for learning continuous DAG models such as NOTEARS.
In our framework, Stage One employs L-BFGS to minimize the smooth NOTEARS objective $F(W)$ under the acyclicity constraint, implemented by fixing the diagonal of $W$ and enforcing $h(W) = 0$. This step yields a numerically stable and computationally efficient initial estimate that satisfies DAG-ness. Although Stage One does not incorporate interventional constraints, it provides a high-quality initialization that substantially improves the reliability and convergence behaviour of the subsequent SLSQP refinement, which must simultaneously handle both nonlinear equality (acyclicity) and nonlinear inequality (interventional) constraints.

Stage Two (Optimization with interventional constraints): The weight matrix $W^{(1)}$ is then used as the initial guess for the SLSQP optimization. In this stage, the objective is to iteratively refine the solution to further satisfy the interventional constraints. These interventional constraints are addressed sequentially, ensuring that the solution converges to a feasible and optimal $W^*$.
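The two stages can be sketched with SciPy's `minimize`, which provides both `L-BFGS-B` and `SLSQP`. This is a simplified illustration under several assumptions, not the Lin-CDIC implementation: the loss is plain least squares without a sparsity penalty, $h(W) = \operatorname{tr}(e^{W \circ W}) - d$ is the NOTEARS acyclicity measure, total effects are computed as $\sum_{k=1}^{d-1} W^k$, Stage One uses a basic augmented-Lagrangian loop, gradients are left to finite differences, and all interventional constraints are passed to SLSQP at once rather than sequentially. The function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def to_matrix(w, d):
    """Reshape a flat vector into a d x d weight matrix with the diagonal fixed at zero."""
    W = np.asarray(w, dtype=float).reshape(d, d).copy()
    np.fill_diagonal(W, 0.0)
    return W

def loss(w, X):
    """Least-squares loss F(W) = ||X - XW||_F^2 / (2n) (no L1 term in this sketch)."""
    n, d = X.shape
    W = to_matrix(w, d)
    return 0.5 / n * np.sum((X - X @ W) ** 2)

def h(w, d):
    """NOTEARS acyclicity measure: zero if and only if W encodes a DAG."""
    W = to_matrix(w, d)
    return np.trace(expm(W * W)) - d

def total_effect(w, d, i, j):
    """Total effect of i on j as the sum of W^k over path lengths k = 1..d-1 (assumed definition)."""
    W = to_matrix(w, d)
    T, P = np.zeros((d, d)), np.eye(d)
    for _ in range(d - 1):
        P = P @ W
        T += P
    return T[i, j]

def stage_one(X, rho_max=1e8, h_tol=1e-8):
    """Stage One: acyclicity only, via an augmented Lagrangian solved with L-BFGS-B."""
    n, d = X.shape
    w, rho, alpha = np.zeros(d * d), 1.0, 0.0
    while rho < rho_max:
        def objective(v):
            hv = h(v, d)
            return loss(v, X) + 0.5 * rho * hv ** 2 + alpha * hv
        w = minimize(objective, w, method="L-BFGS-B").x
        h_val = h(w, d)
        if h_val <= h_tol:
            break
        alpha += rho * h_val   # dual update
        rho *= 10.0            # tighten the penalty
    return to_matrix(w, d)

def stage_two(W1, X, deltas, maxiter=200):
    """Stage Two: SLSQP started from W1, with h(W) = 0 and delta_ij * (T_ij - delta_ij) >= 0."""
    n, d = X.shape
    cons = [{"type": "eq", "fun": h, "args": (d,)}]
    for (i, j), delta in deltas.items():
        cons.append({
            "type": "ineq",  # SLSQP enforces fun(w) >= 0, so the strict inequality is relaxed
            "fun": lambda v, i=i, j=j, delta=delta: delta * (total_effect(v, d, i, j) - delta),
        })
    res = minimize(loss, W1.ravel(), args=(X,), method="SLSQP",
                   constraints=cons, options={"maxiter": maxiter})
    return to_matrix(res.x, d), bool(res.success)
```

A typical call would be `W1 = stage_one(X)` followed by `W_star, ok = stage_two(W1, X, {(i, j): 0.01})`. The returned flag should be checked, since SLSQP may stop without satisfying every constraint.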

Our overall two-stage constrained optimization method, linear causal discovery with interventional constraints (Lin-CDIC), is summarized in Algorithm 1.

Algorithm 1 Lin-CDIC Algorithm

Convergence Analysis

Proposition 4.1

(Convergence of the Two-stage Optimization) The solution $W^*$ obtained by the two-stage optimization method is a Karush-Kuhn-Tucker (KKT) point of the problem defined by Eqns. (2–5).

Proof

In Stage One, since F is twice continuously differentiable, L-BFGS converges to a stationary point, satisfying

$$\nabla F(W^{(1)}) + \rho \nabla h(W^{(1)}) = 0.$$

However, $W^{(1)}$ may satisfy the acyclicity constraint but not the interventional constraints. In Stage Two, using $W^{(1)}$ as the initialization, SLSQP, by sequential quadratic programming, iteratively updates W, producing a sequence $W^{(k)} \rightarrow W^*$. As F, h, and $T_{ij}$ are continuously differentiable and the constraint qualification holds in the feasible region, by the theory of constrained optimization (Nocedal & Wright, 2006), the limit point $W^*$ satisfies the following Karush-Kuhn-Tucker (KKT) conditions.
Specifically, there exist Lagrange multipliers $\mu \in \mathbb{R}$, $\lambda_{ij} \ge 0$ such that

$$\nabla F(W^*) + \mu^T \nabla h(W^*) + \sum_{(i, j)} \lambda_{ij} \delta_{ij} \nabla T_{ij}(W^*) = 0,$$

$$h(W^*) = 0,$$

$$\delta_{ij}(T_{ij}(W^*) - \delta_{ij}) > 0,$$

$$\lambda_{ij} \cdot [\delta_{ij}(T_{ij}(W^*) - \delta_{ij})] = 0, \quad \forall (i, j).$$

Therefore, the solution $W^*$ obtained by the two-stage optimization method is a KKT point of the original constrained problem (but is not necessarily a global optimum). $\square$

The two-stage approach progressively refines the solution by breaking the optimization process into manageable steps. In the first stage, an initial solution $W^{(1)}$ is obtained that satisfies the acyclicity constraint, providing a solid foundation for further refinement, even though it does not yet meet all constraints. This ensures that subsequent optimizations focus on fine-tuning rather than large-scale corrections. In the second stage, the solution is incrementally improved, moving towards the optimal weight matrix $W^{*}$ that satisfies both the acyclicity and interventional constraints. This step-by-step refinement preserves feasibility while progressively approaching the optimal solution.

Time Complexity

The Lin-CDIC method involves two sequential optimization stages: first, an L-BFGS-B gradient-based method, and then SLSQP. The overall computational complexity depends on the number of nodes d, the number of interventional constraints m, and the nature of the optimization algorithms used. In the first stage, the time complexity is primarily driven by the number of nodes d and the complexity of the underlying gradient-based optimization, which is generally $O(d^3)$ due to the matrix operations involved in enforcing the acyclicity constraint. In the second stage, since each constraint is addressed sequentially, the complexity is linear with respect to the number of interventional constraints, denoted as m. Thus, the overall time complexity for this stage can be approximated as $O(m \cdot T_{\text{SLSQP}})$, where $T_{\text{SLSQP}}$ is the time complexity of a single SLSQP iteration, which itself depends on the problem size d and can range from $O(d^2)$ to $O(d^3)$. Combining both stages, the overall time complexity of the batch-constrained optimization method is $O(d^3) + O(m \cdot T_{\text{SLSQP}})$, upper bounded by $(m+1)O(d^3)$. Since m can be large in practical applications, the method’s time complexity is effectively linear with respect to m.

An Illustrative Example for the Problem and Algorithm

To illustrate the difference between models learned with and without interventional constraints, we provide an example of a linear causal model with 10 variables. We generated data with a sample size of 100 and four interventional constraints: $T(8,9)>0$, $T(3,7)>0$, $T(3,2)>0$, and $T(2,7)>0$, based on the true causal model. Note that we chose a small sample size of 100 specifically to highlight the benefit of incorporating constraints, which is a common practice in studies that consider prior knowledge. The true causal model and the learned models without interventional constraints (i.e., after Stage One) and with interventional constraints (i.e., after Stage Two), are shown in Fig. 1, and the performance metrics (see Sect. 5.1 for details) of the learned models are summarized in Table 2 (better metrics are shown in bold and blue).

Fig. 1. From left to right: (a) true causal model, (b) causal model learned without interventional constraints, (c) causal model learned with interventional constraints

Table 2. Performance metrics of the causal models learned with and without interventional constraints

| Metric | Without interventional constraints | With interventional constraints |
| --- | --- | --- |
| FDR | 0.143 | 0.133 |
| TPR | 0.706 | 0.765 |
| FPR | 0.071 | 0.071 |
| SHD | 6 | 5 |
| SID | 9 | 7 |
| NNZ | 14 | 15 |
| Time (s) | 3.01 | 14.59 |

Bold values indicate statistical significance.

In the causal model learned by NOTEARS without interventional constraints (i.e., from Stage One), we observe $T(8,9)=1.578$, $T(3,7)=0$, $T(3,2)=0$, and $T(2,7)=0.563$.
As the causal effects from $X_{3}$ to $X_{7}$ and from $X_{3}$ to $X_{2}$ are zero, the conditions $T(3,7)>0$ and $T(3,2)>0$ are violated.
In contrast, the model learned with interventional constraints (i.e., from Stage Two) yields $T(8,9)=1.583$, $T(3,7)=0.195$, $T(3,2)=0.343$, and $T(2,7)=0.570$, satisfying all required conditions.
Notably, incorporating interventional constraints: (a) correctly recovered the causal paths from $X_{3}$ to $X_{7}$ and $X_{3}$ to $X_{2}$, and (b) adjusted their causal effects from zero to positive. These results demonstrate that interventional constraints influence both the structural and parametric aspects of causal discovery.

Experiments

Performance Metrics and Baseline Methods

We conducted experiments on both synthetic and real-world datasets.1 All experiments were conducted on a laptop running Windows 11 Home (version 22H2, build 22631), equipped with a 13th Gen Intel® Core™ i9-13900H processor (14 cores, 20 threads, 2.6 GHz), 32 GB of RAM, and a 1 TB SSD. To evaluate the learned causal models, we consider metrics including false discovery rate (FDR), true positive rate (TPR), false positive rate (FPR), structural hamming distance (SHD) (Tsamardinos et al., 2006), structural intervention distance (SID) (Peters & Bühlmann, 2015), the number of non-zero entries (NNZ), and time (in seconds). The definitions of these evaluation metrics are summarised as follows:

FDR: the proportion of predicted edges that do not exist in the ground truth.
TPR: the fraction of true edges correctly recovered by the learnt model.
FPR: the proportion of non-existent edges incorrectly inferred as present.
SHD: the number of edge additions, deletions, or reversals needed to transform the estimated graph into the true graph.
SID: the number of intervention distributions incorrectly represented by the learnt structure.
NNZ: the number of non-zero entries in the estimated weight matrix after thresholding, reflecting the sparsity level of the inferred graph.

For the above metrics, lower is better, except for TPR, for which higher is better. In addition to the previously introduced metrics, we assess the estimated matrix by comparing the signs of its elements with those of the true weight matrix. This measure is referred to as the sign consistency sum (SCS). Specifically, let $W_{\text{est}}$ and $W_{\text{true}}$ be the estimated and true weight matrices, both of dimension $d \times d$. The SCS is defined as:

$$\text{SCS}(W_{\text{est}}, W_{\text{true}}) = \sum_{i=1}^d \sum_{j=1}^d \mathbf{1}_{\{\operatorname{sgn}(W_{\text{est}, ij}) = \operatorname{sgn}(W_{\text{true}, ij})\}},$$

where $\operatorname{sgn}(x)$ is the sign function, defined as $\operatorname{sgn}(x) = 1$ if $x > 0$, $\operatorname{sgn}(x) = 0$ if $x = 0$, and $\operatorname{sgn}(x) = -1$ if $x < 0$.
SCS ranges from 0 to $d^2$, which is the number of elements in $W_{\text{true}}$ or $W_{\text{est}}$. A high SCS indicates that the positive and negative influences between variables are accurately captured, preserving the nature of causality, that is, whether one variable increases (or decreases) as a result of another. This is particularly important in domains such as gene regulatory networks, where the sign of causal influence (activation or inhibition) can determine the behavior of complex biological systems. Thus, a high SCS enhances the trustworthiness of the model in practical applications, making it a critical metric for assessing the quality of causal inferences.

Since no existing method supports the newly introduced interventional constraints, we demonstrate their value by comparing causal models learned with and without these constraints. We also compare with causal models learned with structural path constraints. For continuous optimization-based causal discovery, path constraints can be represented using the reachability matrix,

$$R = \left( I + \frac{\tanh (W)}{d}\right)^d,$$

where d denotes the number of variables. When $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d=1$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_{ij}>0$$\end{document}$ indicates direct reachability between variable pairs i and j, i.e., edge constraints. In contrast, when $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d>1$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_{ij}>0$$\end{document}$ indicates indirect reachability between variable pairs i and j, i.e., path constraints. See Appendix C for further analysis of the properties of R. 
To illustrate the difference between path constraints measured by R and interventional constraints by T, consider the case where variable i has a negative causal effect on variable j, the corresponding interventional constraint is given by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{ij}<0$$\end{document}$ , while the associated path constraint is $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_{ij}>0$$\end{document}$ . Linear causal discovery with path constraints (Lin-CD-Path) is optimized using our two-stage procedure, except that the metric $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T_{ij}, i \in {\mathcal {C}}, j \in {\mathcal {T}}$$\end{document}$ is replaced with $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_{ij}, i \in {\mathcal {C}}, j \in {\mathcal {T}}$$\end{document}$ . The details of the Lin-CD-Path algorithm are summarized in Algorithm 4 in Appendix C. Thus, in summary, we compare the performance of three methods: (A) NOTEARS that does not incorporate any constraints, including path or interventional constraints; (B) Lin-CD-Path that incorporates causal path constraints; and (C) Lin-CDIC method that incorporates interventional constraints. 
By contrasting the learned models from (A), (B), and (C), we aim to highlight the unique benefits of incorporating interventional constraints into causal discovery. For all methods, the threshold is set to $\omega = 0.3$, consistent with other continuous optimization approaches (Zheng et al., 2018).
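The contrast between a path constraint and an interventional constraint can be made concrete on a three-variable chain. The sketch below is illustrative only: it assumes the total-effect matrix of a linear model is $T = (I - W)^{-1} - I$ and that reachability is accumulated from powers of the sign-free matrix $W \circ W$; both forms are assumptions made for this example, and the weights are hypothetical.

```python
import numpy as np

# Weighted adjacency of a chain 0 -> 1 -> 2; W[i, j] is the direct effect of i on j.
# The weights are hypothetical and chosen so that 0 inhibits 2 through 1.
W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, -0.5],
              [0.0, 0.0, 0.0]])
d = W.shape[0]

# Assumed total-effect matrix: sum of W^k over all path lengths k >= 1.
T = np.linalg.inv(np.eye(d) - W) - np.eye(d)

# Assumed reachability matrix: the same sum over the sign-free matrix W * W.
A = W * W
R = sum(np.linalg.matrix_power(A, k) for k in range(1, d))

print("T[0, 2] =", T[0, 2])  # negative: variable 0 inhibits variable 2
print("R[0, 2] =", R[0, 2])  # positive: a directed path from 0 to 2 exists
```

A model with the opposite sign on the edge from 1 to 2 would leave $R_{02}$ unchanged while flipping the sign of $T_{02}$, which is why a path constraint alone cannot rule out the wrong direction of effect.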

Synthetic Experiments

We generate random linear causal models characterized by scale-free (SF) graphs (Broido & Clauset, 2019) with Gaussian noise. The number of causal edges is randomly selected between eight and $\min\left(\left\lfloor \frac{d(d-1)}{2} \right\rfloor, 10\right)$, where d denotes the number of nodes. As for the interventional constraints, we sample them from the true causal model based on the strength of the causal effects between cause and target variables. A causal effect from variable i to j, denoted as $T_{ij}$, is considered significant if $|T_{ij}| > 0.1$ and is likely to be sampled. The above definition has real-world implications in fields such as genomics, econometrics, and systems biology. For example, weak causal effects are often seen as potentially spurious connections.
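The generation procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: it uses a random upper-triangular DAG in place of a scale-free graph, an edge probability of 0.07, weight magnitudes drawn from [0.5, 2.0], and the total-effect form $T = (I - W)^{-1} - I$, all of which are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 20, 100, 2  # variables, samples, number of constraints to draw

# Random DAG: strictly upper-triangular weights guarantee acyclicity.
# (A simple stand-in for the scale-free graphs described in the text.)
mask = np.triu(rng.random((d, d)) < 0.07, k=1)
weights = rng.uniform(0.5, 2.0, size=(d, d)) * rng.choice([-1.0, 1.0], size=(d, d))
W_true = mask * weights

# Linear model X = X W + E with standard Gaussian noise, solved for X.
E = rng.normal(size=(n, d))
X = E @ np.linalg.inv(np.eye(d) - W_true)

# Total causal effects and the pool of "significant" effects, |T_ij| > 0.1.
T_true = np.linalg.inv(np.eye(d) - W_true) - np.eye(d)
candidates = np.argwhere(np.abs(T_true) > 0.1)

# Draw up to m (cause, target, sign) triples from the pool.
k = min(m, len(candidates))
picked = candidates[rng.choice(len(candidates), size=k, replace=False)]
constraints = [(int(i), int(j), int(np.sign(T_true[i, j]))) for i, j in picked]
print(constraints)
```

Each triple encodes one interventional constraint: a sign of +1 stands for $T_{ij} > 0$ and a sign of -1 for $T_{ij} < 0$.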

Effect of Sample Size under Fixed Constraints

Setting: Firstly, to explore the impact of varying data sizes on constraint satisfaction, we conduct experiments under a fixed number of interventional constraints. In these experiments with 20 variables, the number of constraints was set to two, and the data sizes were varied as 50, 100, 150, and 200. For each setting, we ran 20 experiments. The performance of the three methods is shown in Table 3. Better metrics are shown in bold and blue. Note that the sample sizes were deliberately kept small, with $n \in \{50, 100, 150, 200\}$, motivated by recent research such as Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling (Zhu et al., 2023). This work provides a theoretical analysis of sample complexity bounds in causal discovery and shows that, for causal models with low nonlinearity (quantified by $C_m$, where $C_m = 0$ corresponds to linear models), the SHD between the learned and true causal models decreases significantly as the sample size increases. Intuitively, Table 2 in Zhu et al. (2023) highlights the relationship between sample complexity and model size for causal models with $C_m = 1$ and 10 variables. This setting corresponds to causal models that are nearly linear, showing that the mean SHD drops from 32 to 13 as the sample size increases from 5 to 160. These insights, derived from simulations of causal discovery without interventional constraints, justify our use of low sample sizes to evaluate the effectiveness of our proposed method.

Table 3. Performance metrics across sample sizes (Mean ± Variance)

| Methods | Metrics | $n=50$ | $n=100$ | $n=150$ | $n=200$ |
|---|---|---|---|---|---|
| NOTEARS (without constraints) | FDR | (0.113, 0.013) | (0.025, 0.002) | (0.037, 0.004) | (0.030, 0.002) |
| | TPR | (0.892, 0.005) | (0.883, 0.004) | (0.898, 0.001) | (0.886, 0.002) |
| | FPR | (0.009, 0.000) | (0.002, 0.000) | (0.002, 0.000) | (0.002, 0.000) |
| | SHD | (2.700, 4.410) | (1.650, 0.728) | (1.350, 0.328) | (1.600, 0.940) |
| | SID | (6.200, 44.460) | (3.350, 5.528) | (3.550, 2.848) | (3.450, 8.648) |
| | SCS | 7,939 | 7,965 | 7,966 | 7,961 |
| | NNZ | (13.700, 14.910) | (12.500, 14.550) | (12.100, 6.590) | (12.700, 10.710) |
| | Time | 6.2 | 4.1 | 2.7 | 3.5 |
| Lin-CD-Path (with path constraints) | FDR | (0.124, 0.018) | (0.046, 0.005) | (0.049, 0.003) | (0.045, 0.003) |
| | TPR | (0.937, 0.007) | (0.947, 0.006) | (0.948, 0.002) | (0.939, 0.003) |
| | FPR | (0.011, 0.000) | (0.003, 0.000) | (0.003, 0.000) | (0.003, 0.000) |
| | SHD | (2.450, 7.048) | (1.150, 1.928) | (1.150, 1.028) | (1.150, 1.628) |
| | SID | (4.450, 32.348) | (1.250, 3.888) | (1.450, 2.848) | (2.200, 7.660) |
| | SCS | 7,944 | 7,976 | 7,975 | 7,971 |
| | NNZ | (14.550, 15.448) | (13.550, 13.448) | (12.950, 7.548) | (13.600, 10.440) |
| | Time | 220.3 | 209.9 | 352.4 | 276.2 |
| Lin-CDIC (with interventional constraints) | FDR | (0.094, 0.013) | (0.032, 0.004) | (0.016, 0.002) | (0.021, 0.001) |
| | TPR | (0.959, 0.007) | (0.959, 0.004) | (0.971, 0.002) | (0.957, 0.003) |
| | FPR | (0.008, 0.000) | (0.002, 0.000) | (0.001, 0.000) | (0.001, 0.000) |
| | SHD | (1.800, 5.360) | (0.850, 1.428) | (0.550, 0.748) | (0.750, 1.088) |
| | SID | (2.900, 11.890) | (0.850, 1.528) | (0.800, 2.060) | (1.450, 4.748) |
| | SCS | 7,960 | 7,982 | 7,988 | 7,981 |
| | NNZ | (14.350, 14.628) | (13.550, 16.050) | (12.800, 7.460) | (13.500, 9.750) |
| | Time | 622.3 | 300.6 | 553.5 | 425.0 |

The mean and variance of the edge numbers in the generated causal models, i.e., NNZ, for the four settings are (13.55, 15.25), (13.05, 12.25), (13.20, 8.66), and (13.85, 11.03), respectively. Bold values indicate statistical significance.

Analysis: From Table 3, we observe a general trend across all methods: as the sample size increases (with the number of constraints remaining fixed), FDR, FPR, SHD, and SID tend to decrease, while TPR and SCS tend to increase. This indicates the benefit of larger sample sizes for improving causal discovery performance. Lin-CDIC achieves the best TPR, SHD, SID, and SCS at every sample size, and the lowest FDR in every setting except $n = 100$, where NOTEARS attains 0.025 against 0.032. Notably, its SID values, which evaluate the model from a downstream causal inference perspective, are significantly lower than those of the baselines, highlighting the advantages of incorporating interventional constraints. Furthermore, the SCS metric of Lin-CDIC, which reflects the number of correctly recovered signs of causal effects between variables, is higher than that of the baselines, even when only two interventional constraints are used. In contrast, NOTEARS exhibits higher FDR and SHD, particularly when the sample size is small (e.g., $n = 50$), and while Lin-CD-Path provides moderate improvements, it does not match the performance of Lin-CDIC. This may be because path constraints are generally less informative than interventional constraints for recovering causal models. In terms of time consumption, NOTEARS is significantly more efficient than both Lin-CD-Path and Lin-CDIC, as it is optimized with L-BFGS and only has to enforce the acyclicity constraint. In contrast, Lin-CD-Path and Lin-CDIC employ the more complex SLSQP optimization to handle the additional path and interventional constraints. Since path constraints are generally less restrictive than interventional constraints, Lin-CD-Path is consequently more efficient than Lin-CDIC. 
Note that since the number of constraints is fixed and the sample size only varies between 50 and 200, the time consumption of each method shows no systematic growth with sample size, as expected.

Remark: For experiments with 20 variables, the number of elements in $W_{\text{true}}$ or $W_{\text{est}}$ is 400. Therefore, the maximum possible SCS value across 20 experiments is 8,000. Since the differences after averaging are relatively small, we report the total SCS summed over all 20 experiments. As shown, with two interventional constraints, the causal models learned by Lin-CDIC achieve approximately 20 more correctly signed causal effects than those learned by NOTEARS, and about 10 more than those learned by Lin-CD-Path. This highlights the benefit of incorporating interventional constraints, which contribute not only to structural regularization but also to parameter refinement.
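The counting behind these totals can be sketched as follows. The sketch assumes that SCS counts entrywise sign agreement (negative, zero, or positive) between $W_{\text{true}}$ and the thresholded $W_{\text{est}}$ over all $d \times d$ entries, which matches the stated maximum of 400 per experiment; applying the threshold $\omega$ before comparing signs is an assumption of this example, and the function name `scs` and the weights are hypothetical.

```python
import numpy as np

def scs(W_true, W_est, omega=0.3):
    """Number of entries whose sign (-1, 0, or +1) agrees between the two matrices."""
    W_thr = np.where(np.abs(W_est) >= omega, W_est, 0.0)  # prune weak edges first
    return int(np.sum(np.sign(W_true) == np.sign(W_thr)))

d = 20
W_true = np.zeros((d, d))
W_true[0, 1], W_true[1, 2] = 1.5, -0.8

W_est = W_true.copy()
W_est[1, 2] = 0.6  # one edge recovered with the wrong sign

print(scs(W_true, W_true))  # 400: all d * d entries agree
print(scs(W_true, W_est))   # 399: the flipped edge is the only disagreement
```

Summed over 20 runs, a perfect score is 20 × 400 = 8,000, so a gap of about 20 between two methods corresponds to roughly one additional correctly signed entry per run.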

Effect of Constraints under Fixed Sample Size

Setting: To further demonstrate the impact of increasing the number of interventional constraints, we conducted experiments with a fixed amount of data while varying the number of interventional constraints. Specifically, we tested models with 20 variables and a sample size of 100, varying the number of interventional constraints from one to four. Note that the sample size was set to 100 to highlight the benefit of incorporating constraints. The number of constraints was limited to four, as, on one hand, eliciting a large number of constraints is often impractical, and on the other hand, our Lin-CDIC method becomes significantly more time-consuming as the number of constraints increases. For each setting, we ran 20 experiments. The results are shown in Table 4. Better metrics are shown in bold. Note that for each constraint size setting, the generated causal models differ, as increasing the number of constraints may invalidate models that satisfied fewer constraints at lower settings.

Table 4. Performance metrics across constraint sizes (Mean ± Variance)

| Methods | Metrics | $m=1$ | $m=2$ | $m=3$ | $m=4$ |
|---|---|---|---|---|---|
| NOTEARS (without constraints) | FDR | (0.028, 0.003) | (0.025, 0.002) | (0.038, 0.003) | (0.037, 0.003) |
| | TPR | (0.869, 0.007) | (0.883, 0.004) | (0.880, 0.004) | (0.878, 0.004) |
| | FPR | (0.002, 0.000) | (0.002, 0.000) | (0.003, 0.000) | (0.003, 0.000) |
| | SHD | (1.550, 0.747) | (1.650, 0.728) | (1.950, 2.048) | (1.800, 1.760) |
| | SID | (3.800, 20.160) | (3.350, 5.528) | (3.850, 16.428) | (4.100, 14.590) |
| | SCS | 7,965 | 7,965 | 7,957 | 7,959 |
| | NNZ | (11.350, 15.928) | (12.500, 14.550) | (12.750, 16.488) | (12.400, 18.040) |
| | Time | 4.9 | 4.1 | 4.7 | 6.2 |
| Lin-CD-Path (with path constraints) | FDR | (0.069, 0.007) | (0.046, 0.005) | (0.019, 0.001) | (0.041, 0.002) |
| | TPR | (0.934, 0.008) | (0.947, 0.006) | (0.959, 0.004) | (0.937, 0.004) |
| | FPR | (0.005, 0.000) | (0.003, 0.000) | (0.002, 0.000) | (0.003, 0.000) |
| | SHD | (1.350, 2.428) | (1.150, 1.928) | (0.950, 2.048) | (1.450, 2.348) |
| | SID | (2.300, 18.910) | (1.250, 3.888) | (1.150, 4.728) | (1.550, 5.648) |
| | SCS | 7,969 | 7,976 | 7,981 | 7,971 |
| | NNZ | (12.550, 14.648) | (13.550, 13.448) | (13.450, 14.048) | (13.100, 15.490) |
| | Time | 144.4 | 209.9 | 276.6 | 406.3 |
| Lin-CDIC (with interventional constraints) | FDR | (0.036, 0.006) | (0.032, 0.004) | (0.014, 0.001) | (0.012, 0.001) |
| | TPR | (0.956, 0.005) | (0.959, 0.004) | (0.966, 0.003) | (0.959, 0.003) |
| | FPR | (0.003, 0.000) | (0.002, 0.000) | (0.001, 0.000) | (0.001, 0.000) |
| | SHD | (0.850, 1.528) | (0.850, 1.428) | (0.700, 1.210) | (0.700, 1.010) |
| | SID | (1.800, 17.460) | (0.850, 1.528) | (0.700, 1.510) | (0.750, 1.488) |
| | SCS | 7,980 | 7,982 | 7,986 | 7,986 |
| | NNZ | (12.500, 15.750) | (13.550, 16.050) | (13.500, 14.650) | (13.050, 16.348) |
| | Time | 263.2 | 300.6 | 351.8 | 506.9 |

The mean and variance of the edge numbers in the generated causal models, i.e., NNZ, for the four settings are (12.50, 13.75), (13.05, 12.25), (13.60, 16.74), and (13.20, 15.46), respectively. Bold values indicate statistical significance.

Analysis: From Table 4, we can conclude that Lin-CDIC achieves the best overall accuracy across nearly all constraint sizes, except when only a single constraint is applied, where the constraining effect is minimal and NOTEARS attains a slightly lower FDR (0.028 against 0.036) and FPR (0.002 against 0.003). Lin-CDIC achieves the lowest SHD and SID, along with the highest TPR and SCS, in each setting, indicating superior recovery of the true causal model. In terms of time consumption, NOTEARS is significantly more efficient than both Lin-CD-Path and Lin-CDIC. Moreover, while NOTEARS remains largely unaffected by the number of constraints, both Lin-CD-Path and Lin-CDIC exhibit a clear increase in runtime as the number of constraints grows. This observation is consistent with the theoretical time complexity analysis presented in Sect. 4.4, which suggests that Lin-CD-Path and Lin-CDIC become more computationally expensive when more constraints are incorporated.

Remark: The constrained problem presented in this paper includes both nonlinear equality constraints that enforce DAG-ness and nonlinear inequality or bound constraints that restrict reachability and the negativity of causal effects between variables. Optimizing such a problem with many constraints is particularly challenging. In our experiments, we observed that standard optimization methods, such as L-BFGS, are inadequate, leading us to adopt sequential least squares programming (SLSQP), which can handle general constraints. As the defined optimization problem is non-convex (see analysis in Sect. 4.1), solving it is computationally demanding (see time complexity in Sect. 4.4). Moreover, since the problem is non-convex, there is no guarantee of finding the globally optimal solution. Consequently, the scalability of our method is limited. Through this work, we aim to inspire further efforts to address the scalability challenges associated with our method. For instance, developing new optimization techniques specifically tailored to interventional constraints could significantly enhance both the scalability and efficiency of our approach.
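To show how an inequality constraint on a total causal effect can be handed to SLSQP, the following sketch uses `scipy.optimize.minimize` on a three-variable toy problem. It is a simplified illustration, not the Lin-CDIC algorithm: acyclicity is sidestepped by fixing the variable ordering (a strictly upper-triangular W) rather than enforced as an equality constraint, the total effect is assumed to be $T = (I - W)^{-1} - I$, and the strict inequality $T_{02} > 0$ is assumed to be implemented with a margin as $T_{02} \ge \epsilon$. The data and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 100, 3

# Toy data from the chain 0 -> 1 -> 2 with a weak positive effect of 1 on 2.
v0 = rng.normal(size=n)
v1 = 1.0 * v0 + rng.normal(size=n)
v2 = 0.1 * v1 + rng.normal(size=n)
X = np.column_stack([v0, v1, v2])

def unpack(theta):
    """Place the three free weights into a strictly upper-triangular W."""
    W = np.zeros((d, d))
    W[0, 1], W[1, 2], W[0, 2] = theta
    return W

def loss(theta):
    """Least-squares loss of the linear model X = X W + E."""
    resid = X - X @ unpack(theta)
    return 0.5 / n * np.sum(resid ** 2)

def total_effect(theta):
    """Assumed total-effect matrix T = (I - W)^{-1} - I."""
    return np.linalg.inv(np.eye(d) - unpack(theta)) - np.eye(d)

eps = 0.25  # margin that turns the strict inequality T > 0 into T >= eps
constraints = [{"type": "ineq", "fun": lambda th: total_effect(th)[0, 2] - eps}]

res = minimize(loss, x0=np.zeros(3), method="SLSQP", constraints=constraints)
T_hat = total_effect(res.x)
print("T(0, 2) =", round(float(T_hat[0, 2]), 3))
```

In SciPy's convention an `"ineq"` constraint requires the function value to be non-negative, so a negative-effect constraint $T_{ij} < 0$ would be written as `-T[i, j] - eps`. Adding the acyclicity equality constraint and freeing all $d^2$ entries of W gives the harder non-convex problem discussed in the remark above.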

Real-world Experiment

In addition to synthetic experiments, we also test on the widely used Sachs dataset (Sachs et al., 2005), which contains both observational and experimental flow cytometry data on protein signaling in human immune cells. Although this is a single dataset, it remains one of the most comprehensive benchmarks for evaluating causal discovery methods. As a benchmark, we employ the Sachs causal graph shown in Fig. 2 (available at https://www.bnlearn.com/research/sachs05/), which contains 20 causal edges. We do so despite controversies arising from uncertainties in intervention specificity, potential cyclic dependencies in cellular signaling networks, unmeasured confounding that challenges causal sufficiency, and discrepancies between the consensus network and the observed experimental data (Mooij & Heskes, 2013; Mooij et al., 2020; Schmidt & Murphy, 2009).

Fig. 2. True Sachs causal graph

Sachs Causal Interactions Discussion

As Fig. 2 only indicates causal pathways between proteins without specifying particular causal interactions, such as inhibition or activation, we augmented the Sachs dataset with causal interactions from the literature and knowledge bases like Reactome (https://reactome.org/). Among a subset of the 11 phosphorylated proteins and phospholipids, we collected and discussed eight known causal interactions, as detailed below:

“PKC activates JNK”: In Lopez-Bergami and Ronai (2008), the Abstract states: “PKC can augment the degree of JNK activation by phosphorylating JNK...”; the Results section notes: “To achieve a more efficient activation of JNK, phosphorylation by PKC should precede phosphorylation by MKK4 or MKK7.”; and the Discussion adds: “Our data showed that phosphorylation by PKC enhances JNK activation by increasing MKK4/7-dependent phosphorylation.” Therefore, we can conclude that “PKC may indirectly activate JNK,” which can be expressed as our interventional constraint: $T(\text{PKC}, \text{JNK}) > 0$.
“PKC activates P38”: In Yacoub et al. (2006), the Results section notes: “Thus, it appears that the MEK/ERK and p38 signaling pathways are important downstream effectors of PKC$\delta$ in platelets.” The Discussion section adds: “We demonstrated that MEK1/2, ERK1/2, and p38 are activated by collagen and thrombin, and more importantly, established the requirement for PKC$\delta$ and PLC activation in this process.” Finally, the Conclusion summarizes: “PKC$\delta$ then triggers activation of the MEK/ERK and p38 signaling pathways, which ultimately result in the generation and release of TxA$_2$.” In Nakajima et al. (2004), the Abstract also notes: “PKC$\alpha$ was found to be requisite for the activation of p38MAPK in LPS-stimulated microglia.” Therefore, we can conclude that “PKC may indirectly activate P38,” which can be expressed as our interventional constraint: $T(\text{PKC}, \text{P38}) > 0$.
“PIP3 activates Akt”: In Manning and Cantley (2007), it is noted that “PI3K phosphorylates phosphatidylinositol-4,5-bisphosphate (PIP2) to generate phosphatidylinositol-3,4,5-trisphosphate (PIP3), in a reaction that can be reversed by the PIP3 phosphatase PTEN. AKT and PDK1 bind to PIP3 at the plasma membrane, and PDK1 phosphorylates the activation loop of AKT at T308,” a finding also acknowledged at https://reactome.org/content/detail/R-HSA-1257604 (Fabregat et al., 2018). However, Kearney et al. (2021) further suggest that Akt may indirectly inhibit additional PIP3 synthesis through feedback, indicating the presence of a feedback loop between PIP3 and Akt. In our paper, we study causal discovery under the assumption of a Directed Acyclic Graph (DAG), which means that “PIP3 activates Akt” and “Akt inhibits PIP3” cannot be incorporated simultaneously. Nevertheless, we can at least conclude that “PIP3 activates Akt,” which can be formalised as our interventional constraint: $T(\text{PIP3}, \text{Akt}) > 0$.
“PKA inhibits P38”: In Metz et al. (2021), the Results section states, “These results suggest that PKA inhibition in the PA/PDE4/PKA pathway activates p38.” The Discussion further explains, “We find that decreasing the basal PKA activity through the PA/PDE4/PKA pathway or using direct PKA inhibitors results in p38 and ERK1/2 activation. PKA activity seems then to exert a negative regulation upon p38 and ERK1/2 involved in EGFR endocytosis, which would be released when the PA/PDE4/PKA pathway is stimulated with propranolol.” Therefore, we can conclude that “PKA may indirectly inhibit P38,” which can be expressed as our interventional constraint: $T(\text{PKA}, \text{P38}) < 0$.
“PKA inhibits Raf”: Häfner et al. (1994) and Dumaz and Marais (2003) consistently report that “When PKA is activated, it phosphorylates Raf-1 and stimulates recruitment of 14-3-3, preventing Raf-1 recruitment to the plasma membrane and subsequently blocking its activation,” and “We also show that endogenous Raf-1 and PKA form a complex that is disrupted when cAMP levels in cells are elevated, and...the PKA inhibitor H89 rescues Raf-1 activation in the presence of forskolin/IBMX.” In addition, they state that “PKA can inhibit Raf-1 function directly via phosphorylation of the Raf-1 kinase domain.” Therefore, we can conclude that “PKA may directly inhibit Raf,” which can be expressed as our interventional constraint: $T(\text{PKA}, \text{Raf}) < 0$.
“Raf activates MEK, MEK activates ERK, and Raf activates ERK”: Roberts and Der (2007) report that “Raf kinases phosphorylate and activate the MEK1 and MEK2 dual-specificity protein kinases,” and “MEK1/2 then phosphorylate and activate the ERK1 and ERK2 MAPKs.” They further note that “Activated ERKs phosphorylate and regulate the activities of an ever-growing roster of substrates...” Based on this cascade, we conclude that “Raf activates MEK, MEK activates ERK, and thus Raf may indirectly activate ERK,” which can be formalised as the following interventional constraints: $T(\text{Raf}, \text{MEK}) > 0$, $T(\text{MEK}, \text{ERK}) > 0$, and $T(\text{Raf}, \text{ERK}) > 0$.

The eight causal interactions and their corresponding interventional constraints and path constraints are listed in Table 5. Note that causal interactions between proteins and phospholipids may be either direct or indirect; our method supports both cases without distinction in the interventional constraints.

Table 5. Causal interactions, interventional constraints, and path constraints in the Sachs dataset

| Causal interactions | Interventional constraints | Path constraints |
|---|---|---|
| PKC activates Jnk | $T(\text{PKC}, \text{Jnk}) > 0$ | $R(\text{PKC}, \text{Jnk}) > 0$ |
| PKC activates P38 | $T(\text{PKC}, \text{P38}) > 0$ | $R(\text{PKC}, \text{P38}) > 0$ |
| PIP3 activates Akt | $T(\text{PIP3}, \text{Akt}) > 0$ | $R(\text{PIP3}, \text{Akt}) > 0$ |
| PKA inhibits P38 | $T(\text{PKA}, \text{P38}) < 0$ | $R(\text{PKA}, \text{P38}) > 0$ |
| PKA inhibits Raf | $T(\text{PKA}, \text{Raf}) < 0$ | $R(\text{PKA}, \text{Raf}) > 0$ |
| Raf activates Erk | $T(\text{Raf}, \text{Erk}) > 0$ | $R(\text{Raf}, \text{Erk}) > 0$ |
| Raf activates Mek | $T(\text{Raf}, \text{Mek}) > 0$ | $R(\text{Raf}, \text{Mek}) > 0$ |
| Mek activates Erk | $T(\text{Mek}, \text{Erk}) > 0$ | $R(\text{Mek}, \text{Erk}) > 0$ |
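The eight interventional constraints can be encoded as (cause, target, sign) triples and checked against any learned weight matrix. The sketch below is illustrative: the node ordering, the helper name `check_constraints`, the total-effect form $T = (I - W)^{-1} - I$, and the toy weight matrix are assumptions made for this example and are not taken from the authors' implementation or results.

```python
import numpy as np

# The 11 measured variables; this ordering is arbitrary and only fixes matrix indices.
NODES = ["Raf", "Mek", "Plcg", "PIP2", "PIP3", "Erk", "Akt", "PKA", "PKC", "P38", "Jnk"]
IDX = {name: k for k, name in enumerate(NODES)}

# (cause, target, sign): +1 encodes T > 0 (activation), -1 encodes T < 0 (inhibition).
SACHS_CONSTRAINTS = [
    ("PKC", "Jnk", +1), ("PKC", "P38", +1), ("PIP3", "Akt", +1),
    ("PKA", "P38", -1), ("PKA", "Raf", -1),
    ("Raf", "Erk", +1), ("Raf", "Mek", +1), ("Mek", "Erk", +1),
]

def check_constraints(W, constraints=SACHS_CONSTRAINTS):
    """Report, for each constraint, whether the sign of the total effect matches."""
    d = W.shape[0]
    T = np.linalg.inv(np.eye(d) - W) - np.eye(d)  # assumed total-effect matrix
    return {
        (cause, target): bool(sign * T[IDX[cause], IDX[target]] > 0)
        for cause, target, sign in constraints
    }

# Hypothetical weight matrix with a single edge Raf -> Mek, for illustration only.
W_toy = np.zeros((11, 11))
W_toy[IDX["Raf"], IDX["Mek"]] = 1.0
report = check_constraints(W_toy)
print(sum(report.values()), "of", len(report), "constraints satisfied")
```

Because the check depends only on the sign of $T$, it treats direct and indirect interactions identically, in line with the note above Table 5.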

Fig. 3. Sachs causal models learned by NOTEARS (without constraints) and Lin-CD-Path (with path constraints)

Fig. 4. Sachs causal models learned by Lin-CDIC (with interventional constraints) under different $\epsilon$ values

Effectiveness Analysis

Setting: To demonstrate the effectiveness of interventional constraints, we use only the observational Sachs data ($n = 853$ samples) along with three of the eight identified interventional constraints: “PKC activates Jnk,” “PKC activates P38,” and “PIP3 activates Akt,” reserving the remaining five for validation. Accordingly, for the Lin-CD-Path method, the corresponding path constraints are: “PKC $\rightarrow \dots \rightarrow$ Jnk”, “PKC $\rightarrow \dots \rightarrow$ P38”, and “PIP3 $\rightarrow \dots \rightarrow$ Akt”. 
The true causal graph and the causal models learned by NOTEARS (without constraints), Lin-CD-Path (with path constraints), and Lin-CDIC (with interventional constraints) for $\epsilon = 0.25$, 0.50, 0.75, and 1.0 are shown in Figs. 3 and 4. The total causal effects of variable pairs and the performance metrics of the learned models are presented in Table 6. Better metrics are shown in bold and blue. Note that in previous synthetic experiments, the signs of elements in the weight matrices are known, enabling evaluation of the learned models using the SCS metric. In contrast, for the real-world Sachs dataset, the signs and underlying cellular signalling mechanisms are only partially understood, making the SCS metric inapplicable for evaluation. Nevertheless, the signs in the learned model can still be verified against known causal interactions.
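
To make the difference between a path constraint and an interventional constraint concrete, the following minimal sketch computes total causal effects from a weighted adjacency matrix. It assumes the NOTEARS convention that `W[i, j]` is the weight of edge i -> j in an acyclic linear model, so the total effect of i on j is entry (i, j) of $(I - W)^{-1} - I$. The three-variable graph, its weights, and the helper names `total_effects` and `has_path` are hypothetical and are not taken from the Sachs models.

```python
import numpy as np

def total_effects(W):
    """Total causal effects for an acyclic linear SEM.

    Assumes W[i, j] is the weight of edge i -> j, so entry (i, j) of
    (I - W)^-1 - I sums the products of edge weights over all directed
    paths from i to j.
    """
    d = W.shape[0]
    return np.linalg.inv(np.eye(d) - W) - np.eye(d)

def has_path(W, i, j):
    """True if a directed path i -> ... -> j exists in the support of W."""
    d = W.shape[0]
    A = (W != 0).astype(int)
    reach = np.zeros_like(A)
    power = np.eye(d, dtype=int)
    for _ in range(d - 1):  # a path in a DAG has at most d - 1 edges
        power = (power @ A > 0).astype(int)
        reach |= power
    return bool(reach[i, j])

# Hypothetical toy graph: 0 -> 1 -> 2 plus a direct edge 0 -> 2.
W = np.array([
    [0.0, 0.8, -1.0],
    [0.0, 0.0,  0.5],
    [0.0, 0.0,  0.0],
])
T = total_effects(W)
path_ok = has_path(W, 0, 2)   # path constraint "0 -> ... -> 2"
sign_ok = bool(T[0, 2] > 0)   # interventional constraint "0 activates 2"
```

In this toy graph the path constraint holds while the interventional constraint fails, because the total effect is $-1.0 + 0.8 \times 0.5 = -0.6$. This is the situation a path constraint alone cannot rule out.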

Analysis: From Table 6, we observe that the model learned by NOTEARS without constraints satisfies only one of the eight interventional constraints, namely “Raf activates Mek” with $T(\text{Raf}, \text{Mek}) = 1.21$, and recovers two causal paths: PKA $\rightarrow \dots \rightarrow$ P38 and PKA $\rightarrow \dots \rightarrow$ Raf. However, it fails to identify key interactions: “PKC activates Jnk”, “PKC activates P38”, and “PIP3 activates Akt”.
It also fails to identify the corresponding causal paths: PKC $\rightarrow \dots \rightarrow$ Jnk, PKC $\rightarrow \dots \rightarrow$ P38, and PIP3 $\rightarrow \dots \rightarrow$ Akt. The model learned by the Lin-CD-Path method, which incorporates path constraints, shows improvement.
Specifically, the causal interactions “Raf activates Mek” and “PIP3 activates Akt”, as well as the causal paths PKC $\rightarrow \dots \rightarrow$ Jnk, PKC $\rightarrow \dots \rightarrow$ P38, PKA $\rightarrow \dots \rightarrow$ P38, and PKA $\rightarrow \dots \rightarrow$ Raf, are recovered. However, it fails to recover the causal interactions “PKC activates Jnk”, “PKC activates P38”, “PKA inhibits P38”, and “PKA inhibits Raf”, and instead incorrectly infers “PKC inhibits Jnk”, “PKC inhibits P38”, “PKA activates P38”, and “PKA activates Raf”. The model learned by our Lin-CDIC method, which incorporates interventional constraints, performs substantially better. Specifically, it satisfies all three specified interventional constraints (“PKC activates Jnk”, “PKC activates P38”, and “PIP3 activates Akt”), in addition to “Raf activates Mek”.
Notably, it also uncovers a causal interaction that was not specified as a constraint, “PKA inhibits P38”, with $T(\text{PKA}, \text{P38}) = -0.40$. In total, it therefore reveals two additional causal interactions: “Raf activates Mek” and “PKA inhibits P38”. This suggests that leveraging a partial set of known interactions allows our method to identify further correct causal interactions. Our method also recovers the causal pathways PKA $\rightarrow \dots \rightarrow$ Raf, Raf $\rightarrow \dots \rightarrow$ Erk, and Mek $\rightarrow \dots \rightarrow$ Erk.
However, the causal effects $T(\text{Raf}, \text{Erk}) = -0.02$ and $T(\text{Mek}, \text{Erk}) = -0.01$ indicate weak negative causal effects, slightly violating the unspecified interactions “Raf activates Erk” and “Mek activates Erk”. Furthermore, $T(\text{PKA}, \text{Raf}) = 0.589$ contradicts the expected interaction, as PKA is expected to inhibit Raf.
In experiments with different $\epsilon$ values, when $\epsilon = 0.50$, in addition to the three given interventional constraints, our method still successfully recovers two additional interactions: “Raf activates Mek” and “PKA inhibits P38.” Specifically, $T(\text{PKA}, \text{P38})$ is $-4.32$, indicating a stronger negative causal effect from PKA to P38 compared to $-0.40$ when $\epsilon = 0.25$.
However, when $\epsilon = 0.75$ and 1.0, only “Raf activates Mek” is consistently recovered. The value of $T(\text{PKA}, \text{P38})$ shifts to 3.93 and 7.42, respectively, suggesting “PKA activates P38,” which contradicts the true interaction. Despite this inconsistency, our method still recovers the causal path from PKA to P38. The discrepancy among the four $\epsilon$ settings can likely be attributed to significant structural and parametric changes in the models caused by larger $\epsilon$ values. This observation aligns with our sensitivity analysis, where $\epsilon = 0.25$ is found to be optimal among the four tested choices.
In summary, given three interventional constraints/interactions, Lin-CDIC recovers two additional causal interactions (“Raf activates Mek” and “PKA inhibits P38”), and identifies five additional causal paths (PKC $\rightarrow \dots \rightarrow$ Jnk, PKC $\rightarrow \dots \rightarrow$ P38, PIP3 $\rightarrow \dots \rightarrow$ Akt, Raf $\rightarrow \dots \rightarrow$ Erk, and Mek $\rightarrow \dots \rightarrow$ Erk). These findings suggest that interventional constraints are more effective than path constraints, as correctly identifying causal interactions requires determining both the correct path and the appropriate sign of the causal effect.
Additionally, interventional constraints on local causal interactions can, to some extent, facilitate the broader identification of causal interactions or paths. Moreover, the causal models learned by Lin-CDIC with $\epsilon = 0.25$, 0.50, and 0.75 contain 22 edges, aligning more closely with the benchmark causal graph in Fig. 2, which has 20 edges, than those learned by NOTEARS and Lin-CD-Path. It is worth noting that in the real-world Sachs dataset experiment, although the sample size of 853 is larger than those in the synthetic experiments, the performance metrics (such as FDR, TPR, FPR, SHD, and SID) of causal models estimated with or without constraints remain suboptimal. This may be attributed to measurement errors, noise, and unobserved confounders inherent in real-world data, which often require larger sample sizes for reliable causal discovery. In such scenarios, incorporating domain knowledge, such as interventional constraints, becomes essential.

Table 6. Total causal effects and evaluation metrics of the causal models learned without constraints, with path constraints, and with interventional constraints under different $\epsilon$ values

| Effect/metrics | NOTEARS | Lin-CD-Path | Lin-CDIC $\epsilon = 0.25$ | Lin-CDIC $\epsilon = 0.50$ | Lin-CDIC $\epsilon = 0.75$ | Lin-CDIC $\epsilon = 1.0$ |
|---|---|---|---|---|---|---|
| $T(\text{Raf}, \text{Mek}) > 0$ | 1.21 | 1.21 | 1.29 | 1.21 | 1.22 | 1.19 |
| $T(\text{PKC}, \text{Jnk}) > 0$ | 0 | $-0.62$ | 0.36 | 0.43 | 0.88 | 1.51 |
| $T(\text{PKC}, \text{P38}) > 0$ | 0 | $-0.62$ | 1.16 | 1.41 | 1.33 | 2.39 |
| $T(\text{PIP3}, \text{Akt}) > 0$ | 0 | 0.62 | 0.41 | 0.52 | 0.56 | 0.86 |
| $T(\text{PKA}, \text{P38}) < 0$ | 0.96 | 2.28 | $-0.40$ | $-4.32$ | 3.93 | 7.42 |
| $T(\text{PKA}, \text{Raf}) < 0$ | 0.52 | 0.52 | 0.59 | 0.51 | 0.53 | 0.51 |
| $T(\text{Raf}, \text{Erk}) > 0$ | 0 | 0 | $-0.02$ | 0 | 0 | $-0.00$ |
| $T(\text{Mek}, \text{Erk}) > 0$ | 0 | 0 | $-0.01$ | 0 | 0 | 0 |
| FDR | 0.53 | 0.38 | 0.50 | 0.64 | 0.64 | 0.67 |
| TPR | 0.35 | 0.50 | 0.55 | 0.40 | 0.40 | 0.40 |
| FPR | 0.23 | 0.17 | 0.31 | 0.40 | 0.40 | 0.46 |
| SHD | 14 | 11 | 15 | 21 | 20 | 23 |
| SID | 47 | 38 | 31 | 31 | 35 | 34 |
| NNZ | 15 | 16 | 22 | 22 | 22 | 24 |
| Time (s) | 21 | 54 | 575 | 509 | 538 | 499 |

Bold value indicates statistically significant
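
As a cross-check of the counts discussed in the analysis, the sketch below tallies how many of the eight known interactions each learned model satisfies, using the total effects reported in Table 6. The strict sign test (an effect of exactly zero counts as unsatisfied) and the names `METHODS`, `EFFECTS`, and `count_satisfied` are assumptions made for illustration.

```python
# Expected sign (+1 activates, -1 inhibits) and total effects transcribed
# from Table 6; each list follows the column order of the table.
METHODS = [
    "NOTEARS", "Lin-CD-Path",
    "Lin-CDIC eps=0.25", "Lin-CDIC eps=0.50",
    "Lin-CDIC eps=0.75", "Lin-CDIC eps=1.0",
]
EFFECTS = {
    ("Raf", "Mek", +1): [1.21, 1.21, 1.29, 1.21, 1.22, 1.19],
    ("PKC", "Jnk", +1): [0.0, -0.62, 0.36, 0.43, 0.88, 1.51],
    ("PKC", "P38", +1): [0.0, -0.62, 1.16, 1.41, 1.33, 2.39],
    ("PIP3", "Akt", +1): [0.0, 0.62, 0.41, 0.52, 0.56, 0.86],
    ("PKA", "P38", -1): [0.96, 2.28, -0.40, -4.32, 3.93, 7.42],
    ("PKA", "Raf", -1): [0.52, 0.52, 0.59, 0.51, 0.53, 0.51],
    ("Raf", "Erk", +1): [0.0, 0.0, -0.02, 0.0, 0.0, -0.0],
    ("Mek", "Erk", +1): [0.0, 0.0, -0.01, 0.0, 0.0, 0.0],
}

def count_satisfied(effects, methods):
    """Count, per method, the interactions whose total effect has the expected sign."""
    counts = {m: 0 for m in methods}
    for (_cause, _effect, sign), values in effects.items():
        for method, value in zip(methods, values):
            if sign * value > 0:  # strict: zero effect does not satisfy the constraint
                counts[method] += 1
    return counts

counts = count_satisfied(EFFECTS, METHODS)
```

Under these assumptions the tally is 1 for NOTEARS, 2 for Lin-CD-Path, 5 for Lin-CDIC with $\epsilon = 0.25$ and 0.50, and 4 with $\epsilon = 0.75$ and 1.0, which agrees with the analysis above.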

Remark: Nonlinear models generally outperform linear models in causal discovery tasks. For example, on the Sachs dataset using purely observational data, nonlinear methods such as SCORE (Rolland et al., 2022) (SHD: 12, SID: 45), CAM (Bühlmann et al., 2014) (SHD: 12, SID: 55), DiffAN (Sanchez et al., 2023) (SHD: 13, SID: 56), and GraN-DAG (Lachapelle et al., 2020) (SHD: 13, SID: 47) have demonstrated superior performance, as reported by Sanchez et al. (2023). In contrast, linear models like NOTEARS and FGS tend to yield higher structural Hamming distances (Zheng et al., 2018; Yu et al., 2019). Although our method assumes a linear causal model, the SID of the learned causal model (31 with $\epsilon = 0.25$ in Table 6), achieved using only three interventional constraints, is much lower than the SID values of 45 to 56 reported for the nonlinear methods above.

Robustness Analysis

Setting: We also conducted a robustness analysis of our Lin-CDIC method. Specifically, we re-learned the causal models under the following combinations of interventional constraints: (1) one incorrect (“PIP3 inhibits Akt”) and two correct (“PKC activates Jnk”, “PKC activates P38”); (2) two incorrect (“PIP3 inhibits Akt”, “PKC inhibits P38”) and one correct (“PKC activates Jnk”); and (3) three incorrect constraints (“PIP3 inhibits Akt”, “PKC inhibits P38”, and “PKC inhibits Jnk”). These results are compared with models learned by NOTEARS (without any constraints), Lin-CD-Path (with path constraints), and Lin-CDIC (with all correct interventional constraints). The total causal effects of variable pairs and the performance metrics of the learned models are presented in Table 7. Note that Lin-CD-Path is not affected by the signs of causal effects or the correctness of interventional constraints. For example, for Lin-CD-Path, both “PIP3 inhibits Akt” and “PIP3 activates Akt” imply the existence of a causal path from PIP3 to Akt, i.e., PIP3 $\rightarrow \dots \rightarrow$ Akt.
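
The constraint sets used in this robustness study can be written down explicitly. The sketch below builds them by flipping the sign of the first n constraints, in the order in which the incorrect constraints are introduced above, and shows why the path constraints seen by Lin-CD-Path do not change. The tuple encoding, the helper `corrupt`, and the `IC-n` dictionary keys (n incorrect constraints, following the naming in Table 7) are illustrative assumptions.

```python
# Correct constraints as (cause, effect, sign); +1 = activates, -1 = inhibits.
# The order matches the order in which signs are flipped in the study.
CORRECT = [
    ("PIP3", "Akt", +1),
    ("PKC", "P38", +1),
    ("PKC", "Jnk", +1),
]

def corrupt(constraints, n_incorrect):
    """Flip the sign of the first n_incorrect constraints."""
    return [
        (cause, effect, -sign if k < n_incorrect else sign)
        for k, (cause, effect, sign) in enumerate(constraints)
    ]

# IC-0 is the all-correct set; IC-3 has all three signs flipped.
ic_sets = {f"IC-{n}": corrupt(CORRECT, n) for n in range(4)}

# Path constraints ignore the sign, so they are identical for every IC-n.
path_sets = {
    name: {(cause, effect) for cause, effect, _ in cs}
    for name, cs in ic_sets.items()
}
paths_unchanged = all(p == path_sets["IC-0"] for p in path_sets.values())
```

Because `paths_unchanged` is true for every set, a method that consumes only the path information receives the same input in all four settings.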

Analysis: Table 7 shows that introducing incorrect interventional constraints or priors results in sparser learned causal models. For example, when $\epsilon = 0.25$, the NNZ metric decreases from 22 to 20 with three incorrect constraints, indicating that the model has two fewer edges than the model trained with all correct interventional constraints. Moreover, the incorrect constraints negatively influence the correct ones. For instance, when the incorrect constraint “PIP3 inhibits Akt” is provided, the total causal effect of PKC on Jnk becomes significantly weaker (e.g., 0.00 and $-0.00$), in contrast to the value of 0.36 obtained when all constraints are correct. This aligns with the earlier observation that incorporating incorrect constraints tends to produce sparser causal models. Among the models trained without constraints and with 0 to 3 correct interventional constraints, the combination of two incorrect and one correct constraint yields the best performance in terms of FDR, FPR, and SHD. This may be attributed to the relatively sparse model learned under that setting, as sparser models tend to exhibit fewer false edges. Interestingly, even when the signs of the interventional constraints are incorrect, they may still indicate correct causal paths, thereby improving structural metrics such as FDR, FPR, and SHD. This also highlights the effectiveness of our Lin-CDIC method in incorporating causal path priors, a topic that has been explored in prior work.
In contrast, the model learned with all correct interventional constraints performs best on the SID metric, which evaluates the model from a downstream causal inference perspective. In addition, the causal models learned by Lin-CDIC contain between 16 and 22 edges, aligning more closely with the benchmark causal graph, which has 20 edges, than those learned by NOTEARS and Lin-CD-Path.

Table 7. Total causal effects and evaluation metrics of the causal models learned without constraints, with path constraints, and with 0 to 3 correct interventional constraints

| Effect/metrics | NOTEARS | Lin-CD-Path | Lin-CDIC IC-3 | Lin-CDIC IC-2 | Lin-CDIC IC-1 | Lin-CDIC IC-0 |
|---|---|---|---|---|---|---|
| $T(\text{Raf}, \text{Mek}) > 0$ | 1.21 | 1.21 | 1.21 | 1.22 | 1.20 | 1.29 |
| $T(\text{PKC}, \text{Jnk}) > 0$ | 0 | $-0.62$ | $-0.00$ | 0.37 | 0.00 | 0.36 |
| $T(\text{PKC}, \text{P38}) > 0$ | 0 | $-0.62$ | $-0.45$ | $-0.49$ | 0.57 | 1.16 |
| $T(\text{PIP3}, \text{Akt}) > 0$ | 0 | 0.62 | $-0.66$ | $-0.43$ | $-0.43$ | 0.41 |
| $T(\text{PKA}, \text{P38}) < 0$ | 0.96 | 2.28 | 2.06 | 2.05 | 0.31 | $-0.40$ |
| $T(\text{PKA}, \text{Raf}) < 0$ | 0.52 | 0.52 | 0.51 | 0.52 | 0.52 | 0.59 |
| $T(\text{Raf}, \text{Erk}) > 0$ | 0 | 0 | 0 | 0 | 0 | $-0.02$ |
| $T(\text{Mek}, \text{Erk}) > 0$ | 0 | 0 | 0 | 0 | 0 | $-0.01$ |
| FDR | 0.53 | 0.38 | 0.60 | 0.38 | 0.44 | 0.50 |
| TPR | 0.35 | 0.50 | 0.40 | 0.50 | 0.45 | 0.55 |
| FPR | 0.23 | 0.17 | 0.34 | 0.17 | 0.20 | 0.31 |
| SHD | 14 | 11 | 18 | 11 | 12 | 15 |
| SID | 47 | 38 | 35 | 38 | 43 | 31 |
| NNZ | 15 | 16 | 20 | 16 | 16 | 22 |
| Time (s) | 21 | 54 | … | … | … | 575 |

The Time (s) entries for IC-3, IC-2, and IC-1 are fused in the source as 7241493675.

IC-n denotes interventional constraints containing n incorrect specifications. Bold values indicate total causal effects aligned with the ground truth or the best performance across metrics

Generalization Analysis

Setting: We further analyzed the generalization of our method by cross-validating the interventional constraints. Based on Table 5, there are $\binom{8}{3} = 56$ possible combinations of training constraint sets. We performed causal discovery for each combination using the corresponding path and interventional constraints. The average total causal effects of variable pairs and the evaluation metrics of the causal models learned without constraints, with path constraints, and with interventional constraints are presented in Table 8.

Table 8. Average total causal effects and evaluation metrics of the learned causal models without constraints, with path constraints, and with interventional constraints ($\epsilon = 0.25, 0.50, 0.75, 1.0$)

| Effect/metrics | NOTEARS | Lin-CD-Path | Lin-CDIC $\epsilon = 0.25$ | Lin-CDIC $\epsilon = 0.50$ | Lin-CDIC $\epsilon = 0.75$ | Lin-CDIC $\epsilon = 1.0$ |
|---|---|---|---|---|---|---|
| $T(\text{Raf}, \text{Mek}) > 0$ | 1.21 | 1.17 | 1.25 | 1.26 | 1.42 | 1.24 |
| $T(\text{PKC}, \text{Jnk}) > 0$ | 0 | -0.13 | 0.18 | 0.27 | 0.41 | 0.51 |
| $T(\text{PKC}, \text{P38}) > 0$ | 0 | 0.19 | 0.35 | 0.46 | 0.40 | 0.65 |
| $T(\text{PIP3}, \text{Akt}) > 0$ | 0 | -0.12 | 0.14 | 0.20 | 0.20 | 0.37 |
| $T(\text{PKA}, \text{P38}) < 0$ | 0.96 | 0.86 | -0.94 | -1.95 | -2.59 | -7.08 |
| $T(\text{PKA}, \text{Raf}) < 0$ | 0.52 | 0.47 | -0.54 | -0.61 | -1.02 | -1.51 |
| $T(\text{Raf}, \text{Erk}) > 0$ | 0 | 0.18 | 0.40 | 0.20 | 0.43 | 0.66 |
| $T(\text{Mek}, \text{Erk}) > 0$ | 0 | 0.13 | 0.29 | 0.24 | 0.53 | 0.49 |
| FDR | 0.53 | 0.45 | 0.59 | 0.61 | 0.64 | 0.62 |
| TPR | 0.35 | 0.43 | 0.41 | 0.42 | 0.40 | 0.43 |
| FPR | 0.23 | 0.20 | 0.35 | 0.39 | 0.40 | 0.41 |
| SHD | 14 | 13.14 | 17.75 | 19.12 | 20.5 | 20.1 |
| SID | 47 | 41.59 | 42.71 | 42.80 | 42.0 | 41.3 |
| NNZ | 15 | 15.59 | 20.3 | 22.16 | 22.1 | 22.9 |
| Time (s) | † | † | 1028 | 833 | 706 | 501 |

Bold values indicate statistical significance (the bold markup of individual cells is not preserved here).
† The NOTEARS and Lin-CD-Path times appear as the fused digits "2152" and cannot be separated unambiguously.
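The enumeration of training constraint sets described in the Setting can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each training set holds three of the eight sign constraints listed in Table 8 and that the remaining five are held out for evaluation. The tuple encoding and the name `constraint_splits` are hypothetical.

```python
from itertools import combinations

# The eight sign constraints on total causal effects, as listed in Table 8.
# Each entry is (cause, effect, sign); this encoding is an assumption.
CONSTRAINTS = [
    ("Raf", "Mek", +1),
    ("PKC", "Jnk", +1),
    ("PKC", "P38", +1),
    ("PIP3", "Akt", +1),
    ("PKA", "P38", -1),
    ("PKA", "Raf", -1),
    ("Raf", "Erk", +1),
    ("Mek", "Erk", +1),
]


def constraint_splits(constraints, k=3):
    """Yield (training, held_out) pairs for every k-subset of the constraints."""
    for training in combinations(constraints, k):
        held_out = [c for c in constraints if c not in training]
        yield list(training), held_out


splits = list(constraint_splits(CONSTRAINTS))
print(len(splits))  # 56, i.e. 8 choose 3
```

Averaging a learned quantity over all 56 splits would then yield one entry per column of a table such as Table 8.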

Analysis: Table 8 shows that, when only three of the eight constraints are used for training, the average total causal effects between variable pairs learned by our Lin-CDIC method remain consistent with previously established findings. In contrast, the results from Lin-CD-Path and NOTEARS align only partially, capturing a limited subset of the known causal interactions.

(1) In terms of the average FDR, FPR, and SHD, the models learned by Lin-CDIC exhibit higher values than those learned by Lin-CD-Path and NOTEARS. This may be due to the higher density of the causal models produced by Lin-CDIC, which contain between 20.3 and 22.9 edges on average, compared with 15 for NOTEARS and 15.59 for Lin-CD-Path. Greater density can lead to more false positives, thereby increasing FDR, FPR, and SHD.

(2) In terms of the average SID, the models learned by Lin-CDIC show a slightly lower value at $\epsilon = 1.0$ and slightly higher values at $\epsilon = 0.25$, $0.50$, and $0.75$ than those learned by Lin-CD-Path. This variation may arise from uncertainty about the correctness of the assumed ground-truth structure shown in Fig. 2. For instance, Kearney et al. (2021) suggest that Akt may indirectly inhibit further PIP3 synthesis through a feedback mechanism, implying a potential feedback loop between PIP3 and Akt, PIP3 $\rightarrow \dots \rightarrow$ Akt $\rightarrow \dots \rightarrow$ PIP3, an interaction not captured in the ground truth. Sachs et al. (2009, p. 10) noted that the T-cell signaling pathway was believed to contain at least two feedback cycles: a longer loop Raf $\rightarrow$ Mek $\rightarrow$ Erk $\rightarrow$ Akt $\rightarrow$ Raf and a shorter loop Raf $\rightarrow$ Mek $\rightarrow$ Erk $\rightarrow$ Raf. Brouillard et al. (2024) revisited the Sachs dataset in a comprehensive review of causal discovery benchmarks and updated the "ground truth" graph to include a prominent feedback loop Raf $\rightarrow$ Mek $\rightarrow$ Erk $\rightarrow$ Raf (see their Fig. 7). However, due to the acyclicity assumption adopted in this paper, we do not use their graph as the benchmark. It is worth noting that Brouillard et al. (2024, p. 34) also advocate evaluating not only structure recovery but also interventional predictions, which reinforces the motivation of our study.

The SID metric quantifies the number of inconsistencies between two causal graphs by comparing their resulting post-intervention distributions $P(Y \mid \text{do}(X))$ under all possible single-variable interventions. Intuitively, it captures the number of mismatches in causal pathways between the graphs. For instance, if the causal model learned by Lin-CDIC includes a causal path that is absent from the benchmark graph (Fig. 2), this counts as one inconsistency in the SID computation, thereby increasing the SID value for Lin-CDIC. Consequently, if the assumed ground-truth structure is uncertain, the SID value becomes equally unreliable. By relaxing the acyclicity assumption, Lin-CDIC may therefore achieve better performance on the Sachs dataset (see Discussion).

(3) In terms of running time, Lin-CDIC exhibits a clear decrease as $\epsilon$ increases. This trend can be attributed to the nature of the updates during optimization: smaller $\epsilon$ values lead to more conservative changes in the causal model, requiring more iterations to satisfy the given constraints. In contrast, larger $\epsilon$ values (e.g., $\epsilon = 0.75$ and $\epsilon = 1.0$) introduce more substantial updates, enabling the model to satisfy the constraints more quickly. However, these larger updates also risk underfitting or missing the optimal solution because the changes are overly aggressive.

Note that, due to the complexity of the optimization process, we did not conduct experiments using all interventional constraints. The primary reason is that when the number of interventional constraints exceeds five, Lin-CDIC often converges to a local optimum. We leave this limitation as an open direction for future research.
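The link between graph density and the structural metrics in point (1) can be illustrated with a small sketch. The definitions below follow a convention common in NOTEARS-style evaluations (reversed edges count as false discoveries, and FPR is normalized by the number of unordered node pairs without a true edge). Whether the paper uses exactly these definitions is an assumption, and the four-node graphs are toy examples, not the Sachs network.

```python
def adjacency(d, edges):
    """Build a d x d binary adjacency matrix; adj[i][j] == 1 means i -> j."""
    adj = [[0] * d for _ in range(d)]
    for i, j in edges:
        adj[i][j] = 1
    return adj


def graph_metrics(true_adj, est_adj):
    """Compare an estimated DAG with a reference DAG (both without self-loops)."""
    d = len(true_adj)
    tp = fp = rev = missing = 0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            if est_adj[i][j]:
                if true_adj[i][j]:
                    tp += 1        # edge recovered with the correct direction
                elif true_adj[j][i]:
                    rev += 1       # edge recovered with the wrong direction
                else:
                    fp += 1        # edge absent from the reference graph
            elif true_adj[i][j] and not est_adj[j][i]:
                missing += 1       # reference edge missed in both directions
    n_true = sum(map(sum, true_adj))
    nnz = sum(map(sum, est_adj))
    n_negative = d * (d - 1) // 2 - n_true
    return {
        "FDR": (rev + fp) / max(nnz, 1),
        "TPR": tp / max(n_true, 1),
        "FPR": (rev + fp) / max(n_negative, 1),
        "SHD": fp + missing + rev,
        "NNZ": nnz,
    }


# Toy reference chain 0 -> 1 -> 2 -> 3 and two hypothetical estimates.
truth = adjacency(4, [(0, 1), (1, 2), (2, 3)])
sparse = graph_metrics(truth, adjacency(4, [(0, 1), (1, 2)]))
dense = graph_metrics(
    truth, adjacency(4, [(0, 1), (1, 2), (2, 3), (0, 2), (0, 3), (1, 3)])
)
print(sparse)
print(dense)
```

In this toy case the denser estimate recovers every reference edge, so its TPR is higher, yet its three extra edges raise FDR, FPR, and SHD relative to the sparser estimate.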

Discussion

We introduce interventional constraints, a novel form of causal knowledge, to enhance the accuracy and explainability of causal discovery. Empirical results show that these constraints not only enforce consistency with known findings but also uncover additional correct interactions and pathways. Future directions include:

(1) Improving scalability. Scalability remains a key challenge due to the high non-convexity and constraint burden. Future work will explore more efficient optimization strategies to support larger causal systems.

(2) Extending interventional constraints to latent-variable structural causal models (SCMs). Recent work on nonlinear causal discovery with latent variables (e.g., Ni et al., 2025) proposes multi-stage optimization procedures that combine structural estimation with variational autoencoders. Such frameworks offer a natural opportunity for integrating interventional constraints. Specifically, our constraints operate through the structural matrix $W$ by imposing inequalities on total causal effects and could therefore be incorporated into the structural optimization stages. This may help orient edges that are otherwise unidentifiable under hidden confounding. Another relevant direction is to integrate interventional constraints with differentiable algebraic equality constraints for ancestral, arid, and bow-free ADMGs (Bhattacharya et al., 2021).

(3) Generalizing to nonlinear models, where the value of a causal effect depends on the intervention values (Pearl, 2001) and may require neural network parameterizations (Xia et al., 2021). In these settings, optimizing path-specific effects calculated through nested functions can be challenging when multiple causal paths exist.

(4) Incorporating interventional constraints into cyclic SCMs (Bongers et al., 2021; Dai et al., 2024; Hyttinen et al., 2012; Mooij & Heskes, 2013; Mooij et al., 2020) to create a more comprehensive framework for causal discovery in dynamic systems, such as biological systems, and to improve the handling of feedback loops and cyclic dependencies in real-world settings. The formulation of total causal effects used in this work is particularly convenient in the acyclic setting, where $W$ corresponds to a DAG and is therefore nilpotent, so the Neumann series of $(I-W)^{-1}$ terminates, with no nonzero powers beyond $W^{d-1}$. In cyclic structural equation models (SEMs), by contrast, $(I-W)^{-1}$ is no longer guaranteed to exist or to be well-conditioned, and the definition of total effects depends on additional stability assumptions (e.g., equilibrium or dynamic interpretations). Extending interventional constraints to stable cyclic SEMs would therefore require incorporating explicit stability conditions or a suitable temporal interpretation of causation. We leave this as an interesting but challenging direction for future work.

(5) Decomposing total effects into direct and indirect components. To assess global satisfaction of interventional constraints, we use the total causal effect, which captures both direct and indirect influences. While this provides a holistic measure, it may mask the contributions of specific causal pathways. Future work could enhance interpretability by explicitly separating direct and indirect effects.

(6) Leveraging large language models (LLMs) to automatically extract high-level causal knowledge, enhancing scalability and explainability. While expert validation remains important (Griot et al., 2025), recent work demonstrates the potential of LLMs in guiding causal discovery (Ban et al., 2023; Liu et al., 2024; Long et al., 2023; Takayama et al., 2024; Vashishtha et al., 2023), making them a promising addition to our framework.
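The role of the structural matrix $W$ in directions (4) and (5) can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's implementation: it assumes the convention that `W[i][j]` is the direct effect of variable $i$ on variable $j$, so that the total effects in a DAG are the finite sum $W + W^2 + \dots + W^{d-1}$. The three-variable chain, its weights, and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_effects(w):
    """Return W + W^2 + ... + W^(d-1), the total effects when W encodes a DAG."""
    d = len(w)
    total = [row[:] for row in w]
    power = [row[:] for row in w]
    for _ in range(d - 2):
        power = matmul(power, w)
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


def is_nilpotent(w):
    """Check W^d == 0, which holds for the weight matrix of any DAG on d nodes."""
    power = [row[:] for row in w]
    for _ in range(len(w) - 1):
        power = matmul(power, w)
    return all(x == 0 for row in power for x in row)


# Toy chain with hypothetical weights: A -> B (0.5), B -> C (0.8), A -> C (-0.1).
A, B, C = 0, 1, 2
W_dag = [[0.0, 0.5, -0.1],
         [0.0, 0.0, 0.8],
         [0.0, 0.0, 0.0]]

T = total_effects(W_dag)
direct = W_dag[A][C]           # weight of the single edge A -> C
indirect = T[A][C] - direct    # contribution of the path A -> B -> C
constraint_ok = T[A][C] > 0    # sign constraint T(A, C) > 0

# A two-node feedback loop is not nilpotent, so the finite sum above
# no longer equals the total effect.
W_cyclic = [[0.0, 0.5],
            [0.5, 0.0]]

print(round(T[A][C], 6), round(direct, 6), round(indirect, 6), constraint_ok)
print(is_nilpotent(W_dag), is_nilpotent(W_cyclic))
```

Here the total effect of A on C is positive (about 0.3) even though the direct effect is negative, which is the kind of masking that a direct/indirect decomposition would expose. For the cyclic matrix the powers of $W$ never vanish; the infinite series converges only when the spectral radius of $W$ is below one, which is one way to express the stability condition mentioned in direction (4).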

Bibliography


1. Andrews, B., Spirtes, P., & Cooper, G. (2020). On the completeness of causal discovery in the presence of latent confounding with tiered background knowledge. In Chiappa, S., Calandra, R. (eds.), Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, vol. 108, pp. 4002–4011. Palermo, Sicily, Italy: PMLR.
2. Bhattacharya, R., Nagarajan, T., Malinsky, D., & Shpitser, I. (2021). Differentiable causal discovery under unmeasured confounding. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, vol. 130, pp. 2314–2322. PMLR, Virtual.
3. Brouillard, P., Lachapelle, S., Lacoste, A., Lacoste-Julien, S., & Drouin, A. (2020). Differentiable causal discovery from interventional data. In Proceedings of the 34th International Conference on Neural Information Processing Systems (pp. 21865–21877). Red Hook, NY: Curran Associates, Inc.
4. Chickering, D.M., & Meek, C. (2002). Finding optimal Bayesian networks. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-2002) (pp. 94–102). San Francisco, CA: Morgan Kaufmann.
5. Choo, D., Gouleakis, T., & Bhattacharyya, A. (2023). Active causal structure learning with advice. In Proceedings of the 40th International Conference on Machine Learning (pp. 5838–5867). Honolulu, Hawaii, United States: PMLR.
6. Dai, H., Ng, I., Zheng, Y., Gao, Z., & Zhang, K. (2024). Local causal discovery with linear non-Gaussian cyclic models. In Banerjee, A., Fukumizu, K. (eds.), Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, vol. 238, pp. 154–162. Valencia, Spain: PMLR.
7. Hasan, U., & Gani, M.O. (2022). KCRL: A prior knowledge based causal discovery framework with reinforcement learning. In Proceedings of the 7th Machine Learning for Healthcare Conference, vol. 193, pp. 691–714. Durham, NC, USA: PMLR.
8. Inazumi, T., Shimizu, S., & Washio, T. (2010). Use of prior knowledge in a non-Gaussian method for learning linear structural equation models. In Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation (pp. 221–228). Berlin, Heidelberg: Springer.