CoLiDE: Concomitant Linear DAG Estimation

Seyed Saman Saboksayr; Gonzalo Mateos; Mariano Tepper

arXiv:2310.02895 · cs.LG · March 14, 2024 · 1 citation

TL;DR

CoLiDE introduces a convex score function for linear DAG learning that jointly estimates noise scales, improving robustness and efficiency over existing methods, especially in heteroscedastic noise scenarios.

Contribution

This work proposes a novel convex scoring method for linear DAG estimation that incorporates concomitant noise scale estimation, reducing parameter tuning and improving robustness.
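The contribution can be made concrete with a minimal sketch. It assumes a scaled-lasso-style score with one noise scale shared by all nodes (the equal-variance case), $S(W,\sigma) = \frac{1}{2n\sigma}\|X - XW\|_F^2 + \frac{d\sigma}{2} + \lambda\|W\|_1$. Here $X$ holds one sample per row and $W_{ij}$ is the weight of edge $i \to j$. This exact form, the function name concomitant_score_ev, and the floor sigma_min are assumptions for illustration and are not the authors' code. The acyclicity constraint is omitted.

```python
import numpy as np


def concomitant_score_ev(X, W, lam, sigma_min=1e-3):
    """Concomitant least-squares score with one shared noise scale (sketch).

    Assumed form: ||X - X W||_F^2 / (2 n sigma) + d * sigma / 2 + lam * ||W||_1,
    with sigma minimized in closed form. Acyclicity is not enforced here.
    Returns (score, sigma).
    """
    n, d = X.shape
    residual = X - X @ W                 # row k: sample k minus its linear prediction
    rss = float(np.sum(residual ** 2))   # residual sum of squares over all nodes
    # dS/dsigma = 0  =>  sigma^2 = rss / (n d); the floor keeps 1/sigma finite.
    sigma = max(np.sqrt(rss / (n * d)), sigma_min)
    score = rss / (2 * n * sigma) + d * sigma / 2 + lam * np.abs(W).sum()
    return score, sigma


# Usage: simulate X = X W_true + noise with noise standard deviation 2.
rng = np.random.default_rng(0)
n, d = 5000, 3
W_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, -0.8],
                   [0.0, 0.0, 0.0]])
noise = 2.0 * rng.standard_normal((n, d))
X = noise @ np.linalg.inv(np.eye(d) - W_true)   # solves X (I - W_true) = noise
score, sigma = concomitant_score_ev(X, W_true, lam=0.1)
```

At the closed-form $\sigma$, the two fit terms sum to $d\sigma$. The gradient of the fit term in $W$ is $-X^\top(X - XW)/(n\sigma)$. Ignoring acyclicity, stationarity therefore reads $X^\top(X - XW)/n \in \lambda\sigma\,\partial\|W\|_1$. The effective lasso penalty is $\lambda\sigma$, so it follows the estimated noise level and $\lambda$ does not need to be retuned.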

Findings

1. Outperforms state-of-the-art methods in larger DAGs with heterogeneous noise.
2. Exhibits reduced standard deviations in domain-specific metrics, indicating increased stability.
3. Efficient gradient computation and closed-form noise variance estimation enhance scalability.
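The third finding can be illustrated for the heteroscedastic case with a minimal sketch. It assumes the score is summed node by node with one scale per node, $\sum_j \big[\frac{1}{2n\sigma_j}\|x_j - Xw_j\|_2^2 + \frac{\sigma_j}{2}\big] + \lambda\|W\|_1$, where $x_j$ and $w_j$ are the $j$-th columns of $X$ and $W$. Under that assumption, each $\sigma_j$ has the closed form $\|x_j - Xw_j\|_2/\sqrt{n}$. For fixed scales, the gradient of the smooth part is $-X^\top(X - XW)\,\mathrm{diag}(\sigma)^{-1}/n$. The function names and the floor sigma_min are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def node_noise_scales(X, W, sigma_min=1e-3):
    """Closed-form per-node noise scales sigma_j = ||x_j - X w_j|| / sqrt(n) (sketch)."""
    n = X.shape[0]
    residual = X - X @ W                                   # column j: residual of node j
    sigma = np.linalg.norm(residual, axis=0) / np.sqrt(n)
    return np.maximum(sigma, sigma_min)                    # floor keeps 1/sigma_j finite


def smooth_part_grad(X, W, sigma):
    """Gradient in W of sum_j ||x_j - X w_j||^2 / (2 n sigma_j) for fixed sigma (sketch)."""
    n = X.shape[0]
    residual = X - X @ W
    # X.T @ residual is d x d; dividing by sigma (shape (d,)) scales column j by 1/sigma_j.
    return -(X.T @ residual) / (n * sigma)


# Usage: three nodes with noise standard deviations 1, 2 and 3.
rng = np.random.default_rng(1)
n, d = 4000, 3
W_true = np.array([[0.0, 0.7, 0.0],
                   [0.0, 0.0, 0.5],
                   [0.0, 0.0, 0.0]])
noise = rng.standard_normal((n, d)) * np.array([1.0, 2.0, 3.0])
X = noise @ np.linalg.inv(np.eye(d) - W_true)
sigma_hat = node_noise_scales(X, W_true)   # close to [1, 2, 3]
G = smooth_part_grad(X, W_true, sigma_hat)
```

One plausible way to organize the optimization is to alternate between the closed-form scales and gradient steps on $W$, adding the acyclicity and $\ell_1$ terms omitted here. This page does not spell out the authors' actual schedule.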

Abstract

We deal with the combinatorial problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model (SEM). Leveraging advances in differentiable, nonconvex characterizations of acyclicity, recent efforts have advocated a continuous constrained optimization paradigm to efficiently explore the space of DAGs. Most existing methods employ lasso-type score functions to guide this search, which (i) require expensive penalty parameter retuning when the unknown SEM noise variances change across problem instances; and (ii) implicitly rely on limiting homoscedasticity assumptions. In this work, we propose a new convex score function for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels. Regularization…
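The abstract relies on differentiable, nonconvex characterizations of acyclicity. One example is the log-determinant function used by DAGMA, $h_s(W) = -\log\det(sI - W \circ W) + d\log s$. Inside its domain, where $sI - W \circ W$ is an M-matrix, it is zero for a DAG and positive for graphs with cycles. Its gradient is $\nabla h_s(W) = 2\,(sI - W \circ W)^{-\top} \circ W$. The minimal sketch below implements it. This page does not state whether CoLiDE uses this exact function, and the name logdet_acyclicity is hypothetical.

```python
import numpy as np


def logdet_acyclicity(W, s=1.0):
    """Log-det acyclicity value and gradient (DAGMA-style sketch).

    Returns (h, grad) with h = -log det(s I - W*W) + d log s and
    grad = 2 (s I - W*W)^{-T} * W, where * is the elementwise product.
    """
    d = W.shape[0]
    M = s * np.eye(d) - W * W
    sign, logabsdet = np.linalg.slogdet(M)
    if sign <= 0:
        # A positive determinant is necessary (not sufficient) for M to be an M-matrix.
        raise ValueError("W is outside the domain of the log-det function")
    h = -logabsdet + d * np.log(s)
    grad = 2.0 * np.linalg.inv(M).T * W
    return h, grad


# Usage: a 3-node chain (a DAG) versus a 2-node cycle.
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])
W_cyc = np.array([[0.0, 0.5],
                  [0.5, 0.0]])
h_dag, _ = logdet_acyclicity(W_dag)      # 0 for a DAG: s I - W*W is unit upper triangular
h_cyc, g_cyc = logdet_acyclicity(W_cyc)  # -log(1 - 0.25**2) > 0
```

A DAG lies in the domain for every $s > 0$ because $W \circ W$ is then nilpotent. This is why W_dag is accepted even though some of its weights exceed 1.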

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 6: marginally above the acceptance threshold · Confidence 4

Strengths

* The paper is clearly written and the contributions are easy to digest.
* The proposed score leads to improvements in structure recovery with respect to state-of-the-art (SOTA) methods.

Weaknesses

* Significance: The paper considers only linear models, which limits the significance of the proposed loss function.
* Novelty: The authors borrow ideas from the concomitant lasso and apply them straightforwardly to the score function for DAG learning. While it is totally okay to borrow ideas from prior work, this feels like the only technical contribution of the paper. The optimization part feels identical to prior work except for the extra noise terms.

Reviewer 02 · Rating 8: accept, good paper · Confidence 5

Strengths

1. The paper tackles an important problem of interest to the general ICLR community.
2. The proposed regularization is general enough that it can be plugged into many of the continuous optimization problems recently proposed for learning DAGs. The work's impact is hence potentially high, as it could improve the performance of many state-of-the-art methods.
3. The paper is generally well presented. Its claims are well supported by an extensive empirical analysis that illustrates the DAG recovery capa…

Weaknesses

1. Although the adjacency matrix $W$ can be efficiently updated with stochastic gradient steps, the closed form for the noise scale is evaluated on the full data because it is not decomposable. This makes the method scale poorly to big data. This limitation should be highlighted in the text, or an efficient approximation could be discussed and empirically evaluated.
2. It is not clear how Problem 2 is obtained, i.e., under which assumptions the noise-dependent terms appear in the objective. It w…
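The scalability point can be made concrete under an assumed shared-scale closed form, $\sigma = \|X - XW\|_F/\sqrt{nd}$. The residual sum of squares is a sum over samples, so it can be streamed in chunks. The square root, however, can only be applied after a full pass. Averaging per-minibatch estimates instead is biased low by Jensen's inequality. This is a minimal sketch: both function names are hypothetical, and the minibatch variant is shown only as the naive approximation, not as something the paper proposes.

```python
import numpy as np


def shared_sigma_exact(X, W, chunk_size=1024, sigma_min=1e-3):
    """Full-data shared noise scale, accumulated chunk by chunk (sketch).

    The residual sum of squares is a sum over samples, so it can be streamed;
    the square root is applied only once, after the full pass.
    """
    n, d = X.shape
    rss = 0.0
    for start in range(0, n, chunk_size):
        Xb = X[start:start + chunk_size]          # one chunk of rows
        rss += float(np.sum((Xb - Xb @ W) ** 2))  # partial residual sum of squares
    return max(np.sqrt(rss / (n * d)), sigma_min)


def shared_sigma_minibatch_average(X, W, batch_size):
    """Naive alternative: average of per-minibatch sigma estimates (hypothetical).

    By concavity of the square root (Jensen's inequality), with equal-size
    batches this never exceeds the full-data value, i.e. it is biased downward.
    """
    sigmas = []
    for start in range(0, X.shape[0], batch_size):
        Xb = X[start:start + batch_size]
        sigmas.append(np.sqrt(np.mean((Xb - Xb @ W) ** 2)))
    return float(np.mean(sigmas))


# Usage: the exact value does not depend on the chunk size; the naive average is lower.
rng = np.random.default_rng(2)
n, d = 2000, 4
W = np.triu(rng.normal(size=(d, d)), k=1)   # random strictly upper-triangular DAG
X = rng.normal(size=(n, d))
sigma_full = shared_sigma_exact(X, W, chunk_size=300)
sigma_naive = shared_sigma_minibatch_average(X, W, batch_size=100)   # 20 equal batches
```

Chunking bounds memory but still costs one full pass over the data for every scale update, which is the cost the reviewer highlights.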

Reviewer 03 · Rating 3: reject, not good enough · Confidence 3

Strengths

The problem considered is highly relevant because it is important to relax the assumption of equal error variances to handle heteroscedastic noise.

Weaknesses

The formulation (5) in the heteroscedastic setting lacks an identifiability guarantee. It is unclear for which specific settings it is theoretically correct. For the linear Gaussian setting, one should use a Gaussian likelihood, e.g., as in GOLEM, while for the linear non-Gaussian setting, one should use a non-Gaussian likelihood, e.g., as in NOTEARS-ICA. There are some possible issues with the experiments, elaborated in the next section.
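For comparison with the Gaussian likelihood scores mentioned above, here is a minimal sketch of profiled Gaussian negative log-likelihoods in the equal-variance (EV) and non-equal-variance (NV) cases. They are written from the GOLEM formulation as we understand it, with additive constants and the sparsity and acyclicity penalties dropped. The function names are hypothetical, and the formulas should be checked against the GOLEM paper before relying on them.

```python
import numpy as np


def gaussian_nll_ev(X, W):
    """Profiled Gaussian NLL, equal noise variances (GOLEM-EV-style sketch, constants dropped)."""
    n, d = X.shape
    rss = np.sum((X - X @ W) ** 2)
    _, logabsdet = np.linalg.slogdet(np.eye(d) - W)   # log|det(I - W)|; sign unused
    return 0.5 * d * np.log(rss) - logabsdet


def gaussian_nll_nv(X, W):
    """Profiled Gaussian NLL, one noise variance per node (GOLEM-NV-style sketch)."""
    n, d = X.shape
    rss_per_node = np.sum((X - X @ W) ** 2, axis=0)
    _, logabsdet = np.linalg.slogdet(np.eye(d) - W)
    return 0.5 * np.sum(np.log(rss_per_node)) - logabsdet


# Usage: for a DAG, log|det(I - W)| = 0, so only the log-RSS terms remain.
rng = np.random.default_rng(3)
n, d = 500, 3
W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.6],
              [0.0, 0.0, 0.0]])
X = rng.normal(size=(n, d)) @ np.linalg.inv(np.eye(d) - W)
ev, nv = gaussian_nll_ev(X, W), gaussian_nll_nv(X, W)
```

Rescaling $X$ by $c$ shifts both scores by $d\log c$, so adding a fixed penalty in $W$ leaves the minimizer unchanged under rescaling. Profiling the variance out of a Gaussian likelihood produces the logarithm of the residual sum of squares. A score of the form $\frac{1}{2n\sigma}\|X - XW\|_F^2 + \frac{d\sigma}{2}$ instead profiles to $\sqrt{d\,\|X - XW\|_F^2/n}$.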

Code & Models

Repositories

samiatto/colide (Official)

Taxonomy

Topics: Advanced Graph Neural Networks