Efficient Differentiable Discovery of Causal Order
Mathieu Chevalley, Arash Mehrjou, Patrick Schwab

TL;DR
This paper introduces a differentiable reformulation of the Intersort algorithm for causal discovery, enabling scalable, end-to-end gradient-based optimization of causal orderings in large datasets.
Contribution
It presents a novel differentiable approach to causal order discovery, overcoming computational limitations of previous score-based methods like Intersort.
Findings
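The differentiable reformulation scales to problems where Intersort's search over the permutahedron is too expensive.
Using the continuous causal-order score as a regularizer improves downstream causal discovery.
The method integrates into end-to-end gradient-based models.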
Abstract
In the algorithm Intersort, Chevalley et al. (2024) proposed a score-based method to discover the causal order of variables in a Directed Acyclic Graph (DAG) model, leveraging interventional data to outperform existing methods. However, as a score-based method over the permutahedron, Intersort is computationally expensive and non-differentiable, limiting its ability to be utilised in problems involving large-scale datasets, such as those in genomics and climate models, or to be integrated into end-to-end gradient-based learning frameworks. We address this limitation by reformulating Intersort using differentiable sorting and ranking techniques. Our approach enables scalable and differentiable optimization of causal orderings, allowing the continuous score function to be incorporated as a regularizer in downstream tasks. Empirical results demonstrate that causal discovery algorithms…
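To make the idea of a continuous score over orderings concrete, here is a minimal sketch in NumPy. It is not the authors' DiffIntersort score. It is a simplified illustration that assumes a hypothetical matrix `D`, where `D[i, j]` measures how strongly intervening on variable `i` shifts variable `j`. A real-valued potential vector `p` stands in for the permutation, and the hard indicator "i precedes j" is relaxed to a sigmoid of `p[j] - p[i]` with an assumed temperature `tau`. The names `soft_order_score`, `soft_order_grad`, and `fit_order` are made up for this example.

```python
import numpy as np

def _sigmoid(x):
    # Numerically stable logistic function via the tanh identity.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def soft_order_score(p, D, tau=0.1):
    """Smooth surrogate score: sum_ij D[i, j] * sigmoid((p[j] - p[i]) / tau).

    D[i, j] is an ASSUMED interventional effect of i on j (hypothetical input).
    The score is large when strong effects point from low to high potential.
    """
    diff = (p[None, :] - p[:, None]) / tau  # diff[i, j] = (p[j] - p[i]) / tau
    return float(np.sum(D * _sigmoid(diff)))

def soft_order_grad(p, D, tau=0.1):
    """Analytic gradient of soft_order_score with respect to p."""
    diff = (p[None, :] - p[:, None]) / tau
    s = _sigmoid(diff)
    w = D * s * (1.0 - s) / tau
    # p[k] appears with sign +1 in column k (as j) and -1 in row k (as i).
    return w.sum(axis=0) - w.sum(axis=1)

def fit_order(D, steps=500, lr=0.05, tau=0.1, seed=0):
    """Plain gradient ascent on the potentials, then sort to read off an order."""
    rng = np.random.default_rng(seed)
    p = 0.01 * rng.standard_normal(D.shape[0])
    for _ in range(steps):
        p += lr * soft_order_grad(p, D, tau)
    return np.argsort(p)

# Toy example: effects consistent with the order 2 -> 0 -> 3 -> 1.
true_order = [2, 0, 3, 1]
D = np.zeros((4, 4))
for a_pos, a in enumerate(true_order):
    for b in true_order[a_pos + 1:]:
        D[a, b] = 1.0  # a precedes b, so intervening on a affects b

print(fit_order(D))
```

Because the surrogate is differentiable in `p`, the same term could in principle be added to another model's loss as a regularizer, which is the kind of use the abstract describes. The pairwise sigmoid is only one of several possible relaxations, and it was chosen here for brevity.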
Peer Reviews
Decision·Submitted to ICLR 2026
1. The experimental evaluation is extensive.
2. The paper proposes a novel solution that significantly improves the scalability of previous methods.
1. The paper is hard to parse and is not self-contained:
   * The notation is unclear (for example, in line 128, C in the upper index is not introduced).
   * The assumptions, definitions, and theorems are informal and difficult to understand (for example, in assumption 2.3, what is a “detectable change”?; definitions 2.1 & 2.2 mix comments with the actual definitions; definition 2.7 is completely unclear to me).
   * Some algorithms and theorems, which constitute a part of the proposed solutions, are no
- The development of scalable approaches is of interest in the field of causal discovery.
- Experiments show that the proposed method is computationally efficient when the number of variables is large.
- The novelty is unclear to me, as the method resembles Annadani et al. (2023).
- The diffIntersort score is not convex, which might result in converging to a local minimum.
- The paper aims to improve computational efficiency with respect to the number of variables d. However, for enhancing the scalability of score-based causal discovery methods, it may be more important, in my view, to evaluate both accuracy and computational efficiency with respect to the dataset size instead. Especially, Int
1. The differentiable operator technique is novel to me and may provide insights into other fields.
2. The experiments are comprehensive.
1. This approach is very similar to DP-DAG (Charpentier et al. 2022) and BayesDAG (Annadani et al. 2023). DP-DAG is also permutation-based, differentiable (Sinkhorn), and supports interventional data. It is the approach's most direct and significant competitor. The lack of experimental comparisons with DP-DAG significantly reduces the persuasiveness of the empirical results, particularly in terms of scalability and accuracy.
2. Although the authors conduct experiments on RFF, GRN, and NN (non-lin
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Logic, Reasoning, and Knowledge
