Optimal Kernel Choice for Score Function-based Causal Discovery
Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

TL;DR
This paper introduces an automatic kernel selection method for score-based causal discovery, improving accuracy by optimizing kernel choice through marginal likelihood maximization, and demonstrating superior performance over heuristic methods.
Contribution
It proposes a novel automatic kernel selection approach within the score function framework for causal discovery, replacing manual heuristic tuning.
Findings
Outperforms heuristic kernel selection methods in experiments
Effective on both synthetic and real-world data
Enhances causal discovery accuracy
Abstract
Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method involves manual heuristic selection of kernel parameters, making the process tedious and less likely to ensure optimality. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data. Specifically, we model the generative…
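As a rough illustration of the contrast drawn above between heuristic and likelihood-based kernel selection, the following sketch picks a Gaussian kernel bandwidth by maximizing a Gaussian-process regression log marginal likelihood over a grid, and computes the median-distance heuristic for comparison. This is not the paper's score function: the GP regression setup, the fixed noise variance, the candidate grid, the toy data, and all function names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(x, width):
    # Gaussian kernel matrix for 1-D inputs x with bandwidth `width`.
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def log_marginal_likelihood(x, y, width, noise_var=0.1):
    # Log marginal likelihood of y under y ~ N(0, K + noise_var * I).
    # noise_var is held fixed here purely to keep the sketch short (assumption).
    n = len(x)
    K = rbf_kernel(x, width) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def median_heuristic(x):
    # Heuristic bandwidth: median of the pairwise distances between samples.
    d = np.abs(x[:, None] - x[None, :])
    return np.median(d[np.triu_indices(len(x), k=1)])

def select_width(x, y, candidates):
    # Grid search: return the candidate bandwidth with the highest marginal likelihood.
    scores = [log_marginal_likelihood(x, y, w) for w in candidates]
    return candidates[int(np.argmax(scores))], scores

# Toy data (hypothetical): a nonlinear parent-to-child relation with additive noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = np.sin(2 * x) + 0.3 * rng.normal(size=100)

candidates = np.logspace(-2, 1, 20)
best_width, scores = select_width(x, y, candidates)
heuristic_width = median_heuristic(x)
print("marginal-likelihood width:", best_width)
print("median-heuristic width:", heuristic_width)
```

A grid search is used only for readability; a gradient-based optimizer over the bandwidth (and the noise variance) would be the more usual choice in practice.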
Peer Reviews
Decision · ICML 2024 Poster
The hyper-parameter selection for the kernel-based causal discovery is obviously important, and the marginal likelihood based approach provides a reasonable criterion.
Differences from Huang et al. (2018) are somewhat weak. Some contents are similar.
This paper highlights the fundamental problem of kernel choices in the field of kernel regression as a part of score-based causal inference.
1. This paper is highly related to Huang et al. 2018, while the comparison is not sufficient, which dilutes the marginal contribution of this paper. Some of the comparisons are even misleading to some extent. See the Question part.
2. This paper states the criticality of choosing the right kernel. However, it keeps the original kernel choice as Huang et al. 2018 ("More specifically, we utilize the widely-used Gaussian kernel throughout the paper" on page 5). The lack of in-depth discussions on
- The proposed method is convincingly motivated from an information-theoretic point of view where the goal is to minimize the mutual information (MI) between parents of a variable and a noise term.
- The proposed method outperforms other score-based causal discovery methods on synthetic and two real-world data sets.
- The paper is easy to follow.
- The main contribution of the paper seems to be the introduction of a trainable kernel hyper-parameter (instead of previous work that estimates it heuristically from data) and an extended objective that minimizes the MI between parents of a node and noise variables. This seems too incremental as a conference contribution and in my (the reviewer's) opinion is of insufficient originality.
- The paper is extremely close to Huang et al. (2018) [1], both in writing and content as well as methodologi
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Rough Sets and Fuzzy Logic · Data Quality and Management
