Large-scale empirical validation of Bayesian Network structure learning algorithms with noisy data
Anthony C. Constantinou, Yang Liu, Kiattikun Chobtham, Zhigao Guo, and Neville K. Kitson

TL;DR
This study conducts a large-scale empirical validation of 15 Bayesian Network structure learning algorithms using noisy data to assess their real-world performance and compare different approaches.
Contribution
It introduces a methodology applying synthetic noise to evaluate algorithm robustness, providing the first large-scale validation under varied noisy data conditions.
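As a rough illustration of what applying synthetic noise to a clean discrete dataset can look like, the sketch below blanks out some values and swaps others for a different valid state. The function name `inject_noise`, the two noise types (missing values and incorrect values), and the rate parameters are assumptions made for this example only; the summary above does not specify which noise types or rates the paper uses.

```python
import random


def inject_noise(rows, states, missing_rate=0.05, error_rate=0.05, seed=0):
    """Return a noisy copy of a discrete dataset (hypothetical helper).

    rows:   list of dicts mapping variable name -> observed state
    states: dict mapping variable name -> list of allowed states
    Each cell is independently set to None (missing) with probability
    missing_rate, or replaced by a different valid state with probability
    error_rate. The input rows are not modified.
    """
    rng = random.Random(seed)  # seeded so experiments are repeatable
    noisy = []
    for row in rows:
        new_row = {}
        for var, value in row.items():
            u = rng.random()
            if u < missing_rate:
                new_row[var] = None  # simulate a missing value
            elif u < missing_rate + error_rate:
                # simulate an incorrect value: pick any other valid state
                alternatives = [s for s in states[var] if s != value]
                new_row[var] = rng.choice(alternatives) if alternatives else value
            else:
                new_row[var] = value
        noisy.append(new_row)
    return noisy


# Toy data; the variable names are illustrative only.
states = {"Rain": ["yes", "no"], "WetGrass": ["yes", "no"]}
clean = [{"Rain": "yes", "WetGrass": "yes"},
         {"Rain": "no", "WetGrass": "no"}] * 50
noisy = inject_noise(clean, states, missing_rate=0.1, error_rate=0.1, seed=42)
```

A structure learning algorithm would then be run on both `clean` and `noisy`, and the two learned graphs compared against the known ground-truth graph to estimate how much accuracy is lost to noise.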
Findings
Synthetic performance overestimates real-world accuracy by 10-50%.
Score-based methods generally outperform constraint-based methods.
Higher fitting scores do not guarantee more accurate causal graphs.
Abstract
Numerous Bayesian Network (BN) structure learning algorithms have been proposed in the literature over the past few decades. Each publication makes an empirical or theoretical case for the algorithm proposed in that publication, and results across studies are often inconsistent in their claims about which algorithm is 'best'. This is partly because there is no agreed evaluation approach to determine their effectiveness. Moreover, each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tends to be evaluated with data that conforms to these assumptions, however unrealistic these assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this may happen remains unknown. This paper investigates the performance of 15 structure learning algorithms. We…
