Learning Bayesian Networks from Big Data with Greedy Search: Computational Complexity and Efficient Implementation
Marco Scutari, Claudia Vitolo, Allan Tucker

TL;DR
This paper revisits the computational complexity of learning Bayesian network structure from big data (many observations, not necessarily many variables) and makes greedy search more efficient by using closed-form estimators and predictive scores; the improvements are validated on real-world data sets.
Contribution
It provides new complexity estimates for greedy search in Bayesian network learning and introduces practical improvements for big data applications.
Findings
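Measuring complexity by the number of estimated local distributions gives unrealistic time estimates for greedy search; the revised estimates are more realistic under common assumptions.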
Using closed-form estimators speeds up learning.
Predictive scores improve both speed and accuracy.
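To illustrate why closed-form estimators can be cheaper, here is a minimal sketch for one simple case: a Gaussian node with a single parent, where the least-squares fit reduces to a ratio of sums and needs no general-purpose matrix decomposition. The function names `fit_one_parent` and `gaussian_loglik` are hypothetical and chosen for this example; they are not taken from the paper or from any library, and the paper's own estimators may differ in detail.

```python
import math

def fit_one_parent(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x + noise.

    Returns (intercept, slope, variance), where variance is the
    maximum-likelihood estimate RSS / n. Illustrative helper only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss / n

def gaussian_loglik(n, variance):
    """Gaussian log-likelihood evaluated at the ML estimates."""
    return -0.5 * n * (math.log(2 * math.pi * variance) + 1)

# Toy data, made up for this example.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 8.0]
intercept, slope, variance = fit_one_parent(x, y)
loglik = gaussian_loglik(len(x), variance)
```

Every quantity here comes from a single pass over sums of the data, so the cost of scoring this local distribution grows linearly with the number of observations.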
Abstract
Learning the structure of Bayesian networks from data is known to be a computationally challenging, NP-hard problem. The literature has long investigated how to perform structure learning from data containing large numbers of variables, following a general interest in high-dimensional applications ("small n, large p") in systems biology and genetics. More recently, data sets with large numbers of observations (the so-called "big data") have become increasingly common; and these data sets are not necessarily high-dimensional, sometimes having only a few tens of variables depending on the application. We revisit the computational complexity of Bayesian network structure learning in this setting, showing that the common choice of measuring it with the number of estimated local distributions leads to unrealistic time complexity estimates for the most common class of score-based…
