Large-Sample Learning of Bayesian Networks is NP-Hard
David Maxwell Chickering, Christopher Meek, David Heckerman

TL;DR
This paper proves that learning discrete-variable Bayesian networks from data is NP-hard even in the large-sample limit. The result holds when the learner has access to independence, inference, and/or information oracles, and when each node is restricted to at most k parents, for all k > 3.
Contribution
The paper establishes new complexity results for algorithms that learn discrete-variable Bayesian networks, showing that identifying high-scoring structures remains NP-hard in the large-sample setting under several oracle assumptions.
Findings
High-scoring structure identification is NP-hard.
The results hold even when the learner is given an independence oracle, an inference oracle, and/or an information oracle.
NP-hardness also applies when each node is restricted to at most k parents, for all k > 3.
Abstract
In this paper, we provide new complexity results for algorithms that learn discrete-variable Bayesian networks from data. Our results apply whenever the learning algorithm uses a scoring criterion that favors the simplest model able to represent the generative distribution exactly. Our results therefore hold whenever the learning algorithm uses a consistent scoring criterion and is applied to a sufficiently large dataset. We show that identifying high-scoring structures is hard, even when we are given an independence oracle, an inference oracle, and/or an information oracle. Our negative results also apply to the learning of discrete-variable Bayesian networks in which each node has at most k parents, for all k > 3.
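As an illustration of a scoring criterion that favors the simplest model able to represent the generative distribution, the sketch below computes the Bayesian information criterion (BIC) for a candidate structure over discrete data. BIC is used here only as a familiar example of a consistent score; the abstract does not name a specific criterion. The function name `bic_score`, the data layout, and the toy datasets are assumptions made for this example.

```python
import math
from collections import Counter

def bic_score(data, parents, arity):
    """BIC of a candidate DAG on discrete data (illustrative sketch).

    data:    list of dicts mapping variable name -> observed value
    parents: dict mapping each variable -> tuple of its parent variables
    arity:   dict mapping each variable -> number of states
    """
    n = len(data)
    log_lik = 0.0
    num_params = 0
    for var, pa in parents.items():
        # Counts of (parent configuration, child value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximized log-likelihood of this variable's conditional table.
        for (pa_val, _), count in joint.items():
            log_lik += count * math.log(count / marg[pa_val])
        # Free parameters: (states - 1) per parent configuration.
        q = 1
        for p in pa:
            q *= arity[p]
        num_params += (arity[var] - 1) * q
    # Penalize model dimension by (d / 2) * log(n).
    return log_lik - 0.5 * num_params * math.log(n)

arity = {"X": 2, "Y": 2}
empty = {"X": (), "Y": ()}        # no edge
edge = {"X": (), "Y": ("X",)}     # X -> Y

# Toy data where Y copies X: the edge is needed to represent the distribution.
dependent = [{"X": v, "Y": v} for v in (0, 1) for _ in range(50)]
# Toy data where X and Y are independent: the empty graph already suffices.
independent = [{"X": x, "Y": y} for x in (0, 1) for y in (0, 1) for _ in range(25)]

dep_prefers_edge = bic_score(dependent, edge, arity) > bic_score(dependent, empty, arity)
ind_prefers_empty = bic_score(independent, empty, arity) > bic_score(independent, edge, arity)
```

On these toy datasets the score prefers the edge only when it is needed to fit the data, and otherwise prefers the structure with fewer parameters. Scoring one structure is cheap; the hardness result concerns the search for a high-scoring structure among all candidates.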
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management · Machine Learning and Algorithms
