Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle
Marco Scutari

TL;DR
This paper critically examines Bayesian Dirichlet scores for learning Bayesian networks, linking them to the maximum relative entropy principle, and recommends the BDs score over BDeu for sparse data due to theoretical and empirical advantages.
Contribution
The paper shows that BDeu violates the maximum relative entropy principle and that its Bayes factors are sensitive to the choice of hyperparameter, and it proposes BDs as a better alternative for structure learning from sparse data.
Findings
BDeu violates the maximum relative entropy principle.
BDeu's Bayes factors are hyperparameter-sensitive.
BDs outperforms BDeu in sparse data scenarios.
Abstract
A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient, does not require the elicitation of prior knowledge from experts, and satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in…
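To make the role of the uniform prior concrete, the following is a minimal sketch of how a local BD score could be computed for a single node from its table of counts. It is not taken from the paper. The function names (`bd_local_log_score`, `bdeu_local_log_score`, `bds_local_log_score`), the count layout (one row per parent configuration, one column per node state) and the imaginary sample size `alpha` are illustrative assumptions. The BDs variant further assumes that BDs spreads `alpha` only over the parent configurations actually observed in the data, which the text above does not state.

```python
import math


def bd_local_log_score(counts, cell_alpha):
    """Log BD score of one node given per-cell Dirichlet hyperparameters.

    counts[j][k] is the number of samples with parent configuration j and
    node state k. cell_alpha[j] is the hyperparameter shared by every cell
    in row j; rows with cell_alpha[j] == 0 are skipped.
    """
    total = 0.0
    for row, a in zip(counts, cell_alpha):
        if a == 0:
            continue  # no prior mass assigned to this parent configuration
        row_alpha = a * len(row)
        total += math.lgamma(row_alpha) - math.lgamma(row_alpha + sum(row))
        for n in row:
            total += math.lgamma(a + n) - math.lgamma(a)
    return total


def bdeu_local_log_score(counts, alpha=1.0):
    """BDeu: alpha is spread uniformly over all r * q cells."""
    q, r = len(counts), len(counts[0])
    return bd_local_log_score(counts, [alpha / (r * q)] * q)


def bds_local_log_score(counts, alpha=1.0):
    """BDs (assumed form): alpha is spread only over observed rows."""
    r = len(counts[0])
    observed = [sum(row) > 0 for row in counts]
    q_observed = sum(observed)
    cell_alpha = [alpha / (r * q_observed) if seen else 0.0 for seen in observed]
    return bd_local_log_score(counts, cell_alpha)


# Hypothetical sparse example: the second parent configuration is never observed.
sparse_counts = [[2, 1], [0, 0]]
print(bdeu_local_log_score(sparse_counts), bds_local_log_score(sparse_counts))
```

In this sketch the two scores coincide whenever every parent configuration appears in the data and differ only when some configurations are unobserved, which is the sparse-data situation the summary above refers to.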
