Learning the Dimensionality of Hidden Variables
Gal Elidan, Nir Friedman

TL;DR
This paper presents a score-based agglomerative clustering method to determine the number of states of hidden variables in Bayesian networks, improving model structure and generalization.
Contribution
The paper introduces a score-based agglomerative state-clustering approach for estimating the cardinality (number of states) of hidden variables in Bayesian networks, and extends it to multiple interacting hidden variables.
Findings
Models with a range of cardinalities for the hidden variable can be evaluated efficiently.
Models with learned hidden variables generalize better.
Learned models have improved structure compared with previous methods.
Abstract
A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that utilizes a score-based agglomerative state-clustering. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize…
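The score-based agglomerative state-clustering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not stated in the text above: a naive-Bayes structure in which a single hidden variable H is the parent of all observed variables, hard assignment of each row to one state of H, one initial state per distinct observed configuration, and a BIC score standing in for whatever score the paper uses. The names `bic_score` and `learn_cardinality` are hypothetical. For simplicity the sketch recomputes the full score for every candidate merge, so it does not show the efficient evaluation the abstract refers to.

```python
import math
from collections import Counter
from itertools import combinations


def bic_score(clusters, arities, n_rows):
    """BIC of hard-assigned data under an assumed naive-Bayes model H -> X_1..X_n.

    Each cluster holds the rows currently assigned to one state of H.
    """
    log_lik = 0.0
    for rows in clusters:
        n = len(rows)
        log_lik += n * math.log(n / n_rows)  # contribution of P(H = h)
        for j in range(len(arities)):
            for count in Counter(r[j] for r in rows).values():
                log_lik += count * math.log(count / n)  # P(X_j | H = h)
    k = len(clusters)
    n_params = (k - 1) + k * sum(a - 1 for a in arities)
    return log_lik - 0.5 * n_params * math.log(n_rows)


def learn_cardinality(data, arities):
    """Greedily merge states of H and return the best-scoring (cardinality, score)."""
    if not data:
        raise ValueError("data must contain at least one row")
    # Assumption: start with one state of H per distinct observed configuration.
    groups = {}
    for row in data:
        groups.setdefault(tuple(row), []).append(tuple(row))
    clusters = list(groups.values())
    n_rows = len(data)
    best_score, best_k = bic_score(clusters, arities, n_rows), len(clusters)
    while len(clusters) > 1:
        # Score every pairwise merge and keep the one with the highest score.
        candidates = []
        for a, b in combinations(range(len(clusters)), 2):
            rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
            merged = rest + [clusters[a] + clusters[b]]
            candidates.append((bic_score(merged, arities, n_rows), merged))
        score, clusters = max(candidates, key=lambda t: t[0])
        # Remember the best cardinality seen anywhere along the merge path.
        if score > best_score:
            best_score, best_k = score, len(clusters)
    return best_k, best_score


toy = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50
k, score = learn_cardinality(toy, arities=[2, 2, 2])
print(k)
```

On this toy data set, where three binary variables always agree, the sketch selects two states for H. Merging continues all the way down to a single state so that every cardinality along the greedy path is scored, and the best one is reported at the end.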
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management · Data Stream Mining Techniques
