Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets
Justin Domke

TL;DR
This paper introduces a new approach for maximum likelihood learning in high-treewidth graphical models by constraining parameters to a set of fast-mixing distributions, enabling efficient and theoretically guaranteed gradient approximation.
Contribution
It establishes that constraining parameters to fast-mixing sets allows for provably efficient MCMC-based gradient estimation in maximum likelihood learning.
Findings
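Gradient descent with MCMC-sampled gradients approximates the maximum likelihood solution within the fast-mixing set.
Without regularization, a solution epsilon-accurate in log-likelihood takes effort cubic in 1/epsilon, ignoring logarithmic factors.
With ridge regularization, strong convexity gives a solution epsilon-accurate in parameter distance with effort quadratic in 1/epsilon.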
Abstract
Inference is typically intractable in high-treewidth undirected graphical models, making maximum likelihood learning a challenge. One way to overcome this is to restrict parameters to a tractable set, most typically the set of tree-structured parameters. This paper explores an alternative notion of a tractable set, namely a set of "fast-mixing parameters" where Markov chain Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the stationary distribution. While it is common in practice to approximate the likelihood gradient using samples obtained from MCMC, such procedures lack theoretical guarantees. This paper proves that for any exponential family with bounded sufficient statistics (not just graphical models), when parameters are constrained to a fast-mixing set, gradient descent with gradients approximated by sampling will approximate the maximum likelihood solution…
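The procedure the abstract describes can be illustrated with a minimal sketch. It is not the paper's algorithm or its constants. It assumes a small Ising model over x in {-1,+1}^n with p(x) proportional to exp(sum over i<j of W_ij x_i x_j), uses a Gibbs sampler to estimate the model's expected sufficient statistics, and takes gradient steps on the log-likelihood. The constraint set here (spectral norm of the entrywise absolute value of W at most c < 1) and the rescaling step are stand-in assumptions for a fast-mixing set and its projection; the rescaling is not a true Euclidean projection. All function names and default values are hypothetical.

```python
import numpy as np

def gibbs_sample(W, n_sweeps, rng):
    # Draw one approximate sample from p(x) ∝ exp(sum_{i<j} W_ij x_i x_j),
    # x in {-1,+1}^n, by running n_sweeps full sweeps of single-site Gibbs.
    n = W.shape[0]
    x = rng.choice([-1, 1], size=n)
    for _ in range(n_sweeps):
        for i in range(n):
            field = W[i] @ x - W[i, i] * x[i]      # sum_{j != i} W_ij x_j
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
    return x

def rescale_into_set(W, c):
    # Assumed stand-in for projection onto a fast-mixing set:
    # shrink W so that the spectral norm of |W| is at most c.
    norm = np.linalg.norm(np.abs(W), 2)
    return W if norm <= c else W * (c / norm)

def fit(data, c=0.9, n_iters=30, n_samples=30, n_sweeps=5, lr=0.1, seed=0):
    # Gradient ascent on the log-likelihood with sampled gradients.
    # Gradient w.r.t. W_ij is E_data[x_i x_j] - E_model[x_i x_j].
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    data_mean = data.T @ data / len(data)
    W = np.zeros((n, n))
    for _ in range(n_iters):
        samples = np.array(
            [gibbs_sample(W, n_sweeps, rng) for _ in range(n_samples)]
        )
        model_mean = samples.T @ samples / n_samples   # MCMC estimate
        grad = data_mean - model_mean
        np.fill_diagonal(grad, 0.0)                    # no self-couplings
        W = rescale_into_set(W + lr * grad, c)         # stay inside the set
    return W
```

Calling `fit(data)` on an array of ±1 rows returns a symmetric coupling matrix that stays inside the assumed constraint set at every iteration. Restarting the sampler from scratch on each iteration keeps the sketch simple; it is a design choice of this example, not something the abstract specifies.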
Taxonomy
Topics: Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods · Bayesian Modeling and Causal Inference
