Bayesian Structure Learning in Graphical Models using Shrinkage priors
Sayantan Banerjee

TL;DR
This paper introduces a Bayesian method for high-dimensional graphical model structure learning using a novel shrinkage prior, with theoretical guarantees and a Gibbs sampling scheme.
Contribution
It proposes the DL-graphical prior for precision matrix estimation and provides posterior convergence guarantees with a Gibbs sampling algorithm.
Findings
Effective structure learning in high-dimensional settings
Theoretical posterior convergence guarantees
Gibbs sampling scheme for practical implementation
Abstract
We consider the problem of learning the structure of a high dimensional precision matrix under sparsity assumptions. We propose to use a shrinkage prior, called the DL-graphical prior based on the Dirichlet-Laplace prior used for the Gaussian mean problem. A posterior sampling scheme based on Gibbs sampling is also provided along with theoretical guarantees of the method by obtaining the posterior convergence rate of the precision matrix.
Bayesian structure learning in graphical models using shrinkage priors (This is an extended abstract version of the ongoing work.)
Sayantan Banerjee
Indian Institute of Management Indore
1 Introduction
We consider the problem of learning the structure of an undirected graphical model corresponding to a $p$-dimensional Gaussian random variable based on an iid sample of size $n$, where $p$ can be much larger than $n$. A Gaussian graphical model captures the conditional independence structure of the underlying random variable: the absence of an edge signifies that the corresponding components of the random variable are conditionally independent given the rest. Thus the sparsity structure of the graphical model is exactly the sparsity structure of the precision matrix (inverse covariance matrix) of the random variable.
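The link between missing edges and zeros of the precision matrix can be illustrated with a small sketch. The 3-by-3 matrix below is a hypothetical example, not taken from the paper, and the helper names `invert` and `partial_correlation` are illustrative. The precision matrix has a zero in position (0, 2), so components 0 and 2 are conditionally independent given component 1, even though their marginal covariance is nonzero.

```python
from fractions import Fraction
import math


def invert(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination on Fractions."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlation(omega, i, j):
    """Partial correlation of components i and j given the rest."""
    return -float(omega[i][j]) / math.sqrt(float(omega[i][i] * omega[j][j]))


# Hypothetical sparse precision matrix: no edge between nodes 0 and 2.
omega = [[2, 1, 0],
         [1, 2, 1],
         [0, 1, 2]]
sigma = invert(omega)

print("covariance entry (0, 2):", sigma[0][2])
print("partial correlation (0, 2):", partial_correlation(omega, 0, 2))
print("partial correlation (0, 1):", partial_correlation(omega, 0, 1))
```

Here the covariance entry (0, 2) equals 1/4, so the covariance matrix is dense while the precision matrix is sparse; this is why the graph is read off the precision matrix rather than the covariance matrix.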
Standard statistical procedures such as the maximum likelihood estimator perform poorly, or do not even exist, when the dimension is large. Regularized or penalty-based estimators have been proposed to tackle the high-dimensional situation under assumptions of sparsity. Bayesian techniques in this direction include putting sparse or spike-and-slab priors on individual elements of the precision matrix.
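A minimal sketch of why the maximum likelihood estimator breaks down: with fewer observations than dimensions, the sample covariance matrix is singular and cannot be inverted to estimate the precision matrix. The data matrix `X` below is a hypothetical toy example with 3 observations in 4 dimensions, and the sketch assumes a mean-zero model so that the uncentered matrix `X^T X` plays the role of the sample covariance (up to scaling).

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix, computed exactly by row reduction on Fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    current = 0
    for col in range(len(rows[0])):
        pivot = next(
            (r for r in range(current, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[current], rows[pivot] = rows[pivot], rows[current]
        for r in range(current + 1, len(rows)):
            factor = rows[r][col] / rows[current][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[current])]
        current += 1
    return current


def gram(x):
    """Return X^T X for a data matrix with observations in rows."""
    n, p = len(x), len(x[0])
    return [
        [sum(x[k][i] * x[k][j] for k in range(n)) for j in range(p)]
        for i in range(p)
    ]


# Hypothetical data: n = 3 observations of a p = 4 dimensional vector.
X = [[1, 2, 0, -1],
     [0, 1, 3, 2],
     [2, -1, 1, 0]]
S = gram(X)

print("dimension p:", len(S))
print("rank of X^T X:", rank(S))
```

The rank of `S` can never exceed the number of observations, so for `p > n` the matrix has no inverse, which motivates regularization or shrinkage priors.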
In this work, we focus on learning the structure of a Gaussian graphical model through estimation of the precision matrix using continuous shrinkage priors. In the next section, we present the model assumptions and specify the prior distributions. We then derive the posterior distributions of the parameters and give a sampling scheme for them. Finally, we establish theoretical guarantees for our method by deriving the posterior convergence rate of the precision matrix.
2 Model assumptions and prior distribution
Consider multivariate Gaussian data with covariance matrix $\Sigma$, where $\Sigma$ is a $p \times p$ positive definite matrix. Let $\Omega = \Sigma^{-1}$ denote the corresponding inverse covariance matrix, or precision matrix. Here we consider a high-dimensional situation in which $p$ can be much larger than $n$. Suppose the true precision matrix is sparse, that is, it belongs to the following class of positive definite matrices:
[TABLE]
being the cone of positive definite matrices of dimension .
We propose the following prior distribution on the elements of the precision matrix $\Omega$.
[TABLE]
The above prior distribution is motivated by the Dirichlet-Laplace shrinkage priors introduced by Bhattacharya et al. (2015) for the sparse Gaussian mean problem. It is a global-local shrinkage prior: a single global scale parameter induces shrinkage across all elements, while local scale parameters allow the amount of shrinkage to deviate for individual elements.
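To make the global-local structure concrete, the following sketch draws from the original Dirichlet-Laplace prior of Bhattacharya et al. (2015) for a vector of parameters: local weights from a symmetric Dirichlet, a global scale from a gamma distribution with shape `dim * a` and rate 1/2, and each coordinate from a double-exponential (Laplace) distribution with scale equal to the product of its local weight and the global scale. It is not the DL-graphical prior itself; the function name, the hyperparameter value `a = 0.5`, and the seed are assumptions for illustration.

```python
import random


def sample_dirichlet_laplace(dim, a, rng):
    """Draw (theta, phi, tau) from a Dirichlet-Laplace prior with parameter a."""
    # Local weights: phi ~ Dirichlet(a, ..., a), built from normalized gammas.
    gammas = [rng.gammavariate(a, 1.0) for _ in range(dim)]
    total = sum(gammas)
    phi = [g / total for g in gammas]
    # Global scale: tau ~ Gamma(shape = dim * a, rate = 1/2).
    # random.gammavariate takes a scale argument, so rate 1/2 means scale 2.
    tau = rng.gammavariate(dim * a, 2.0)
    # theta_j ~ Laplace with scale phi_j * tau: a random sign times an
    # exponential draw with mean phi_j * tau.
    theta = [
        rng.choice((-1.0, 1.0)) * weight * tau * rng.expovariate(1.0)
        for weight in phi
    ]
    return theta, phi, tau


rng = random.Random(0)  # fixed seed so the sketch is reproducible
theta, phi, tau = sample_dirichlet_laplace(dim=10, a=0.5, rng=rng)

print("global scale tau:", tau)
print("local weights sum to:", sum(phi))
print("draws:", [round(t, 3) for t in theta])
```

Because the local weights sum to one, a few coordinates typically receive most of the scale while the rest are shrunk strongly towards zero, which is the behaviour a sparse precision matrix calls for.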
3 Posterior distribution and sampling scheme
In this section, we provide the posterior distribution of the precision matrix and devise a sampling scheme for the parameters. The conditional posterior density of the precision matrix $\Omega$ is given by
[TABLE]
We partition the precision matrix as
[TABLE]
Also define where Then partition as
[TABLE]
Then, we have,
[TABLE]
where Let Then,
[TABLE]
where
So simulation of and can be done easily. For the rest of the parameters, we follow the same Gibbs sampler as proposed by Bhattacharya et al. (2015), that is, simulate
[TABLE]
and then let
Simulate
[TABLE]
and then set
Finally, simulate
[TABLE]
where
In the above, $\mathrm{iG}$ denotes the inverse Gaussian distribution and $\mathrm{giG}$ denotes the generalised inverse Gaussian distribution.
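As an illustration of one ingredient of such a Gibbs sampler, the sketch below draws from an inverse Gaussian distribution using the transformation method of Michael, Schucany and Haas. The parameterization by mean `mu` and shape `lam`, the function names, and the numerical values are assumptions for illustration; how the arguments map to the conditional distributions of the sampler is not reproduced here. The generalised inverse Gaussian step is not covered by this sketch and would in practice rely on a library routine such as `scipy.stats.geninvgauss`.

```python
import math
import random


def sample_inverse_gaussian(mu, lam, rng):
    """Draw one inverse Gaussian variate with mean mu and shape lam."""
    y = rng.gauss(0.0, 1.0) ** 2  # chi-square draw with one degree of freedom
    # Smaller root of the quadratic that relates y to the target variate.
    x = (
        mu
        + (mu * mu * y) / (2.0 * lam)
        - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + mu * mu * y * y)
    )
    # Choose between the two roots with the appropriate probability.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def sample_mean(values):
    return sum(values) / len(values)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
draws = [sample_inverse_gaussian(2.0, 3.0, rng) for _ in range(20000)]

print("sample mean (theoretical mean is 2.0):", sample_mean(draws))
```

The sample mean of the draws should be close to `mu`, which gives a quick sanity check on the sampler.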
4 Posterior convergence rate
In this section, we establish theoretical guarantees for our proposed method. In particular, we show that under certain sparsity assumptions, the posterior distribution of the precision matrix $\Omega$ concentrates around the true precision matrix. We also derive the posterior convergence rate.
4.1 Estimating prior concentration
Following Bhattacharya et al. (2015), we have,
[TABLE]
for some constant Let us consider the set
[TABLE]
Following Banerjee and Ghosal (2015), under the assumption that the eigenvalues of the precision matrices are bounded away from zero and infinity, we have,
[TABLE]
Now, we have, for the choice of
[TABLE]
Matching with the prior concentration rate gives,
[TABLE]
Here we need to check the rate , which comes out to be .
4.2 Choosing the sieve
The Dirichlet-Laplace prior is a shrinkage prior and does not set the value of any off-diagonal element of the precision matrix to be exactly zero. In this situation, we consider the sieve to be the space of all densities such that , where
[TABLE]
satisfies
[TABLE]
for suitably chosen threshold and each entry of is at most in absolute value.
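Since a continuous shrinkage prior never produces exact zeros, a graph can only be read off after thresholding. The sketch below shows the idea of an effective support: an edge is declared wherever an off-diagonal entry exceeds a threshold in absolute value. The matrix `omega_draw`, the threshold `delta = 0.05`, and the function name are hypothetical; the threshold used in the sieve construction is the one given in the displayed condition, which this sketch does not reproduce.

```python
def thresholded_edges(omega, delta):
    """Return pairs (i, j), i < j, whose off-diagonal entry exceeds delta in absolute value."""
    p = len(omega)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(omega[i][j]) > delta
    ]


# Hypothetical posterior draw of a 4 x 4 precision matrix: small entries
# are shrunk towards zero but are not exactly zero.
omega_draw = [
    [1.50, 0.40, 0.01, 0.00],
    [0.40, 1.20, -0.02, 0.30],
    [0.01, -0.02, 1.00, 0.005],
    [0.00, 0.30, 0.005, 1.10],
]

edges = thresholded_edges(omega_draw, delta=0.05)
print("edges:", edges)
print("effective support size:", len(edges))
```

With this draw, only the pairs (0, 1) and (1, 3) survive the threshold, so the effective support has size two even though most off-diagonal entries are nonzero.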
Now, from Theorem 3.2 in Bhattacharya et al. (2015), we have, for and choice of , and for ,
[TABLE]
for some constant The above result will take care of a part (the size of the support mentioned above) of controlling the probability of the complement of the chosen sieve. For the other part (maximum absolute value of the elements), we can show that,
[TABLE]
where and are constants independent of . It follows that the rate obtained using the prior concentration matches the one obtained using the above metric entropy calculations.
The metric entropy of the sieve can be verified along similar lines to Banerjee and Ghosal (2015), which gives the posterior convergence rate as .
References
- Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. Journal of Multivariate Analysis, 136:147–162.
- Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512):1479–1490.
