Information gains from Monte Carlo Markov Chains
Ahmad Mehrabi, A. Ahmadi

Department of Physics, Bu-Ali Sina University, Hamedan, Iran
Abstract
In this paper, we present a novel method for computing the relative entropy, as well as the expected relative entropy, from an MCMC chain. The relative entropy from information theory can be used to quantify differences between the posterior distributions of a pair of experiments. In cosmology, the relative entropy has been proposed as a tool for model selection, experiment design, forecasting and measuring the information gain from subsequent experiments. Except for Gaussian distributions, these quantities are generally not available analytically and must be estimated numerically, which is computationally expensive. We propose a method, and provide a Python package implementing it, to estimate the relative entropy and the expected relative entropy from a posterior sample. We use the linear Gaussian model to check the accuracy of our code. Our results indicate that the relative error is below 0.2% for sample sizes larger than $10^5$ in the linear Gaussian model. In addition, we study the robustness of our code in estimating the expected relative entropy in this model.
I Introduction
In contrast to a few decades ago, cosmology now has a large number of probes that provide remarkable information about the content and evolution of the Universe. These data sets have been used extensively to study and constrain free model parameters in the literature (see Mehrabi et al. (2015, 2017); Rezaei et al. (2017); Mehrabi (2018) and references therein). Bayesian inference provides a common and widely used method to constrain free model parameters. In this approach, we update a prior probability density in parameter space to obtain the posterior distribution using observational data. Since analytic solutions in Bayesian inference are very limited, one has to use a numerical method to find the posterior. Among these, Monte Carlo Markov Chain (MCMC) techniques are widely accepted and used in many different problems. The purpose of an MCMC algorithm is to construct a sample of points in parameter space, called a chain, and then obtain the posterior probability density from it. The simplest and most widely used MCMC algorithm is Metropolis-Hastings Hastings (1970), but depending on the situation other algorithms, such as Gibbs sampling T. and van Dyk D.A. (2009); Y. and X.L. (2011) and Hamiltonian Monte Carlo Neal (2011), have been used to obtain posterior distributions. To quantify the difference between probability distributions from different surveys, a robust framework is needed.
Initially motivated by information theory, the relative entropy, or Kullback-Leibler divergence, was proposed to measure the difference between two probability densities S. and A. (1951). In cosmology, it has been used for experiment design and forecasting Farhang et al. (2013); Paykari and Jaffe (2013); Amara and Refregier (2014) as well as for model selection Kunz et al. (2006); Verde et al. (2013). Moreover, the relative entropy has been introduced as a tool to measure the information gain from successive experiments Seehars et al. (2014); Grandis et al. (2016) and to measure tensions among datasets within a given model Seehars et al. (2016); Nicola et al. (2019). The relative entropy quantifies both statistical precision and shifts of confidence regions; by disentangling these contributions, one can measure the change of confidence regions and the shifts of parameters between two different datasets. In the limit of Gaussian distributions, the relative entropy has an analytic solution, but in the general case one must use a numerical method to obtain it. Since in most cases the probability distributions come from an MCMC chain, it is valuable to be able to compute the relative entropy directly from the chain. In this work, we introduce a method and provide a Python package to estimate the relative entropy from an MCMC chain.
Given two datasets $D_1$ and $D_2$, it is straightforward to find the posterior probability distributions and then the relative entropy between these two distributions. As mentioned above, the relative entropy consists of two contributions: information gain in precision and shifts in parameter space. To distinguish these two contributions, one can use the constraints from dataset $D_1$ to anticipate the expected relative entropy for dataset $D_2$, assuming both datasets are described by the same model. The difference between the relative entropy and the expected one is called the surprise and was introduced in Seehars et al. (2014) as a tool to measure consistency between datasets in a given model. The expected relative entropy has an analytic solution in the case of two Gaussian distributions, but in the general case we need a numerical method to estimate it. Many algorithms for this have been proposed in the literature C. et al. (2013); Q. et al. (2013); X. and M. (2013). In this work, we introduce a Python package to estimate the expected relative entropy based on the algorithm proposed in X. and M. (2013).
This work is organized as follows. In section II we review the formalism of relative entropy and present it for two Gaussian distributions. In section III, we discuss the surprise and its closed form in the Gaussian limit. In section IV, we present the linear Gaussian model and compare the exact results with those of our code to check its accuracy. Finally, in section V, we conclude and highlight the importance of our method.
II Information gain based on the relative entropy
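The likelihood is the probability of the data given the values of the parameters and is a crucial quantity in parameter inference. Given a likelihood, it is straightforward to update prior information on the parameters $\theta$ to obtain the posterior through Bayes' theorem:
$$p(\theta|D) = \frac{\mathcal{L}(\theta)\, p(\theta)}{p(D)}, \tag{1}$$
where $\mathcal{L}(\theta) \equiv p(D|\theta)$ is the likelihood function for the data $D$, $p(\theta)$ is the prior, and the denominator is the Bayesian evidence, which is given by
$$p(D) = \int \mathcal{L}(\theta)\, p(\theta)\, d\theta. \tag{2}$$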
In this process, one can measure the information gain from updating the prior to the posterior via the relative entropy, or Kullback-Leibler divergence. The relative entropy between two probability distributions $p_1(\theta)$ and $p_2(\theta)$ is given by:
$$D_{\mathrm{KL}}(p_2||p_1) = \int p_2(\theta) \ln\frac{p_2(\theta)}{p_1(\theta)}\, d\theta. \tag{3}$$
The relative entropy is always non-negative and equals zero only for $p_1 = p_2$. Although it is not symmetric in $p_1$ and $p_2$, the relative entropy is invariant under invertible transformations of $\theta$.
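The positivity and asymmetry noted above are easy to check numerically. The sketch below uses two small discrete distributions as a stand-in for the continuous densities in the text; the helper name `relative_entropy` and the example probabilities are illustrative assumptions and are not part of the authors' package.

```python
import numpy as np

def relative_entropy(p2, p1):
    """Discrete relative entropy D(p2 || p1) in nats (illustrative helper)."""
    p2 = np.asarray(p2, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    mask = p2 > 0  # terms with p2 = 0 contribute nothing to the sum
    return float(np.sum(p2[mask] * np.log(p2[mask] / p1[mask])))

# Two arbitrary example distributions over three outcomes
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.7, 0.2, 0.1])

d_21 = relative_entropy(p2, p1)  # D(p2 || p1)
d_12 = relative_entropy(p1, p2)  # D(p1 || p2), generally a different number
d_11 = relative_entropy(p1, p1)  # identical distributions give zero
```

Swapping the arguments changes the value, which is why the text is careful about which distribution plays the role of the reference.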
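For two Gaussian distributions $p_1 = \mathcal{N}(\theta_1, \Sigma_1)$ and $p_2 = \mathcal{N}(\theta_2, \Sigma_2)$ in $d$ dimensions, the relative entropy is given by:
$$D_{\mathrm{KL}}(p_2||p_1) = \frac{1}{2}(\theta_2-\theta_1)^T \Sigma_1^{-1} (\theta_2-\theta_1) + \frac{1}{2}\left[\mathrm{tr}(\Sigma_2\Sigma_1^{-1}) - d - \ln\det(\Sigma_2\Sigma_1^{-1})\right]. \tag{4}$$
The first term measures the significance of the mean shift and the second term quantifies the change in precision. In general the probability distributions are not Gaussian, so a method to estimate the relative entropy for arbitrary distributions is needed. Taking $p_2$ to be the posterior $p(\theta|D)$ and $p_1$ the prior $p(\theta)$, and using Eq.(1), the relative entropy can be rewritten as:
$$D_{\mathrm{KL}}\big(p(\theta|D)\,||\,p(\theta)\big) = -\ln p(D) + \int p(\theta|D) \ln\mathcal{L}(\theta)\, d\theta. \tag{5}$$
Given a sample from the posterior, the second term can be easily estimated as the average of $\ln\mathcal{L}(\theta)$ over the chain, so knowing the evidence $p(D)$, one can estimate the relative entropy from a sample for arbitrary distributions. We provide a Python package (available at https://github.com/ahmadiphy/MCKLdivergence) to estimate the relative entropy using Eq.(5). The inputs are a sample chain and the likelihood $\mathcal{L}(\theta)$ at each sample in the chain. The code estimates the first term using the method introduced in Heavens et al. (2017), which uses kth nearest-neighbour distances in parameter space.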
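A minimal sketch of this idea, under stated assumptions, is given here. It checks that the posterior average of $\ln\mathcal{L}$ minus the log-evidence reproduces the closed-form Gaussian relative entropy in a toy conjugate model where the data equal the parameters plus Gaussian noise. The chain is replaced by direct draws from the known posterior, and the evidence is taken from its analytic Gaussian form rather than from the nearest-neighbour estimator of Heavens et al. (2017). All function names are hypothetical and do not reflect the package API.

```python
import numpy as np

def gaussian_relative_entropy(mu1, cov1, mu2, cov2):
    """Closed-form D(p2 || p1) for p1 = N(mu1, cov1) and p2 = N(mu2, cov2)."""
    d = len(mu1)
    inv1 = np.linalg.inv(cov1)
    dmu = mu2 - mu1
    shift = 0.5 * dmu @ inv1 @ dmu                     # mean-shift term
    ratio = cov2 @ inv1
    _, logdet = np.linalg.slogdet(ratio)
    precision = 0.5 * (np.trace(ratio) - d - logdet)   # change in precision
    return shift + precision

def log_gaussian(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x."""
    diff = np.atleast_2d(x) - mean
    sol = np.linalg.solve(cov, diff.T).T
    quad = np.sum(diff * sol, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (quad + logdet + diff.shape[1] * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
mu_prior, cov_prior = np.zeros(2), 4.0 * np.eye(2)
cov_noise = 0.5 * np.eye(2)
y = np.array([1.0, -0.5])  # toy data; the assumed model is F(theta) = theta

# Conjugate Gaussian posterior for this toy model
cov_post = np.linalg.inv(np.linalg.inv(cov_prior) + np.linalg.inv(cov_noise))
mu_post = cov_post @ (np.linalg.solve(cov_prior, mu_prior)
                      + np.linalg.solve(cov_noise, y))

# Stand-in for an MCMC chain: direct draws from the posterior
chain = rng.multivariate_normal(mu_post, cov_post, size=200_000)
log_like = log_gaussian(chain, y, cov_noise)  # ln L(theta) at each sample
# Analytic evidence for this model: y ~ N(mu_prior, cov_prior + cov_noise)
log_evidence = log_gaussian(y, mu_prior, cov_prior + cov_noise)[0]

kl_chain = log_like.mean() - log_evidence  # chain-based estimate
kl_exact = gaussian_relative_entropy(mu_prior, cov_prior, mu_post, cov_post)
```

In a realistic application the evidence is not known analytically, which is where an evidence estimator operating on the chain itself becomes necessary.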
III Expected relative entropy and surprise
Given a prior and a likelihood function, it is possible to generate many realizations of the data. Taking $p(D|\theta)$ as the probability of obtaining $D$ given $\theta$, the expected relative entropy is given by
$$\langle D_{\mathrm{KL}} \rangle = \int p(D)\, D_{\mathrm{KL}}\big(p(\theta|D)\,||\,p(\theta)\big)\, dD, \tag{6}$$
where $p(D)$ is given by:
$$p(D) = \int p(\theta)\, p(D|\theta)\, d\theta. \tag{7}$$
Notice that it is possible to take the prior from one dataset, for example $D_1$, and the likelihood from $D_2$, so the expected relative entropy can be estimated between two datasets. The surprise is defined via Seehars et al. (2014)
$$S = D_{\mathrm{KL}} - \langle D_{\mathrm{KL}} \rangle, \tag{8}$$
which scatters around zero. A positive value of S indicates that the posterior differs more than expected, and a negative value means the constraints are more consistent than expected a priori.
It has been proved that S follows a generalized $\chi^2$ distribution Seehars et al. (2014) for Gaussian distributions, so given a particular value of S, one can compute the probability of measuring a surprise that deviates from zero by more than S. This quantity is the so-called p-value for the hypothesis that both datasets are consistent within the considered model; a small p-value indicates evidence against the hypothesis.
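In the limit of two Gaussian posteriors $\mathcal{N}(\theta_1, \Sigma_1)$ and $\mathcal{N}(\theta_2, \Sigma_2)$, obtained from $D_1$ and $D_2$ respectively, the surprise is given by
$$S = \frac{1}{2}\left[(\theta_2-\theta_1)^T \Sigma_1^{-1} (\theta_2-\theta_1) - \mathrm{tr}\left(I \pm \Sigma_2\Sigma_1^{-1}\right)\right], \tag{9}$$
where the - holds when the posterior of $D_1$ is used as a prior to obtain the posterior of $D_2$ and the + holds when a wide prior is used for both posteriors. Since in general the posteriors derived from $D_1$ and $D_2$ are not Gaussian, we need a general approach for obtaining the surprise.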
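As a sanity check on the statement that the surprise scatters around zero, the sketch below simulates the updating case in a toy linear Gaussian model and averages the Gaussian-limit surprise over many data realizations. It assumes the commonly quoted form $S = \frac{1}{2}[\Delta\theta^T \Sigma_1^{-1} \Delta\theta - \mathrm{tr}(I \pm \Sigma_2\Sigma_1^{-1})]$. The function name `gaussian_surprise`, the toy model matrix and the covariances are illustrative assumptions, not taken from the paper or its package.

```python
import numpy as np

def gaussian_surprise(mu1, cov1, mu2, cov2, updating=True):
    """Surprise in the Gaussian limit (assumed form, see lead-in).

    updating=True  : posterior 1 is the prior for posterior 2 (minus sign)
    updating=False : both posteriors come from wide priors (plus sign)
    """
    d = len(mu1)
    inv1 = np.linalg.inv(cov1)
    dmu = mu2 - mu1
    sign = -1.0 if updating else 1.0
    return 0.5 * (dmu @ inv1 @ dmu - np.trace(np.eye(d) + sign * cov2 @ inv1))

rng = np.random.default_rng(1)
mu1 = np.array([0.0, 0.0])
cov1 = np.array([[1.0, 0.3], [0.3, 2.0]])            # first posterior, used as prior
M = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # toy linear model y = M theta
C = 0.5 * np.eye(3)                                  # data covariance

C_inv = np.linalg.inv(C)
cov2 = np.linalg.inv(np.linalg.inv(cov1) + M.T @ C_inv @ M)  # updated covariance

# Draw parameters from the first posterior and simulate the second dataset
n_real = 20_000
thetas = rng.multivariate_normal(mu1, cov1, size=n_real)
noise = rng.multivariate_normal(np.zeros(3), C, size=n_real)
ys = thetas @ M.T + noise

prior_term = np.linalg.solve(cov1, mu1)
surprises = np.empty(n_real)
for i, y in enumerate(ys):
    mu2 = cov2 @ (prior_term + M.T @ C_inv @ y)      # updated posterior mean
    surprises[i] = gaussian_surprise(mu1, cov1, mu2, cov2, updating=True)

mean_surprise = float(surprises.mean())  # should be close to zero
```

When the second dataset really is generated by the same model, individual realizations give positive or negative values, but the average stays near zero.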
In Bayesian experimental design, the expected relative entropy is a well-known quantity. In fact, many algorithms have been proposed to obtain this quantity in general non-linear cases C. et al. (2013); Q. et al. (2013); X. and M. (2013). Among these, a simple method has been proposed in X. and M. (2013), where the expected relative entropy can be estimated from
$$\langle D_{\mathrm{KL}} \rangle \approx \frac{1}{N}\sum_{i=1}^{N}\left[\ln p(y_i|\theta_i) - \ln p(y_i)\right], \tag{10}$$
where the second term can be estimated from
$$p(y_i) \approx \frac{1}{N}\sum_{j=1}^{N} p(y_i|\theta_j). \tag{11}$$
In the above formulas, $\theta_i$ is a sample from the posterior and $y_i$ is a sample of simulated data drawn from the likelihood. Given a sample of $\theta$ from the posterior and the likelihood function of $y$, it is possible to estimate the expected relative entropy and then the surprise from the above formulas. We provide a class in our Python package to estimate the expected relative entropy using this algorithm. In this case, the inputs are a sample of $\theta$ from the posterior, the covariance of the likelihood $C$, the model function $F(\theta)$, the number of samples $N$ to be used to estimate the expected relative entropy and the dimension of the data $n$. The code uses the given sample values to simulate data from a Gaussian likelihood
$$p(y|\theta) = \frac{1}{\sqrt{(2\pi)^n \det C}} \exp\left[-\frac{1}{2}\big(y - F(\theta)\big)^T C^{-1} \big(y - F(\theta)\big)\right], \tag{12}$$
and then uses them to estimate the expected relative entropy from Eq.(10). Notice that the user-defined model function should return a vector of dimension $n$. The current version of the code supports only a Gaussian likelihood, and we plan to extend it to the general case in subsequent updates.
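The nested Monte Carlo procedure described above can be sketched as follows, assuming a Gaussian likelihood. For each parameter sample $\theta_i$ a data vector $y_i$ is simulated, and the average of $\ln p(y_i|\theta_i) - \ln p(y_i)$ is taken, with $p(y_i)$ approximated by averaging the likelihood over the whole parameter sample. The function name, its signature and the reuse of the same sample in the inner average are assumptions made for this illustration and do not reproduce the package's class.

```python
import numpy as np

def expected_relative_entropy(theta, model, cov, n_outer, rng):
    """Nested Monte Carlo estimate of the expected relative entropy (sketch)."""
    theta = np.asarray(theta, dtype=float)
    F = np.array([model(t) for t in theta])        # (N, n) model predictions
    n = F.shape[1]
    chol = np.linalg.cholesky(cov)
    # Simulate y_i ~ N(F(theta_i), cov) for the first n_outer samples
    y = F[:n_outer] + rng.standard_normal((n_outer, n)) @ chol.T
    # Whiten so the Gaussian log-likelihood becomes a squared distance
    wy = np.linalg.solve(chol, y.T).T
    wF = np.linalg.solve(chol, F.T).T
    sq = (wy**2).sum(1)[:, None] + (wF**2).sum(1)[None, :] - 2.0 * wy @ wF.T
    log_norm = -0.5 * n * np.log(2.0 * np.pi) - np.log(np.diag(chol)).sum()
    log_like = log_norm - 0.5 * sq                 # ln p(y_i | theta_j)
    # ln p(y_i) ~ ln[(1/N) sum_j p(y_i | theta_j)] via a stable log-sum-exp
    m = log_like.max(axis=1, keepdims=True)
    log_evid = (m[:, 0] + np.log(np.exp(log_like - m).sum(axis=1))
                - np.log(len(theta)))
    idx = np.arange(n_outer)
    return float(np.mean(log_like[idx, idx] - log_evid))

# Toy check in a linear Gaussian model, where the exact value is known
rng = np.random.default_rng(2)
M = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # assumed toy model matrix
cov = 0.5 * np.eye(3)
theta = rng.standard_normal((3000, 2))               # samples from N(0, I)

estimate = expected_relative_entropy(theta, lambda t: M @ t, cov, 1500, rng)
# Exact value for a unit-covariance Gaussian sample: 0.5 ln det(I + M^T C^-1 M)
exact = 0.5 * np.linalg.slogdet(np.eye(2) + M.T @ np.linalg.inv(cov) @ M)[1]
```

The cost grows as the product of the outer and inner sample sizes, which is the reason this estimator benefits from parallel execution.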
IV Linear Gaussian model and accuracy of the method
In order to check the accuracy of our code, we consider the linear Gaussian model, which has an analytic solution for both the relative entropy and the expected relative entropy. We consider a Gaussian prior on $\theta$, with mean $\theta_0$ and covariance $\Sigma_0$, and a Gaussian likelihood on the data $y$ with covariance $C$. The model function must be linear in $\theta$; we assume
$$F(\theta) = M\theta + c, \tag{13}$$
where $M_{ij} = f_j(x_i)$ is a matrix evaluated at some arbitrary points $x_i$ and $c$ is a constant vector. In this formalism, $f_j(x)$ are the basis functions, which may well be non-linear functions of $x$. Notice that the model function must be linear in $\theta$, not necessarily in the basis functions. Using Bayes' theorem, the posterior is a Gaussian with the following covariance and mean
$$\Sigma_p = \left(\Sigma_0^{-1} + M^T C^{-1} M\right)^{-1}, \qquad \theta_p = \Sigma_p\left[\Sigma_0^{-1}\theta_0 + M^T C^{-1}(y - c)\right]. \tag{14}$$
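The conjugate update for this model takes only a few lines of NumPy. The sketch below uses the standard linear Gaussian results, posterior covariance $(\Sigma_0^{-1} + M^T C^{-1} M)^{-1}$ and posterior mean $\Sigma_p[\Sigma_0^{-1}\theta_0 + M^T C^{-1}(y - c)]$, and builds $M$ from polynomial basis functions to show a model that is non-linear in $x$ but linear in $\theta$. The function name, the choice of basis and all numerical values are illustrative assumptions.

```python
import numpy as np

def linear_gaussian_posterior(mu0, cov0, M, c, C, y):
    """Posterior mean and covariance for y = M theta + c + noise, noise ~ N(0, C)."""
    Cinv_M = np.linalg.solve(C, M)                   # C^{-1} M
    cov_post = np.linalg.inv(np.linalg.inv(cov0) + M.T @ Cinv_M)
    mu_post = cov_post @ (np.linalg.solve(cov0, mu0) + Cinv_M.T @ (y - c))
    return mu_post, cov_post

# Basis functions 1, x, x^2 evaluated at five points: non-linear in x,
# but the model stays linear in the three parameters theta
x = np.linspace(0.0, 1.0, 5)
M = np.column_stack([np.ones_like(x), x, x**2])
c = np.zeros(5)
C = 0.1 * np.eye(5)                                  # data covariance
mu0, cov0 = np.zeros(3), np.eye(3)                   # Gaussian prior

# Simulate one data vector from assumed "true" parameters
rng = np.random.default_rng(3)
theta_true = np.array([0.5, -1.0, 2.0])
y = M @ theta_true + c + rng.multivariate_normal(np.zeros(5), C)

mu_post, cov_post = linear_gaussian_posterior(mu0, cov0, M, c, C, y)
```

With the prior and posterior both Gaussian, the exact relative entropy follows from the closed-form Gaussian expression and can be compared directly with a chain-based estimate.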
Having both a Gaussian prior and posterior, we can use Eq.(3) and Eq.(6) to compute the relative and expected relative entropy. We then generate a sample chain using the Hamiltonian Monte Carlo algorithm and compute these quantities with our code. The results in 3 dimensions are shown in Figs (1 and 2). In the case of relative entropy, for and with samples, the relative error is around but it decreases for a larger k and is below for with samples. As we show in Fig (1), the relative error of the relative entropy decreases like a power law with increasing sample size. The relative error of the expected relative entropy also decreases like a power law with increasing sample size and goes below with samples. Since the expected relative entropy is computationally expensive, the code provides a class that parallelizes all computations using MPI.
Moreover, we repeat the above computation for a 5-dimensional case to study how increasing the dimension affects the results. The results in this case are presented in Figs (3 and 4).
In contrast to 3D, different values of k give almost the same results. In this case the relative error for samples is around which indicates robustness of our code. Similar to the 3D case, the relative error of the expected relative entropy decreases like a power law and is around for samples. Note that the error in the expected relative entropy for the 5D case is larger than in 3D for a small number of samples.
V Conclusion
In this work, we introduce a novel method and provide a Python package to compute the relative and expected relative entropy. The relative entropy quantifies the amount of information gained in updating from a prior to a posterior probability density. Since in most cases of Bayesian inference an MCMC algorithm is used to generate a sample from the posterior, our code uses the chain information alongside the likelihood at each sample to estimate the relative entropy from the chain. For expected relative entropy, the relative error between the exact and estimated values in a linear Gaussian model is around and for samples in 3D and 5D respectively. Since there is no closed-form solution for the relative entropy in the case of arbitrary probability distributions, the code is useful for estimating the amount of information gained in updating from a prior to a posterior in the general case.
In addition to the relative entropy, there are several algorithms to estimate the expected relative entropy. The expected relative entropy has been used to define a quantity called the surprise. The surprise is a measure of consistency between posterior distributions and can be used to quantify possible tension between data sets within a model. Given a sample from the first posterior and the likelihood function of the second data set, our code provides an estimate of the expected relative entropy based on the algorithm presented in X. and M. (2013). Since the linear Gaussian model has an analytic solution, we compare the estimated value with the exact one in 3D and 5D to check the robustness of our code. The relative errors in this case are around and for samples in 3D and 5D respectively. The code is available on GitHub. Alongside the code, there are two examples for computing the relative entropy and the expected relative entropy in the case of the linear Gaussian model.
References
- Heavens et al. (2017) A. Heavens, Y. Fantaye, A. Mootoovaloo, H. Eggers, Z. Hosenie, S. Kroon, and E. Sellentin, (2017), arXiv:1704.03472 [stat.CO].
- Mehrabi et al. (2015) A. Mehrabi, S. Basilakos, and F. Pace, Mon. Not. Roy. Astron. Soc. 452, 2930 (2015), arXiv:1504.01262 [astro-ph.CO].
- Mehrabi et al. (2017) A. Mehrabi, F. Pace, M. Malekjani, and A. Del Popolo, Mon. Not. Roy. Astron. Soc. 465, 2687 (2017), arXiv:1608.07961 [astro-ph.CO].
- Rezaei et al. (2017) M. Rezaei, M. Malekjani, S. Basilakos, A. Mehrabi, and D. F. Mota, Astrophys. J. 843, 65 (2017), arXiv:1706.02537 [astro-ph.CO].
- Mehrabi (2018) A. Mehrabi, Phys. Rev. D 97, 083522 (2018), arXiv:1804.09886 [astro-ph.CO].
- Hastings (1970) W. Hastings, Biometrika 57, 109 (1970).
- T. and van Dyk D.A. (2009) P. T. and van Dyk D.A., Journal of Computational and Graphical Statistics 18, 283 (2009).
- Y. and X.L. (2011) Y. Y. and M. X.L., Journal of Computational and Graphical Statistics 20, 531 (2011).
