Spectral Approximate Inference
Sejun Park, Eunho Yang, Se-Young Yun, Jinwoo Shin

Abstract
Given a graphical model (GM), computing its partition function is the most essential inference task, but it is computationally intractable in general. To address the issue, iterative approximation algorithms exploring certain local structure/consistency of GM have been investigated as popular choices in practice. However, due to their local/iterative nature, they often output poor approximations or even fail to converge, e.g., in low-temperature regimes (hard instances with large parameters). To overcome this limitation, we propose a novel approach utilizing the global spectral feature of GM. Our contribution is two-fold: (a) we first propose a fully polynomial-time approximation scheme (FPTAS) for approximating the partition function of a GM associated with a low-rank coupling matrix; (b) for general high-rank GMs, we design a spectral mean-field scheme utilizing (a) as a subroutine, which approximates a high-rank GM by a product of rank-1 GMs for an efficient approximation of the partition function. The proposed algorithm is more robust in its running time and accuracy than prior methods, i.e., it neither suffers from the convergence issue nor depends on hard local structures, as demonstrated in our experiments.
Machine Learning, ICML
1 Introduction
Graphical models (GMs) provide a succinct representation of a joint probability distribution over a set of random variables by encoding their conditional dependencies in graphical structures. GMs have been studied in various fields of machine learning, including computer vision (Freeman et al., 2000), speech recognition (Bilmes, 2004) and deep learning (Salakhutdinov & Larochelle, 2010). Most inference problems arising in GMs, e.g., obtaining desired samples and computing marginal distributions, can be easily reduced to computing their partition function (normalizing constant). However, computing the partition function is #P-hard in general even to approximate (Jerrum & Sinclair, 1993), which is thus a fundamental barrier for inference tasks of GM.
Variational inference is one of the most popular heuristics in practice for estimating the partition function. It is typically achieved by running iterative local message-passing algorithms, e.g., mean-field approximation (Parisi, 1988; Jain et al., 2018) and belief propagation (Pearl, 1982; Wainwright et al., 2005). The Markov chain Monte Carlo (MCMC) method (Neal, 2001; Efthymiou et al., 2016) is another popular approach: it usually samples from GMs via Markov chains with local transitions, e.g., the Gibbs sampler (Geman & Geman, 1984), and estimates a target expectation by averaging over samples. Unfortunately, for both variational and MCMC methods it is hard to guarantee convergence/mixing under a fixed computation budget, and both are known to output poor approximations in the low-temperature regime, i.e., for large parameters of GM, due to the non-existence of the so-called correlation decay (Weitz, 2006; Bandyopadhyay & Gamarnik, 2008). On the other hand, variable elimination (Dechter, 1999; Dechter & Rish, 2003; Liu & Ihler, 2011; Xue et al., 2016; Wrigley et al., 2017; Ahn et al., 2018a, b) is one of the popular ‘convergence-free’ methods for approximating the partition function. At each step, it marginalizes a chosen variable and generates complex high-order factors approximating the marginalized variable and its associated factors. Hence, it is guaranteed to terminate after marginalizing all variables. However, the performance of variable elimination schemes also degrades significantly in the low-temperature regime, due to their local/iterative nature of processing variables one by one.
Contribution. In this paper, we propose a completely new approach that exploits the global information of GM to overcome the limitation of prior methods. To this end, we study the spectral feature of the coupling matrix of GM and propose a partition function approximation algorithm utilizing its eigenvectors and eigenvalues. In particular, if the matrix rank and parameters of GM are bounded, i.e., $O(1)$, then we prove that the proposed algorithm is a fully polynomial-time approximation scheme (FPTAS), even for GMs with high treewidth. Such polynomial-time approximation schemes have typically been investigated in the literature under certain structured GMs (Temperley & Fisher, 1961; Pearl, 1982; Dechter, 1999; Jerrum et al., 2004), high-temperature regimes (Zhang et al., 2011; Li et al., 2013; Patel & Regts, 2017), or homogeneity of GM parameters (Jerrum & Sinclair, 1993; Sinclair et al., 2014; Molkaraie, 2016; Liu et al., 2017; Patel & Regts, 2017; Molkaraie & Gómez, 2018). Our theoretical result adds a new class of GMs to this line of work.
Despite the theoretical value of the proposed algorithm for low-rank GMs, it is very expensive to run for general high-rank GMs as its complexity grows exponentially with respect to the rank. To address this issue, we decompose the partition function of a high-rank GM into a product of those of rank-1 GMs. Then, we run the proposed FPTAS to compute all rank-1 partition functions and combine them to approximate the original partition function. To improve our approximation, we additionally suggest solving a semi-definite program to discover a better spectral decomposition of the partition function. In a sense, our approach is of mean-field type, but it differs from the traditional ones, which decompose the GM itself without spectral pre-processing. We present an illustration of the proposed scheme in Figure 1.
The proposed mean-field scheme can be universally applied to any GMs without the rank restriction. Its computational complexity scales well for large GMs without suffering from the convergence issue. Furthermore, its approximation quality is quite robust against hard GM instances of heterogeneous parameters since the utilized spectral feature grows linearly with respect to the inverse temperature, i.e., scale of parameters. Our experiments demonstrate that the proposed scheme indeed outperforms mean-field approximation, belief propagation and variable elimination, in particular, significantly in the low-temperature regimes where the prior methods fail.
2 Spectral Inference for Low-Rank GMs
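We begin by introducing the definition of the pairwise binary graphical model (GM). Given a vector $\boldsymbol{\theta}\in\mathbb{R}^{n}$ and a symmetric matrix $A\in\mathbb{R}^{n\times n}$, we define GM as the following joint distribution on $\mathbf{x}\in\Omega:=\{-1,1\}^{n}$:
$$p(\mathbf{x})=\frac{1}{Z}\exp\big(\langle\boldsymbol{\theta},\mathbf{x}\rangle+\mathbf{x}^{\top}A\mathbf{x}\big),\tag{1}$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product and $Z$ is the normalizing constant. The above definition of GM coincides with the following conventional definition associated with an undirected graph $G=(V,E)$, defined as: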
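$$p(\mathbf{x})\propto\exp\Big(\sum_{i\in V}\theta_{i}x_{i}+\sum_{(i,j)\in E}2A_{ij}x_{i}x_{j}\Big),\tag{2}$$
where $V=\{1,\dots,n\}$ and $E=\{(i,j):i<j,\ A_{ij}\neq 0\}$.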
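The normalizing constant $Z$ of (1) is called the partition function, defined as follows:
$$Z:=\sum_{\mathbf{x}\in\{-1,1\}^{n}}\exp\big(\langle\boldsymbol{\theta},\mathbf{x}\rangle+\mathbf{x}^{\top}A\mathbf{x}\big).\tag{3}$$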
Computing $Z$ is one of the most essential inference tasks arising in GMs. However, it is known to be computationally intractable in general, i.e., #P-hard even to approximate (Jerrum & Sinclair, 1993). In particular, the case when the magnitudes of the entries of $A$ are large is called the low-temperature regime (Sykes et al., 1965), where $Z$ is known to be provably harder to approximate (Sly & Sun, 2012; Galanis et al., 2014). This is indeed the regime where known heuristics also fail badly.
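To make the intractability concrete, the following is a minimal sketch of exact computation by enumeration, assuming the model has the form $p(\mathbf{x})\propto\exp(\langle\boldsymbol{\theta},\mathbf{x}\rangle+\mathbf{x}^{\top}A\mathbf{x})$ on $\{-1,1\}^{n}$. The function name and the example values are illustrative and not taken from the paper.

```python
import itertools
import math

def brute_force_partition(theta, A):
    """Exact Z by enumerating all 2^n states x in {-1, +1}^n (small n only)."""
    n = len(theta)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        linear = sum(theta[i] * x[i] for i in range(n))
        quadratic = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        total += math.exp(linear + quadratic)
    return total

# Hypothetical 3-variable instance with a symmetric coupling matrix.
theta = [0.3, -0.2, 0.1]
A = [[0.0, 0.5, -0.4],
     [0.5, 0.0, 0.2],
     [-0.4, 0.2, 0.0]]
Z = brute_force_partition(theta, A)
print(Z)
```

The loop visits $2^{n}$ states, so this is only usable as a ground-truth reference for small $n$; it is the baseline against which any approximation can be checked.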
In this section, we show that $Z$ can be approximated in polynomial time if there exists a diagonal matrix $D$ such that the rank of $A+D$ is bounded, i.e., $O(1)$. For clarity, we primarily focus on the case when $A$ itself is of low rank (i.e., $D=0$) and then describe at the end of this section how our results extend to the case when $A+D$ is of low rank for a diagonal matrix $D$.
2.1 Overall Approach: Approximate Inference via Spectral Decomposition
To design such a polynomial-time algorithm, we first reformulate $Z$ using the eigenvalues/eigenvectors of $A$ as follows:
$$Z=\sum_{\mathbf{x}\in\{-1,1\}^{n}}\exp\Big(\langle\boldsymbol{\theta},\mathbf{x}\rangle+\sum_{j=1}^{r}\lambda_{j}\langle\mathbf{v}_{j},\mathbf{x}\rangle^{2}\Big),\tag{4}$$
where $\lambda_{j}$ and $\mathbf{v}_{j}$ denote the $j$-th largest non-zero eigenvalue and its corresponding unit eigenvector of $A$, and $r$ denotes the rank of $A$. We note that such a decomposition is always possible because $A$ is a real symmetric matrix, i.e., all eigenvalues are real. However, even with a small rank $r$, a naive computation of $Z$ is still intractable as it is a summation over exponentially many terms. Our main idea is to approximate $\langle\mathbf{v}_{j},\mathbf{x}\rangle$ in (4) by a quantized value in order to drastically reduce the number of distinct terms in the summation. Toward this, we rewrite (4) as
[TABLE]
where denotes the sign of and . Here we deliberately choose some mapping (it will be explicitly described in Section 2.2) so that for some fixed constant and hence can be nicely approximated as
[TABLE]
Note that decides a quantization interval and represents a quantized value of . Namely, for each , we will design for approximating for all .
Given such , we further process (5) as
[TABLE]
In the above, the first equality is from replacing the summation over $\mathbf{x}$ by that over $\mathbf{k}$, i.e., for $\mathbf{k}=[k_{j}]_{j=1}^{r}$, each $k_{j}$ represents a possible value of $f_{j}(\mathbf{x})$. For the second equality, we define $t(\mathbf{k}):=\sum_{\mathbf{x}\in\mathbf{f}^{-1}(\mathbf{k})}\exp\big(\langle\boldsymbol{\theta},\mathbf{x}\rangle\big)$. Finally, from (5) and (6), one can observe that if $t(\mathbf{k})$ is easy to compute and the cardinality of $\mathbf{f}(\Omega)$ is small, then the partition function can be efficiently approximated. In the following section, we provide more details on how to choose $\mathbf{f}$ for the desired property.
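The grouping idea can be illustrated on a rank-1 model. The sketch below assumes a quantizer that rounds $\sqrt{|\lambda|}\langle\mathbf{v},\mathbf{x}\rangle/c$ to the nearest integer; that choice, the function names and the example values are assumptions made here for illustration. It still enumerates all states, so it shows only the accuracy of the quantized sum, not the efficient computation of $t(\mathbf{k})$.

```python
import itertools
import math
from collections import defaultdict

def dot(u, x):
    return sum(a * b for a, b in zip(u, x))

def exact_rank1(theta, lam, v):
    """Exact Z = sum_x exp(<theta,x> + lam * <v,x>^2) by enumeration."""
    return sum(math.exp(dot(theta, x) + lam * dot(v, x) ** 2)
               for x in itertools.product((-1, 1), repeat=len(theta)))

def quantized_rank1(theta, lam, v, c):
    """Group states by k = round(sqrt|lam| <v,x> / c), then sum over the bins."""
    s = math.sqrt(abs(lam))
    sign = 1.0 if lam >= 0 else -1.0
    t = defaultdict(float)            # t[k] = sum of exp(<theta,x>) over x mapped to k
    for x in itertools.product((-1, 1), repeat=len(theta)):
        k = round(s * dot(v, x) / c)  # assumed quantizer: nearest integer
        t[k] += math.exp(dot(theta, x))
    z_hat = sum(w * math.exp(sign * (c * k) ** 2) for k, w in t.items())
    return z_hat, len(t)

norm = math.sqrt(91.0)
v = [i / norm for i in (1, 2, 3, 4, 5, 6)]   # unit vector, hypothetical
theta = [0.2, -0.1, 0.3, 0.0, -0.25, 0.15]
lam, c = 1.5, 0.05
z_exact = exact_rank1(theta, lam, v)
z_hat, num_bins = quantized_rank1(theta, lam, v, c)
log_gap = abs(math.log(z_hat) - math.log(z_exact))
# With y = sqrt|lam| <v,x> and |y - c k| <= c/2, each exponent moves by at most
# (c/2) * (2|y| + c/2), and |y| <= sqrt(|lam| n) for a unit vector v.
bound = (c / 2) * (2 * math.sqrt(abs(lam) * len(v)) + c / 2)
print(log_gap, bound, num_bins)
```

The gap in $\log Z$ is controlled by the quantization interval $c$ alone, which is the trade-off the rest of the section makes precise.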
2.2 How to Choose $\mathbf{f}$ and Compute $t$
Choice of . A naive choice of can be
[TABLE]
for all . However, with the above choice of , it is unclear how to compute efficiently (in polynomial-time). To address the issue, we propose a recursive construction of by relaxing (7): we iteratively define for where so that
[TABLE]
First, we define for following (7):
[TABLE]
for all . The construction of for the rest will be done in a recursive manner. Suppose that is defined for . Then, we define for as follow:
[TABLE]
where we define and such that except for , i.e., . Here, (9) is motivated by the following approximation: and the definition of implies that
[TABLE]
where the equality is due to and .
In essence, we have so far constructed via a dynamic programming to approximate (7), which allows us to compute efficiently. Furthermore, our choice of ensures that is bounded. Before describing how to compute , let us discuss the bound of . For bounding , we discover a bounded set so that instead of characterizing directly. We explicitly describe such as follows.
Claim 1.
* where*
[TABLE]
Furthermore, is bounded by
[TABLE]
We present the proof of Claim 1 in the supplementary material. Finally, given and as defined in Claim 1, we approximate the partition function as follows (see (5) and (6)):
[TABLE]
where if .
Computation of . We are now ready to describe how to compute . Since for , it suffices to compute for all . Similar to the construction of , we recursively compute
[TABLE]
i.e., . The recursive computation of is based on the following claim.
Claim 2.
.
The proof of Claim 2 is presented in the supplementary material. The above claim implies that once for is obtained, can be efficiently computed using . Here, we consider for . Initially, one can find as follows:
[TABLE]
where $\mathbf{f}\big((-1,\dots,-1)\big)$ is defined in (8).
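The recursive computation can be sketched for a rank-1 model as follows. This is our reading of the construction, not the paper's Algorithm 1: it assumes nearest-integer rounding, starts from the all-minus-one state, and flips variables to $+1$ one at a time, so that the table is updated as $t_{i}(k)=t_{i-1}(k)+e^{2\theta_{i}}\,t_{i-1}(k-\Delta_{i})$ with $\Delta_{i}$ the quantized increment $2\sqrt{|\lambda|}v_{i}/c$. All names and values are illustrative.

```python
import itertools
import math

def rank1_dp(theta, lam, v, c):
    """Dynamic program over quantized values of sqrt|lam| * <v, x> (rank-1 case)."""
    s = math.sqrt(abs(lam))
    sign = 1.0 if lam >= 0 else -1.0
    k0 = round(-s * sum(v) / c)             # quantized value at x = (-1, ..., -1)
    t = {k0: math.exp(-sum(theta))}         # weight exp(<theta, x>) of that state
    for i in range(len(theta)):
        shift = round(2.0 * s * v[i] / c)   # quantized effect of flipping x_i to +1
        gain = math.exp(2.0 * theta[i])     # change in exp(<theta, x>) from the flip
        new_t = dict(t)                     # states keeping x_i = -1
        for k, w in t.items():              # states with x_i = +1
            new_t[k + shift] = new_t.get(k + shift, 0.0) + gain * w
        t = new_t
    z_hat = sum(w * math.exp(sign * (c * k) ** 2) for k, w in t.items())
    return z_hat, t

def exact_rank1(theta, lam, v):
    """Reference value by enumeration (small n only)."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=len(theta)):
        y = sum(a * b for a, b in zip(v, x))
        total += math.exp(sum(a * b for a, b in zip(theta, x)) + lam * y * y)
    return total

norm = math.sqrt(91.0)
v = [i / norm for i in (1, -2, 3, -4, 5, -6)]
theta = [0.2, -0.1, 0.3, 0.0, -0.25, 0.15]
lam, c = -1.2, 0.01
z_hat, table = rank1_dp(theta, lam, v, c)
z_exact = exact_rank1(theta, lam, v)
print(z_hat, z_exact, len(table))
```

Each state accumulates at most $n+1$ rounding errors of size $c/2$, which matches the counting argument used in the proof of Claim 1. With a coarser $c$, many states share a key and the table stays small, which is where the savings over enumeration come from.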
2.3 Provable Guarantee
A succinct description of the proposed approximate inference algorithm described in Sections 2.1 and 2.2 is given in Algorithm 1. We further prove the following theoretical guarantee of the algorithm.
Theorem 3.
Algorithm 1 outputs such that
[TABLE]
in $O\big(n2^{r}\prod_{j=1}^{r}(\sqrt{|\lambda_{j}|n}/c+n/2+1)\big)$ time.
The proof of Theorem 3 is presented in the supplementary material. As expected, a smaller quantization interval provides a smaller error bound, but a higher complexity (and vice versa). From Theorem 3, given , one can check that Algorithm 1 guarantees
[TABLE]
if we choose
[TABLE]
Under this choice of $c$, the algorithm complexity becomes $O\big((\frac{9}{\varepsilon}r\max(\lambda_{\max},1))^{r}n^{2r+1}\big)$ where $\lambda_{\max}:=\max_{j}|\lambda_{j}|$. Therefore, if the rank and parameters of GM are bounded, i.e., $r=O(1)$ and $|\lambda_{j}|=O(1)$ for all $j$, Algorithm 1 is a fully polynomial-time approximation scheme (FPTAS) for approximating $Z$.
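The trade-off between the quantization interval and the running time can be read off the complexity expression in Theorem 3. The following small helper simply evaluates that expression with the hidden constant taken as 1; the function names are ours.

```python
import math

def state_count_bound(n, lams, c):
    """2^r * prod_j (sqrt(|lam_j| n) / c + n/2 + 1) from the stated complexity."""
    product = 1.0
    for lam in lams:
        product *= math.sqrt(abs(lam) * n) / c + n / 2 + 1
    return (2 ** len(lams)) * product

def time_bound(n, lams, c):
    """n times the state-count bound (constant factor ignored)."""
    return n * state_count_bound(n, lams, c)

# Halving c roughly doubles the per-eigenvalue factor once sqrt(|lam| n)/c dominates.
for c in (1.0, 0.5, 0.25):
    print(c, time_bound(100, [2.0, -1.0], c))
```

The exponential dependence on the rank $r$ is visible in both the $2^{r}$ factor and the $r$-fold product, which is why the scheme is only practical for low-rank coupling matrices.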
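Finally, we remark that the following simple trick allows an FPTAS for approximating the partition function of a richer class of GMs: since $x_{i}^{2}=1$ for all $i$, for any diagonal matrix $D$, one can check
$$Z=\sum_{\mathbf{x}\in\{-1,1\}^{n}}\exp\big(\langle\boldsymbol{\theta},\mathbf{x}\rangle+\mathbf{x}^{\top}A\mathbf{x}\big)=\exp\big(-\mathrm{tr}(D)\big)\sum_{\mathbf{x}\in\{-1,1\}^{n}}\exp\big(\langle\boldsymbol{\theta},\mathbf{x}\rangle+\mathbf{x}^{\top}(A+D)\mathbf{x}\big).\tag{10}$$
Namely, if there exists a diagonal matrix $D$ such that the rank of $A+D$ is $O(1)$ (even though $A$ itself may not be of low rank), then one can run Algorithm 1 to approximate the partition function of the GM with coupling matrix $A+D$ and use it to derive $Z$ from (10).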
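The diagonal trick rests on $x_{i}^{2}=1$, so that $\mathbf{x}^{\top}D\mathbf{x}=\mathrm{tr}(D)$ for every state. A quick numerical check on a small, hypothetical instance (all values are illustrative):

```python
import itertools
import math

def partition(theta, A):
    """Exact partition function by enumeration over {-1, +1}^n."""
    n = len(theta)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        lin = sum(theta[i] * x[i] for i in range(n))
        quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        total += math.exp(lin + quad)
    return total

theta = [0.3, -0.2, 0.1]
A = [[0.0, 0.5, -0.4],
     [0.5, 0.0, 0.2],
     [-0.4, 0.2, 0.0]]
d = [0.7, -1.1, 0.25]   # diagonal of an arbitrary diagonal matrix D

A_shifted = [[A[i][j] + (d[i] if i == j else 0.0) for j in range(3)] for i in range(3)]
z = partition(theta, A)
z_recovered = math.exp(-sum(d)) * partition(theta, A_shifted)
print(z, z_recovered)
```

Because the correction factor $\exp(-\mathrm{tr}(D))$ is exact, any error in the final estimate comes only from approximating the shifted model.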
3 Spectral Inference for High-Rank GMs
In the previous section, we introduced an FPTAS for approximating the partition function for the special class of low-rank GMs. However, for general (high-rank) GMs, Algorithm 1 is intractable to run as its complexity grows exponentially with respect to the rank. In this section, we address the issue by proposing a new efficient partition function approximation algorithm for general GMs of arbitrary rank. The proposed algorithm utilizes Algorithm 1 as a subroutine. Our main idea is to decompose the partition function of GM into a product of those of rank-1 GMs using the mean-field approximation, and then handle each rank-1 GM via Algorithm 1.
Throughout this section, we assume GMs with $\boldsymbol{\theta}=\mathbf{0}$. Such a restriction does not harm the generality of our method due to the following:
[TABLE]
where and is the partition function of a GM with . Namely, computing the partition function of any GM is easily reducible to computing that of an alternative GM with .
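One construction consistent with this reduction, stated here as an assumption since the displayed identity is not reproduced above, appends an auxiliary variable $x_{n+1}$ and places $\boldsymbol{\theta}/2$ in the new row and column of the coupling matrix. Then $\mathbf{x}'^{\top}A'\mathbf{x}'=\mathbf{x}^{\top}A\mathbf{x}+x_{n+1}\langle\boldsymbol{\theta},\mathbf{x}\rangle$, and the symmetry $\mathbf{x}\mapsto-\mathbf{x}$ gives $Z=Z'/2$. A sketch that checks this numerically:

```python
import itertools
import math

def partition(theta, A):
    """Exact partition function by enumeration over {-1, +1}^n."""
    n = len(theta)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        lin = sum(theta[i] * x[i] for i in range(n))
        quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        total += math.exp(lin + quad)
    return total

def absorb_theta(theta, A):
    """Build the (n+1) x (n+1) coupling matrix of a model without linear term."""
    n = len(theta)
    A_ext = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            A_ext[i][j] = A[i][j]
        A_ext[i][n] = theta[i] / 2.0   # symmetric off-diagonal entries carry theta
        A_ext[n][i] = theta[i] / 2.0
    return A_ext

theta = [0.3, -0.2, 0.1]
A = [[0.0, 0.5, -0.4],
     [0.5, 0.0, 0.2],
     [-0.4, 0.2, 0.0]]
A_ext = absorb_theta(theta, A)
z = partition(theta, A)
z_ext = partition([0.0] * (len(theta) + 1), A_ext)
print(z, 0.5 * z_ext)
```

The extended model has one more variable and at most two more non-zero eigenvalues, so the reduction costs little in the spectral setting.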
3.1 Overall Approach: From High-Rank to Low-Rank
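To handle high-rank GMs, we first reformulate the partition function by substituting the summation over $\mathbf{x}$ with the expectation over $\mathbf{x}$ drawn from the uniform distribution $U_{\Omega}$ over $\Omega$:
$$Z=2^{n}\,\mathbb{E}_{\mathbf{x}\sim U_{\Omega}}\Big[\exp\Big(\sum_{j=1}^{n}\lambda_{j}\langle\mathbf{v}_{j},\mathbf{x}\rangle^{2}\Big)\Big].\tag{11}$$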
Then, for approximating the above expectation, we consider the following mean-field approximation via some fully factorized distribution , where , :
[TABLE]
where for and $\mathcal{Y}:=\big\{\mathbf{y}=[y_{j}=\langle\mathbf{v}_{j},\mathbf{x}\rangle]_{j=1}^{n}:\mathbf{x}\in\Omega\big\}$. Now, we state the following claim: choosing each $q_{j}$ to be the marginal probability of the joint distribution $P_{\mathcal{Y}}$ is optimal for the mean-field approximation in (12) with respect to the Kullback-Leibler (KL) divergence. The proof of Claim 4 is presented in the supplementary material.
Claim 4.
$\mathrm{KL}\big(P_{\mathcal{Y}}(\mathbf{y})\,\|\,\prod_{j=1}^{n}q_{j}(y_{j})\big)$ is minimized when $q_{j}$ is the marginal of $P_{\mathcal{Y}}$ on $y_{j}$ for all $j$.
In summary, under the choice of , we use the following approximation for from (11) and (12):
[TABLE]
where it is easy to check that $2^{n}\mathbb{E}_{\mathbf{x}\sim U_{\Omega}}\big[\exp\big(\lambda_{j}\langle\mathbf{v}_{j},\mathbf{x}\rangle^{2}\big)\big]$ is equivalent to the partition function of a rank-1 GM induced by $\lambda_{j}$ and $\mathbf{v}_{j}$, and can be efficiently approximated using Algorithm 1. We further remark that the mean-field approximation quality in (13) is expected to be better if the variables $y_{j}$ for all $j$ are closer to independence. Hence, it is quite a reasonable approximation since for $\mathbf{x}\sim U_{\Omega}$, the variables $y_{i}$, $y_{j}$ with $i\neq j$ are pairwise uncorrelated, i.e., $\mathbb{E}[y_{i}y_{j}]=\langle\mathbf{v}_{i},\mathbf{v}_{j}\rangle=0$, due to the orthogonality of the eigenvectors $\mathbf{v}_{j}$.
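The product approximation can be tried on a toy model with known eigenvectors. The sketch below assumes $\boldsymbol{\theta}=\mathbf{0}$, builds $A$ from four orthonormal vectors in $\mathbb{R}^{4}$, and computes every expectation exactly by enumeration in place of Algorithm 1. The eigenvalues and all names are hypothetical choices for illustration.

```python
import itertools
import math

def dot(u, x):
    return sum(a * b for a, b in zip(u, x))

STATES = list(itertools.product((-1, 1), repeat=4))
VECS = [[0.5, 0.5, 0.5, 0.5],      # orthonormal basis of R^4
        [0.5, -0.5, 0.5, -0.5],
        [0.5, 0.5, -0.5, -0.5],
        [0.5, -0.5, -0.5, 0.5]]

def matrix_from_spectrum(lams):
    """A = sum_j lam_j v_j v_j^T."""
    return [[sum(l * v[i] * v[j] for l, v in zip(lams, VECS)) for j in range(4)]
            for i in range(4)]

def z_matrix_form(A):
    return sum(math.exp(sum(A[i][j] * x[i] * x[j] for i in range(4) for j in range(4)))
               for x in STATES)

def z_spectral_form(lams):
    return sum(math.exp(sum(l * dot(v, x) ** 2 for l, v in zip(lams, VECS)))
               for x in STATES)

def z_mean_field(lams):
    """2^n * prod_j E_x[exp(lam_j <v_j, x>^2)] with x uniform on {-1, +1}^n."""
    product = 1.0
    for l, v in zip(lams, VECS):
        product *= sum(math.exp(l * dot(v, x) ** 2) for x in STATES) / len(STATES)
    return len(STATES) * product

def correlation(u, w):
    """E_x[<u, x> <w, x>] under the uniform distribution."""
    return sum(dot(u, x) * dot(w, x) for x in STATES) / len(STATES)

lams = [0.6, 0.2, -0.3, -0.5]
z_true = z_spectral_form(lams)
z_approx = z_mean_field(lams)
print(math.log(z_true), math.log(z_approx))
```

When only one eigenvalue is non-zero the product is exact, which is why the error of the overall scheme can be attributed to the factorization step rather than to the rank-1 subroutine.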
We remark that our mean-field approximation (13) is different from the traditional one (Parisi, 1988). The latter seeks a mean-field distribution of the $x_{i}$’s minimizing the KL divergence from the original distribution $p(\mathbf{x})$, while our approach minimizes the KL divergence between $P_{\mathcal{Y}}$ and $\prod_{j}q_{j}$, i.e., after spectral processing.
3.2 Improving (13) via Controlling the Diagonal of $A$
Recall that varying the diagonal of $A$ only changes the partition function by a constant multiplicative factor, as in (10). To fully utilize this, we optimize the diagonal of $A$ to improve our mean-field approximation. To this end, we build the following mean-field approximation by introducing the additional freedom of choosing a diagonal matrix $D$:
[TABLE]
where are those for (analogous to of ). Since it is intractable to find the optimal selection for by directly minimizing the approximation gap of (14) (as computing the true expectations is intractable), we propose to set the free parameter by solving the following semi-definite programming (SDP):
[TABLE]
The intuition behind solving (15) is provided in Section 3.3. We also provide its empirical justification through experimental studies in Section 4.2. We remark that the SDP (15) is equivalent to (the dual of) the popular semi-definite relaxation of the max-cut problem (Goemans & Williamson, 1995) and the maximum eigenvalue minimization problem (Delorme & Poljak, 1993). For the complexity of solving (15), the interior point method (Alizadeh, 1995; Helmberg et al., 1996) has running time and the first order method (Nesterov, 2007) has running time where denotes the target precision to the optimum. We also refer to Section 3 of Waldspurger et al. (2015) and Section 4 of Goemans & Williamson (1995) for more details.
From (11), (14) and (15), our final approximation becomes
[TABLE]
where is a solution of (15) and is an eigenvector of corresponding to . It is trivial that the above approximation with reduces to (13). Finally, we formally state the proposed algorithm in Algorithm 2.
3.3 Intuition for (15)
Now, we describe the intuition behind the semi-definite program (15). To this end, let us rewrite the approximation error in (14) in the following alternative form:
[TABLE]
where denotes the approximated partition function. One can easily check that the approximation error is [math] when . Thus, we can expect a very accurate estimation when all eigenvalues of are close to 0. One can also observe that if there exists , then the error might be too huge as and the supports of and are different. Under the above intuitions, we suggest to solve the following problem:
[TABLE]
The optimization (16) is equivalent to (15) since and the condition for all is equivalent to .
4 Experimental Results
In this section, we evaluate our algorithms on diverse environments including both synthetic and UAI datasets to corroborate our theorem and claims.
4.1 Setups
To begin with, we describe our overall experimental settings. We compare our algorithms against the standard inference schemes used in most applications: belief propagation (BP) (Pearl, 1982), mean-field approximation (MF) (Parisi, 1988), mini-bucket elimination (MBE) (Dechter & Rish, 2003) and weighted mini-bucket elimination (WMBE) (Liu & Ihler, 2011). Since all baselines trade off computation cost against performance, we choose 200 iterations for BP, 1000 iterations for MF and an ibound of 10 for MBE and WMBE, for fair comparisons. Below these are referred to as BP-200, MF-1000, MBE-10 and WMBE-10, respectively. In the case of BP and MF, their performance saturates with the above choice in most cases and there is no gain from running more iterations. On the other hand, one can improve the approximation quality of MBE and WMBE with a larger ibound. However, their complexity grows exponentially with respect to it. We also report the running times of algorithms in our implementation using round brackets following their names, e.g., BP-200 (2s) means that 200 iterations of BP run in 2 seconds (on average) for tested GMs.
Throughout all our experiments, we fix for Algorithm 1 and Algorithm 2 to bound their running time regardless of eigenvalues. For solving the SDP (15), we use CVX (Grant et al., 2008) with the SDPT3 solver (Toh et al., 1999) in MATLAB.
For generating synthetic GMs to evaluate on, we first choose the graph structure (it will be specified in each setting) and randomly sample on its vertices and on its edges where Unif denotes the uniform distribution and indicates the strength of pairwise couplings. For measuring the running time for all experiments, we run algorithms using a single thread of CPU. To reduce experimental noise, we average 100 random GMs for each plot unless otherwise stated.
4.2 Investigating the Semi-Definite Programming (15)
In this section, we investigate empirical effects and running time of the proposed SDP (15).
Effect of solving (15). We first investigate how (15) helps the mean-field approximation (14) used in Algorithm 2 compared to other choices of the diagonal matrix $D$. In particular, we consider three other choices to compare. The first choice is $D=0$, which does not change the diagonal of $A$. The second choice is which chooses entries of $D$ by the maximum eigenvalue of $A$ so that . The last choice is which forces $A+D$ to be a diagonally dominant matrix, i.e., . The second and third choices can be thought of as feasible, yet non-optimal, solutions of (15). Figure 2(a) reports the log partition error for GMs on the complete graph with 20 vertices. One can observe that solving (15) is important for the approximation performance of Algorithm 2.
Running time for solving (15). Now, we discuss the empirical complexity of solving (15). Our solver SDPT3 uses the primal-dual interior point method (Toh et al., 1999) for solving (15). To measure the running time of the solver, we generate random GMs on complete graphs by varying the number of vertices from to . Figure 2(b) illustrates the average running time of our solver where each point is averaged over 10 random GMs. We compare the running time of our solver with quadratic and cubic polynomials with respect to . One can observe that the empirical running time to solve (15) is between and , which is better than the theoretical bound of the interior point method (Helmberg et al., 1996).
4.3 Evaluation of Algorithm 1 under Low-Rank GMs
We evaluate Algorithm 1 under rank-1 GMs, which is used as a subroutine of Algorithm 2. We choose a random eigenvalue and a random eigenvector $\mathbf{v}\in\mathrm{Unif}\big(\{\mathbf{v}\in\mathbb{R}^{n}:\|\mathbf{v}\|_{2}=1\}\big)$ to generate rank-1 GMs by choosing and . Given , we scale to match the average value of to be equal to some constant (coupling strength in Figure 2(c)), i.e.,
[TABLE]
We remark that rank-1 GMs have the special property that if the eigenvalue is positive (or negative), they are equivalent to ferromagnetic (or antiferromagnetic) models, i.e., (or ) for , respectively. Figure 2(c) reports the algorithm performances under rank-1 GMs. As expected from our theoretical results (Theorem 3), our algorithm is nearly exact, while other algorithms fail. In particular, BP, MBE and WMBE output very poor approximations since they usually fail in antiferromagnetic cases, i.e., with a negative eigenvalue. The superior performance of Algorithm 1 under rank-1 GMs implies that the approximation error of Algorithm 2 would mainly come from the mean-field approximation (14).
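One way to see the ferromagnetic equivalence, which is our reading rather than an argument spelled out in the text, is a sign change of variables: replacing $x_{i}$ by $\mathrm{sign}(v_{i})x_{i}$ turns the couplings $\lambda v_{i}v_{j}$ into $\lambda|v_{i}||v_{j}|$, which all share the sign of $\lambda$, while the partition function is unchanged when $\boldsymbol{\theta}=\mathbf{0}$. A small check with hypothetical values:

```python
import itertools
import math

def rank1_matrix(lam, v):
    """Coupling matrix lam * v v^T."""
    n = len(v)
    return [[lam * v[i] * v[j] for j in range(n)] for i in range(n)]

def gauge(A, v):
    """Apply x_i -> sign(v_i) x_i, i.e. A_ij -> s_i s_j A_ij."""
    s = [1.0 if vi >= 0 else -1.0 for vi in v]
    n = len(v)
    return [[s[i] * s[j] * A[i][j] for j in range(n)] for i in range(n)]

def partition_no_field(A):
    """Exact partition function with theta = 0 by enumeration."""
    n = len(A)
    return sum(math.exp(sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n)))
               for x in itertools.product((-1, 1), repeat=n))

v = [0.6, -0.3, 0.1, -0.2]
for lam in (1.3, -1.3):
    A = rank1_matrix(lam, v)
    A_gauged = gauge(A, v)
    off_diag = [A_gauged[i][j] for i in range(4) for j in range(4) if i != j]
    same_sign = all(a * lam >= 0 for a in off_diag)
    print(lam, same_sign, partition_no_field(A), partition_no_field(A_gauged))
```

The relabeling is a bijection on $\{-1,1\}^{n}$, so both models have the same partition function even though the raw entries of $A$ have mixed signs.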
4.4 Evaluation of Algorithm 2 under High-Rank GMs
We now evaluate the empirical performance of Algorithm 2 under synthetic high-rank GMs and UAI datasets (Gogate, 2014). In all cases, we have checked through simulations that BP and MF do not have better accuracy than BP-200 and MF-1000, respectively, even if we run the algorithms with much longer iterations.
To generate synthetic GMs, we consider Erdős-Rényi (ER) random graphs, complete bipartite graphs, complete graphs, and grid graphs. The experimental results are reported in Figure 3(a)-3(e). In all cases, one can observe that our algorithm significantly outperforms others in the high coupling region, i.e., the low-temperature regime. It is known that MF outputs better approximations than others as the underlying graph structure becomes dense, e.g., the complete graph (Ellis & Newman, 1978); however, our algorithm performs remarkably better than MF even in such cases. In particular, MF and BP exhibit high variance in their approximation errors in high coupling regions, while ours does not.
We also evaluate our algorithms with GMs on grid graphs in a dataset for UAI 2014 inference competition. It provides 8 GMs on grid graphs, where 4 of them are of 100 vertices () and the other 4 are of 400 vertices (). Figure 3(f) reports the approximation error and the running time of each algorithm. In the experimental results, our algorithm consistently has small errors, while other algorithms often fail badly.
Finally, we compare the running times of algorithms under GMs on complete graphs of 100-500 vertices, which are reported in Figure 4. Here, we do not report WMBE since it is slower than MBE. One can observe that Algorithm 2 scales as well as BP, while MBE does not. MF is the fastest, but it is the worst in approximation quality under grid and UAI GMs, as reported in earlier experimental results.
5 Conclusion
In this paper, we provide a completely new angle for designing approximate inference algorithms for graphical models. The proposed algorithms scale well to large-scale models, like prior iterative message-passing schemes, and outperform them in approximation quality, in particular, significantly for hard instances. For future work, we plan to extend our spectral approach to estimating the marginal distributions and/or related inference in higher-order or continuous models.
Acknowledgement
This work was supported by IITP grant funded by the Korea government (MSIT) (No.2017-0-01779, XAI). We would like to acknowledge Sungsoo Ahn for helpful discussions and sharing codes.
Appendix A Proof of Claim 1
We first prove . To this end we introduce the following inequalities for all :
[TABLE]
which directly leads us to , and therefore . Here, the first inequality of (17) is trivial. The second inequality of (17) is from the fact that the error between and arises from a series of quantizations which is presented once in (8) and at most times in (9). Since the quantization error is at most for each quantization, the second inequality of (17) holds.
Now we prove the bound of . From the definition of and , one can easily observe that the following bound on holds:
[TABLE]
where the inequality is from .
Appendix B Proof of Claim 2
Claim 2 holds since
[TABLE]
In the above, is a bijection defined by such that except for . The second equality of (18) is from replacing the summation over by that over $\mathbf{g}_{i}\big(\mathbf{f}^{-1}(\mathbf{k})\cap(\mathcal{S}_{i}\setminus\mathcal{S}_{i-1})\big)$. The third equality of (18) is based on (9) which implies that for all , satisfies
[TABLE]
Hence, (19) leads us to
[TABLE]
and the third equality of (18) follows. The fourth equality of (18) directly follows from the definition of that and $\big(\mathbf{g}_{i}^{-1}(\mathbf{x}^{\prime})\big)_{i}=x_{i}=1$.
Appendix C Proof of Theorem 3
We first prove the computational complexity of Algorithm 1. Since each possesses a memory of and from Claim 1, the space complexity of Algorithm 1 is $O\big(2^{r}\prod_{j=1}^{r}(\sqrt{|\lambda_{j}|n}/c+n/2+1)\big)$. In addition, as the algorithm iterates times while each iteration accesses and , Algorithm 1 has $O\big(n2^{r}\prod_{j=1}^{r}(\sqrt{|\lambda_{j}|n}/c+n/2+1)\big)$ computational complexity.
Now we provide the bound on the partition function approximation. First, we recall the following error bound introduced in the proof of Claim 1.
[TABLE]
Using (20), we provide a bound for as follows
[TABLE]
where the first inequality is from (20) and the second inequality is from . From (21), the error bound can be derived as
[TABLE]
where the last inequality follows from (21). One can obtain the same bound for and this completes the proof of Theorem 3.
Appendix D Proof of Claim 4
The result of Claim 4 directly follows from the following inequality:
[TABLE]
where the last inequality follows from the source coding theorem (Shannon, 1948).
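The statement of Claim 4 can be checked numerically on a toy joint distribution. The sketch below uses a hypothetical distribution over two binary variables and compares the KL divergence at the true marginals with its value over a grid of other product distributions; the distribution and the grid are illustrative choices only.

```python
import math

# Hypothetical joint distribution P(y1, y2) over {0, 1}^2 (not a product form).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginals(joint):
    """Return the two marginal distributions as lists indexed by value."""
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for (a, b), p in joint.items():
        q1[a] += p
        q2[b] += p
    return q1, q2

def kl_to_product(joint, q1, q2):
    """KL(P || q1 x q2) for strictly positive q1, q2."""
    return sum(p * math.log(p / (q1[a] * q2[b])) for (a, b), p in joint.items())

m1, m2 = marginals(P)
kl_at_marginals = kl_to_product(P, m1, m2)

# Scan product distributions q1 = (a, 1 - a), q2 = (b, 1 - b) on a grid.
grid = [i / 20 for i in range(1, 20)]
best_on_grid = min(kl_to_product(P, [a, 1 - a], [b, 1 - b]) for a in grid for b in grid)
print(kl_at_marginals, best_on_grid)
```

The value at the marginals equals the mutual information between the two variables, which is the part of the divergence that no fully factorized distribution can remove.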
- Ahn, S., Chertkov, M., Shin, J., and Weller, A. Gauged mini-bucket elimination for approximate inference. In International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 10–19, 2018a.
- Ahn, S., Chertkov, M., Weller, A., and Shin, J. Bucket renormalization for approximate inference. In International Conference on Machine Learning (ICML), pp. 109–118, 2018b.
- Alizadeh, F. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, 1995.
- Bandyopadhyay, A. and Gamarnik, D. Counting without sampling: Asymptotics of the log-partition function for certain statistical physics models. Random Structures & Algorithms, 33(4):452–479, 2008.
- Bilmes, J. A. Graphical models and automatic speech recognition. In Mathematical Foundations of Speech and Language Processing, pp. 191–245. Springer, 2004.
- Dechter, R. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41–85, 1999.
- Dechter, R. and Rish, I. Mini-buckets: A general scheme for bounded inference. Journal of the ACM (JACM), 50(2):107–153, 2003.
- Delorme, C. and Poljak, S. Laplacian eigenvalues and the maximum cut problem. Mathematical Programming, 62(1-3):557–574, 1993.
