Double-Edge Factor Graphs: Definition, Properties, and Examples
Michael X. Cao, Pascal O. Vontobel

TL;DR
This paper introduces double-edge factor graphs (DE-FGs), a class of factor graphs whose local functions may be complex-valued and need only be positive semi-definite in a suitable sense, and explores their properties and applications, including connections to quantum information processing.
Contribution
It defines DE-FGs, analyzes their sum-product algorithm behavior, and demonstrates promising numerical results with connections to quantum information.
Findings
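The SPA can be run on DE-FGs, and with suitably initialized messages its messages remain positive semi-definite.
DE-FGs accommodate complex-valued local functions that are positive semi-definite in a suitable sense.
Numerical experiments on example DE-FGs, some with connections to quantum information processing, show promising results for the Bethe approximation of the partition sum.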
Double-Edge Factor Graphs: Definition, Properties, and Examples
Michael X. Cao and Pascal O. Vontobel
Department of Information Engineering
The Chinese University of Hong Kong
{m.x.cao, pascal.vontobel}@ieee.org
Abstract
Some of the most interesting quantities associated with a factor graph are its marginals and its partition sum. For factor graphs without cycles and moderate message update complexities, the sum-product algorithm (SPA) can be used to efficiently compute these quantities exactly. Moreover, for various classes of factor graphs with cycles, the SPA has been successfully applied to efficiently compute good approximations to these quantities. Note that in the case of factor graphs with cycles, the local functions are usually non-negative real-valued functions.
In this paper we introduce a class of factor graphs, called double-edge factor graphs (DE-FGs), which allow local functions to be complex-valued and only require them, in some suitable sense, to be positive semi-definite. We discuss various properties of the SPA when running it on DE-FGs and we show promising numerical results for various example DE-FGs, some of which have connections to quantum information processing.
I Introduction
On the one hand, many classical algorithms like Kalman filtering, the BCJR algorithm, the forward-backward algorithm, etc., can be seen as special cases of the sum-product algorithm (SPA) applied to suitable cycle-free factor graphs [1, 2]. On the other hand, the SPA has also been successfully applied to various classes of factor graphs with cycles, as is for example witnessed by the SPA-based decoding techniques of low-density parity-check (LDPC) codes, which appear nowadays in various telecommunication standards [1, 2].
For the SPA on factor graphs with cycles, a few results hold for large classes of factor graphs, for example the result by Yedidia et al. [3], which states that fixed points of the SPA correspond to stationary points of the Bethe free energy function, and the graph-cover-based interpretation of the Bethe approximation of the partition sum [4]. In general, however, the results are for special classes of factor graphs like Gaussian graphical models (see, e.g., [5]) or log-supermodular (“attractive”) graphical models (see, e.g., [6]). In most of these cases, the focus has been on factor graphs with non-negative real-valued local functions. However, there are applications, in particular in the area of quantum information processing, where one would like to have more general factor graphs. Let us mention some of the approaches that have been pursued:
- One approach replaces scalar-valued local functions by matrix-valued local functions (see, e.g., [7, 8]).
- Another approach keeps scalar-valued local functions, but imposes certain symmetry conditions on the factor graph [9, 10]. (See the discussion and the references in [10] on how the factor graphs therein are related to tensor networks, etc.) The framework in [9, 10] can, for example, be conveniently used for estimating information rates of channels with a classical input and output and a quantum memory [11].
Note that all marginal calculations were done exactly in [9, 10, 11]. This can be achieved, for example, by first merging suitable variables so that the resulting factor graph is cycle-free and then applying the SPA. (Of course, this only gives practical algorithms as long as the alphabet sizes of the merged variables are not too large.)
However, similar to the above-mentioned classes of factor graphs with cycles, it is tempting to also apply the SPA to factor graphs as in [9, 10] with cycles. There are different approaches to accomplish this by suitably reformulating the factor graphs in [9, 10], some reformulations having better complexity properties, some having better analytical properties. An interesting option in this design space is the class of double-edge factor graphs (DE-FGs) that we introduce in this paper. (As we will see, the name “double-edge” comes from the fact that pairs of edges, and with them the associated variables, are merged: the edges associated with two paired variables in Fig. 2 (top) are merged into a single double-edge in Fig. 2 (bottom).)
This paper is structured as follows. We define DE-FGs in Section II and then formulate the SPA, along with some of its properties, in Section III. We discuss a variety of examples in Section IV, we point out connections to a recent paper by Mori in Section V, and we conclude the paper in Section VI. Note that throughout this paper, all alphabets are assumed to be finite.
II Double-edge Factor Graphs
In this section we define double-edge factor graphs (DE-FGs), more precisely, double-edge normal factor graphs (DE-NFGs). The word “normal” refers to the fact that variables appear as arguments of only one or two local functions. (In the same way that any factor graph can be suitably reformulated as a normal factor graph [12], any DE-FG can be suitably reformulated as a DE-NFG. Hence, there is no loss of generality in considering only DE-NFGs.)
Example 1.
Consider the DE-NFG in Fig. 1, which is a pictorial representation of the factorization
[TABLE]
It is called a DE-NFG because some of the edges are double lines that correspond to variables that are paired. (For example, and are paired in Fig. 1.) Paired variables are assumed to share the same alphabet. Moreover, as detailed below, the local functions have to satisfy some constraints.
Definition 2.
Consider the factorization
[TABLE]
represented by some DE-NFG. We will use the following conventions:
- We call the global function.
- We call the local functions. With some abuse of notation, we will also use to refer to the corresponding function node in the DE-NFG.
- For every function node , the variables associated with the incident double-edges are collected in .
- For every function node , the variables associated with the incident single-edges are collected in .
Most importantly, we require every local function to have the following property:
the local function is complex-valued
and is positive semi-definite (PSD).
The latter property is to be understood as follows: for every and every complex-valued function over the alphabet of (and with that also over the alphabet of ), it holds that
[TABLE]
(Here and in the following, over-bar denotes complex conjugation.) Clearly, if a function node has no incident double edges, then the condition in (1) reduces to the condition that the local function takes on only non-negative real values.
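As a minimal sketch of how the PSD requirement can be checked numerically, the following assumes (hypothetically) that a local function is stored as a NumPy array `f[x, x', y]`, where `x` and `x'` index the joint configurations of the paired variables on the incident double-edges and `y` indexes the joint configuration of the single-edge variables. The function name and the array layout are illustrative choices and are not taken from the paper.

```python
import numpy as np

def is_psd_local_function(f, tol=1e-10):
    """Check that f[:, :, y] is a Hermitian PSD matrix for every y.

    Assumed (hypothetical) layout: f has shape (d, d, m), where d is the
    number of joint configurations of the double-edge variables and m is
    the number of joint configurations of the single-edge variables.
    """
    f = np.asarray(f, dtype=complex)
    for y in range(f.shape[2]):
        F = f[:, :, y]
        # A complex PSD matrix must be Hermitian.
        if not np.allclose(F, F.conj().T, atol=tol):
            return False
        # All eigenvalues must be non-negative (up to the tolerance).
        if np.linalg.eigvalsh(F).min() < -tol:
            return False
    return True

# A rank-one example: f(x, x') = v(x) * conj(v(x')), one single-edge value.
v = np.array([1.0, 1.0j])
f_good = np.outer(v, v.conj())[:, :, None]
# A Hermitian matrix with a negative eigenvalue is not admissible.
f_bad = np.array([[0.0, 1.0], [1.0, 0.0]])[:, :, None]
# With no double-edges (d = 1) the test reduces to non-negativity.
f_scalar = np.array([0.5, 2.0]).reshape(1, 1, 2)
```

In the `d = 1` case the check only accepts non-negative real values, which matches the remark above about function nodes without incident double-edges.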
For proving various properties of DE-NFGs, the following observation is very useful.
Remark 3.
For every local function and every , there are a finite set and some complex-valued functions , , over the alphabet of such that
[TABLE]
This follows easily from the eigenvalue decomposition of PSD matrices.
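The eigenvalue-decomposition argument can be illustrated with a short sketch. It assumes that, for a fixed single-edge configuration, the local function is given as a Hermitian PSD matrix `F` over the double-edge configurations, and it writes `F` as a sum of rank-one terms `outer(f_k, conj(f_k))`. The helper name `psd_decomposition` is hypothetical.

```python
import numpy as np

def psd_decomposition(F, tol=1e-12):
    """Return vectors f_k (as rows) with F = sum_k outer(f_k, conj(f_k)).

    F is assumed to be a Hermitian PSD matrix, e.g. one slice of a
    local function for a fixed single-edge configuration.
    """
    eigvals, eigvecs = np.linalg.eigh(F)
    keep = eigvals > tol                    # drop (numerically) zero modes
    # Scale each eigenvector by the square root of its eigenvalue.
    return (eigvecs[:, keep] * np.sqrt(eigvals[keep])).T

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
F = A @ A.conj().T                          # a random PSD matrix
components = psd_decomposition(F)
F_rebuilt = sum(np.outer(f_k, f_k.conj()) for f_k in components)
```

The number of returned rows equals the numerical rank of `F`, so the index set of the decomposition is finite, as in the remark.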
Proposition 4.
The partition sum of a DE-NFG, i.e.,
[TABLE]
is a non-negative real number.
Proof.
This can be proven with the help of Remark 3. We omit the details because of space limitations. ∎
As already mentioned, one of the main motivations of the present paper is the class of NFGs in [9, 10]. So let us show how a “typical” NFG in [9, 10] can be formulated as a DE-NFG.
Example 5.
Consider the NFG in Fig. 2 (top), which can be used to do probability computations for the following quantum mechanical setup:
- At the beginning, some quantum mechanical system is in some mixed state (represented by the density matrix , which is a PSD matrix).
- The system then evolves unitarily (represented by ).
- Afterwards, a sub-system is measured (represented by measurement operators ).
- Finally the system evolves unitarily (represented by ).
(For further details, we refer to [9, 10].) This NFG can be turned into the DE-NFG shown in Fig. 2 (bottom) by suitably merging edges (and with that the associated variables) and by suitably defining the DE-NFG’s function nodes. For example, the function node is defined to be
[TABLE]
Clearly, the function satisfies the required PSD constraint. In fact, the expression in (2) is in the form of the decomposition in Remark 3.
One can check that the redrawing procedure in Example 5 can be applied to all relevant NFGs in [9, 10].
III Sum-Product Algorithm on DE-NFGs and the Bethe Approximation
In this section we define the SPA for DE-NFGs and discuss some of its properties. In particular, we connect it to generalized versions of the Bethe free energy function.
Once a DE-NFG as in Fig. 1 or in Fig. 2 (bottom) has been defined, we simply consider it as a particular type of NFG and apply the SPA in the standard way [1, 2]. Some comments:
- In this paper we only discuss the flooding schedule [1], where all messages are updated at every iteration. Clearly, other update schedules are possible and might be preferable in some cases.
- If desired, messages can be rescaled by a positive scalar at every iteration.
- For reasons of simplicity, we discuss only the case where all edges are full edges, i.e., connect two function nodes. (Note that any DE-NFG can be turned into such a DE-NFG by attaching suitable dummy function nodes to half-edges, thereby turning half-edges into full-edges without changing marginals or the partition sum.)
Recall that in the case of NFGs, messages are functions over the alphabet of the variable associated with an edge. Therefore, along a single-edge between some function nodes and , we will have messages and at iteration . Similarly, along a double-edge between some function nodes and , we will have messages and at iteration .
Assumption 6.
We make the following assumptions about the initial messages, i.e., about the messages at time :
- Messages along single-edges are positive real-valued functions.
- Messages along double-edges are complex-valued positive definite (PD) functions.
Proposition 7.
Let the messages be initialized as in Assumption 6. Then for every iteration it holds that:
- Messages along single-edges are non-negative real-valued functions.
- Messages along double-edges are complex-valued PSD functions.
Proof.
One approach to prove these statements is based on Remark 3. Another approach is based on Schur’s product theorem, which states that the component-wise product of two PSD matrices is a PSD matrix. (Actually, Schur’s product theorem makes the stronger statement that the component-wise product of two PD matrices is a PD matrix.) ∎
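The two ingredients of this argument can be checked numerically on small random instances. The sketch below is only an illustration under assumed conventions: a function node with exactly two incident double-edges is stored as a four-index array `f4[a, b, a2, b2]` obtained by reshaping a PSD matrix, and the SPA update sums out the variables of one double-edge against its incoming message. The helper names and the array layout are hypothetical.

```python
import numpy as np

def random_psd(d, rng):
    """Random complex PSD matrix of size d x d (illustrative helper)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return A @ A.conj().T

def min_eig(H):
    """Smallest eigenvalue of the Hermitian part of H."""
    return np.linalg.eigvalsh((H + H.conj().T) / 2).min()

rng = np.random.default_rng(1)
q = 3

# Schur's product theorem: the entry-wise product of PSD matrices is PSD.
P, Q = random_psd(q, rng), random_psd(q, rng)
schur_min = min_eig(P * Q)

# Function node with two incident double-edges (a, a2) and (b, b2).
# Assumed layout: f4[a, b, a2, b2] = F[(a, b), (a2, b2)] with F PSD.
f4 = random_psd(q * q, rng).reshape(q, q, q, q)
mu_in = random_psd(q, rng)                 # incoming PSD message on (a, a2)
# SPA update: sum out (a, a2) against the incoming message.
mu_out = np.einsum("abcd,ac->bd", f4, mu_in)
update_min = min_eig(mu_out)
hermitian_gap = np.abs(mu_out - mu_out.conj().T).max()
```

In this setting the outgoing message is again Hermitian with non-negative eigenvalues, which is the double-edge statement of Proposition 7 for one update step.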
Definition 8.
Consider a collection of SPA messages, one for every edge in both directions. Let
[TABLE]
where is the set of all edges, where for every we define $Z_{f}\triangleq\sum_{\mathbf{x}_{\partial f},\mathbf{x}^{\prime}_{\partial f},\mathbf{y}_{\delta f}}f(\mathbf{x}_{\partial f},\mathbf{x}^{\prime}_{\partial f};\mathbf{y}_{\delta f})\cdot\bigl(\prod_{e\in\partial f}\mu_{e\to f}(x_{e},x^{\prime}_{e})\bigr)\cdot\bigl(\prod_{e\in\delta f}\mu_{e\to f}(y_{e})\bigr)$, where for every single-edge between function nodes and we define , and where for every double-edge between function nodes and we define .
Proposition 9.
The function in Definition 8 has the following properties:
- Assume that the messages have the properties in Proposition 7 and assume that is well-defined, i.e., for all . Then is a non-negative real number.
- Fixed points of the SPA correspond to stationary points of the function . (This generalizes a theorem by Yedidia et al. [3].)
Proof.
Omitted due to space limitations. ∎
Evaluating in Definition 8 at a fixed point of the SPA results in the Bethe approximation of the partition sum of the DE-NFG.
One can also generalize the Bethe free energy function from [3], where is a function over a suitable generalization of the local marginal polytope. While a statement (analogous to a statement in [3]) that fixed points of the SPA correspond to stationary points of can be made, evaluating based on is trickier because of the multi-valuedness of the complex logarithm.
IV Examples
In this section we discuss various examples of DE-NFGs. In particular, we compare the obtained Bethe approximation of the partition sum with the true partition sum. (The NFGs in this section have modest sizes so that the true partition function can be computed efficiently.) Moreover, for the first example, we can also make some analytical statements.
Example 10.
Let be some integer larger than one. Consider a DE-NFG whose topology is an -cycle and where all variables take on values in the same finite alphabet . (Fig. 3 shows such a DE-NFG for .) Let be a complex-valued PD matrix of size with entries . For , we define the local function to be . (All indices are modulo .)
In order to proceed, it is convenient to define the complex-valued matrix of size with entries and to define , .
Let be the SPA message along the double edge from to at time index . Similarly, let be the SPA message along the double edge from to at time index . Clearly,
[TABLE]
For , we assume the following initializations and , where is the Kronecker-delta function.
Because of the properties of the matrix that are induced by the properties of the matrix , the SPA message update rules in (3)–(4) represent so-called completely positive maps (see, e.g., [13]). (For this statement we ignore the rescaling factors.) Using generalizations of Perron–Frobenius theory (see [14, 15]), one can make the following statements:
- For every , the message converges to a PD matrix as .
- For every , the message converges to a PD matrix as .
- The eigenvalue of the matrix with maximum absolute value is a real number and is unique. Let us call it .
- The Bethe approximation of the partition sum is
[TABLE]
Compare this result with the partition sum, which is
[TABLE]
where are the eigenvalues of . We see that the smaller the ratios $\bigl(\frac{\lambda_{j}}{\lambda_{0}}\bigr)^{n}$, , are, the better the Bethe approximation is.
For and , Fig. 3(c) shows the obtained and values for experiments based on randomly generating matrices , which are based on randomly generating unitary matrices and diagonal matrices , where the diagonal entries of are sampled i.i.d. from a standard distribution with one degree of freedom. We see that very often the ratio is rather close to .
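The comparison in this example can be explored with a small numerical sketch. Since the displayed formulas are not reproduced here, the code rests on several assumptions that should be read as such: the PD matrix is taken to have size q² × q² (with q the alphabet size) and to be generated as U diag(λ) Uᴴ with chi-square(1) diagonal entries; its indices are regrouped into a transfer matrix `T` between consecutive double-edges; the partition sum of the n-cycle is taken to be trace(Tⁿ), i.e., the sum of the n-th powers of the eigenvalues of `T`; and the Bethe approximation is taken to be the n-th power of the dominant eigenvalue, which is what the ratio discussion above suggests. All function names are hypothetical.

```python
import numpy as np

def random_pd(d, rng):
    """Random PD matrix U diag(lam) U^H (assumed chi-square(1) eigenvalues)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    U, _ = np.linalg.qr(G)                       # a random unitary matrix
    lam = rng.chisquare(df=1, size=d)            # assumed eigenvalue law
    return (U * lam) @ U.conj().T

def cycle_partition_sums(M, q, n):
    """Return (Z, Z_bethe, T) for an n-cycle under the stated assumptions."""
    # Regroup M[(a, b), (a2, b2)] into T[(a, a2), (b, b2)].
    T = M.reshape(q, q, q, q).transpose(0, 2, 1, 3).reshape(q * q, q * q)
    eig = np.linalg.eigvals(T)
    lam0 = eig[np.argmax(np.abs(eig))]           # dominant eigenvalue
    Z = np.sum(eig ** n)                         # equals trace(T^n)
    return Z, lam0 ** n, T

rng = np.random.default_rng(2)
q, n = 2, 4
M = random_pd(q * q, rng)
Z, Z_bethe, T = cycle_partition_sums(M, q, n)
ratio = (Z_bethe / Z).real
```

Repeating this for many random draws and plotting `Z_bethe` against `Z` gives a scatter plot in the spirit of the experiment described above, under the stated assumptions.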
Example 11.
Consider now the DE-NFG in Fig. 3(b). For , Fig. 3(d) shows the obtained and values for experiments based on randomly generating local functions. In contrast to Example 10, where for every instantiation all local functions were the same, here for every instantiation all local functions are generated independently. We observe that the ratio is reasonably close to , but typically larger than .
Example 12.
Let be a complex-valued matrix of size with entries . The permanent [16] of is defined to be , where the summation is over all permutations of the set . Ryser’s algorithm, one of the most efficient algorithms for exactly computing for general matrices , requires arithmetic operations [17], and so the exact computation of the permanent is intractable, even for moderate values of . Note that even the computation of the permanent of matrices that contain only zeros and ones is #P-complete [18].
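For small matrices the permanent can be computed directly, which is useful for validating approximations. The sketch below gives two illustrative implementations: one that follows the defining sum over permutations, and a plain (unoptimized) version of Ryser's inclusion-exclusion formula, which sums over the non-empty column subsets. The function names are hypothetical, and the Ryser variant shown here recomputes all row sums for every subset, so it does not attain the operation count of the optimized algorithm cited above.

```python
from itertools import combinations, permutations
import math

def permanent_naive(theta):
    """Permanent via the defining sum over all permutations."""
    n = len(theta)
    return sum(
        math.prod(theta[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

def permanent_ryser(theta):
    """Permanent via Ryser's inclusion-exclusion formula.

    This plain version recomputes every row sum, so it is slower than
    Gray-code variants of the same formula.
    """
    n = len(theta)
    total = 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            row_sums = (sum(row[j] for j in cols) for row in theta)
            total += (-1) ** size * math.prod(row_sums)
    return (-1) ** n * total

theta = [[1, 2], [3, 4]]
theta_c = [[1 + 1j, 2, 0], [0.5j, 1, 1], [1, 1 - 2j, 3]]
```

For the 2 × 2 matrix `theta`, both functions return 1·4 + 2·3 = 10; for complex-valued input such as `theta_c`, the two results agree up to floating-point rounding.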
One can formulate an NFG whose partition sum equals , see, e.g., Fig. 1 in [19]. That NFG is a complete bipartite graph with function nodes on the left and function nodes on the right. Here, Fig. 4(a) shows a slightly modified version of that NFG. All variables take values in the set . Moreover, for all , the function is defined to be
[TABLE]
for all , the function is defined analogously; and for all , the function is defined to be
[TABLE]
In this example, we consider the following, rather natural generalization to the DE-NFG in Fig. 4(b), where we will use the short-hand for $\bigl(x^{\mathrm{L}}_{i,j},\tilde{x}^{\mathrm{L^{\prime}}}_{i,j}\bigr)$, etc. Assume that for , is a complex-valued PSD matrix of size . With this, for , the function is defined to be
[TABLE]
for all , the function is defined analogously; and for all , the function is defined to be
[TABLE]
(One can easily verify that these local functions indeed define a DE-NFG.) Finally, let be the partition sum of this DE-NFG.
This DE-NFG definition has the following two important special cases:
- If $\tilde{\theta}_{i,j}=\bigl(\begin{smallmatrix}1&0\\ 0&\theta_{i,j}\end{smallmatrix}\bigr)$ for all , then .
- If $\tilde{\theta}_{i,j}=\bigl(\begin{smallmatrix}1\\ \theta_{i,j}\end{smallmatrix}\bigr)\cdot\bigl(\begin{smallmatrix}1&\overline{\theta_{i,j}}\end{smallmatrix}\bigr)=\Bigl(\begin{smallmatrix}1&\overline{\theta_{i,j}}\\ \theta_{i,j}&|\theta_{i,j}|^{2}\end{smallmatrix}\Bigr)$ for all , then $\tilde{Z}=\operatorname{perm}(\mathbf{\theta})\cdot\operatorname{perm}(\overline{\mathbf{\theta}})=\bigl|\operatorname{perm}(\mathbf{\theta})\bigr|^{2}$, where denotes the matrix whose entries are the complex-conjugate values of the entries of . (Note that such partition sums are of interest in quantum information processing [20], where are certain types of square matrices over the complex numbers. We refer to [20] for details.)
In our experiments, we considered the following setup. Namely, for every , we independently generate as follows: ; is picked uniformly from the unit circle in the complex plane; ; is picked uniformly (and independently of the other entries) from the real line interval . Fig. 4(c) shows the obtained and values for experiments for the case . We observe that the ratio is concentrated around a value smaller than .
V Connections to a Paper by Mori
Finally, let us point out that there are strong connections between DE-NFGs and the setup in Section V of a recent paper by Mori [21]. Assume that we have a bipartite DE-NFG. (Such a DE-NFG can always be obtained by suitably inserting dummy function nodes.) Then the partition sum can be written as an inner product between, on the one hand, the tensor product of the local functions corresponding to the first class of function nodes of the bipartite DE-NFG, and, on the other hand, the tensor product of the local functions corresponding to the second class of function nodes of the bipartite DE-NFG. Once this connection is observed, one can translate Mori’s results (like loop calculus expansions) to DE-NFGs.
VI Conclusion
In this paper we have defined DE-NFGs and studied some of their properties. In particular, we have presented promising numerical results on the Bethe approximation of the partition sum. Many open questions remain. For example, can some of the results in [19] be generalized to the setup in Example 12? Moreover, the pattern maximum likelihood estimate can be formulated as optimizing the parameters of some graphical model so as to maximize the partition function, and the Bethe partition sum was beneficially used as a surrogate function in that context [22]. Can the Bethe partition sum of a DE-NFG similarly serve as a suitable surrogate function in some partition function optimization problem?
References
[1] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[2] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Sig. Proc. Mag., vol. 21, no. 1, pp. 28–41, Jan. 2004.
[3] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2282–2312, Jul. 2005.
[4] P. O. Vontobel, “Counting in graph covers: a combinatorial characterization of the Bethe entropy function,” IEEE Trans. Inf. Theory, vol. 59, no. 9, pp. 6018–6048, Sep. 2013.
[5] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” J. Mach. Learn. Res., vol. 7, pp. 2031–2064, Dec. 2006.
[6] N. Ruozzi, “The Bethe partition function of log-supermodular graphical models,” in Proc. Neural Inf. Proc. Sys. Conf., Lake Tahoe, NV, USA, Dec. 3–6 2012.
[7] M. S. Leifer and D. Poulin, “Quantum graphical models and belief propagation,” Ann. Phys., vol. 323, pp. 1899–1946, 2008.
[8] M. X. Cao and P. O. Vontobel, “Quantum factor graphs: closing-the-box operation and variational approaches,” in Proc. Int. Symp. Inf. Theory and its Appl., Monterey, CA, USA, Oct. 30–Nov. 2 2016, pp. 651–655.
