Simply Exponential Approximation of the Permanent of Positive Semidefinite Matrices
Nima Anari, Leonid Gurvits, Shayan Oveis Gharan, Amin Saberi

TL;DR
This paper introduces a deterministic polynomial-time algorithm that approximates the permanent of positive semidefinite matrices within a factor of approximately 4.84, using a convex relaxation approach.
Contribution
It presents the first polynomial-time approximation algorithm with a provable approximation factor for the permanent of positive semidefinite matrices.
Findings
The algorithm achieves a $c^n$ approximation with $c \\approx 4.84$.
The convex relaxation used is natural and effective.
The approximation factor is shown to be asymptotically tight.
Abstract
We design a deterministic polynomial time approximation algorithm for the permanent of positive semidefinite matrices where . We write a natural convex relaxation and show that its optimum solution gives a approximation of the permanent. We further show that this factor is asymptotically tight by constructing a family of positive semidefinite matrices.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Simply Exponential Approximation of the Permanent of Positive Semidefinite Matrices
Nima Anari
Stanford University, {anari, saberi}@stanford.edu
Leonid Gurvits
The City College of New York, [email protected]
Shayan Oveis Gharan
University of Washington, [email protected]
Amin Saberi
Stanford University, {anari, saberi}@stanford.edu
Abstract
We design a deterministic polynomial time approximation algorithm for the permanent of positive semidefinite matrices where . We write a natural convex relaxation and show that its optimum solution gives a approximation of the permanent. We further show that this factor is asymptotically tight by constructing a family of positive semidefinite matrices.
1 Introduction
Given a matrix , its permanent is defined as
[TABLE]
where is the set of permutations on . There is a very rich body of work on permanent of matrices and its algebraic properties, see [Bap07] for a recent survey on several theorems and open problems in this area.
The problem has been also studied from the point of view of computational complexity. Valiant [Val79] showed that it is #P complete to compute the permanent of -matrices. Aaronson [Aar11] gave a new proof of the #P hardness, using the model of linear optical quantum computing. In addition, he showed that it is #P hard to compute the sign of , essentially ruling out a multiplicative approximation. Grier and Schaeffer [GS16] extended Aaronson’s proof and proved #P hardness of computing the permanent of real orthogonal matrices. They also showed by a simple polynomial interpolation argument that it is #P hard to compute the permanent of PSD matrices.
Given a general matrix , Gurvits [Gur05] designed a randomized algorithm that in time approximates within additive error, where is the largest singular value of . Chakhmakhchyan, Cerf, and Garcia-Patron [CCG16] improve on Gurvits’s algorithm if the matrix is PSD and its eigenvalues satisfy a certain smoothness property.
If all entries of are nonnegative then by definition. In particular, if , then is equal to the number of perfect matchings of the bipartite graph associated with . Jerrum, Sinclair, and Vigoda [JSV04] in a breakthrough obtained a fully polynomial time randomized approximation scheme (FPRAS) for the permanent of matrices with nonnegative entries. In other words, they designed a randomized algorithm that for any given , outputs a multiplicative approximation of the permanent, in time polynomial in and .
The focus of this paper is on the permanent of PSD matrices, which has received significant attention in the last decade because of its close connection to quantum optics. In particular, permanent of PSD matrices describe output probabilities of a boson sampling experiment in which the input is a tensor product of thermal states. They form the generating function of the quantum linear optical distribution [GS16].
It turns out that the permanent is a monotone function with respect to the Loewner order on the cone of PSD matrices and therefore the permanent of every PSD matrix is nonnegative (see corollaries 1 and 2). This fact is a priori not obvious considering that a PSD matrix can have negative entries. Since the permanent is nonnegative, unlike general matrices, there is no difficulty in computing the sign. So, it may be possible to design a polynomial time approximation scheme for the permanent of PSD matrices. This question has been posted as an open problem in several sources [Aar, GS16]. Our main result can be seen as a first step along this line.
To this date, not much is known about multiplicative approximation of the permanent of PSD matrices. To the best of our knowledge, the only previous result is the work of Marcus [Mar63] which shows that the product of the diagonal entries of a PSD matrix gives an approximation of the permanent. For any PSD matrix ,
[TABLE]
This approximation can be slightly improved using Lieb’s permanent inequality [Lie02]. Using this inequality one can show that can be approximated to within a factor of in time .
In this paper we design a multiplicative approximation algorithm for computing the permanent of PSD matrices, where is a universal constant. Prior to our paper, no efficient algorithm (deterministic, randomized, or quantum) was known for simply exponential approximation of the permanent of general positive semidefinite matrices.
Theorem 1**.**
There is a deterministic polynomial time algorithm that for any given PSD matrix returns a number such that
[TABLE]
where and is Euler’s constant.
Our result uses a semidefinite relaxation. Because of the aformenetioned monotonicity of the permanent with respect to the positive semidefinite order, a natural way to upper bound the permanent of a hermitian PSD matrix is to find another matrix whose permanent is easy to compute, and to use as the upper bound. For example if is a diagonal matrix, then
[TABLE]
gives an easy-to-compute upper bound on . This motivates the following natural relaxation for the permanent of PSD matrices.
Definition 1**.**
For an hermitian PSD matrix define
[TABLE]
Our main result is to prove that also lower bounds up to a multiplicative factor. Additionally, we show that can be efficiently computed using convex programming, thus giving a polynomial-time approximation algorithm for .
2 Preliminaries
We denote the set by . We use to denote the set of permutations on .
2.1 Linear Algebra
We identify vectors with matrices. For a matrix we let denote its conjugate transpose; in other words . A matrix is called hermitian iff . A hermitian matrix is called positive semidefinite (PSD) iff for all . We let denote the usual Loewner order on hermitian matrices, i.e., iff is PSD. For a vector , we let denote the diagonal matrix with coordinates of as its main diagonal, i.e.,
[TABLE]
For matrices and we let denote the Kronecker product, i.e., the following block matrix:
[TABLE]
For a matrix and , we define as . The Kronecker product respects the Loewner order on hermitian PSD matrices:
Fact 1**.**
If and , then .
2.2 Standard Complex Normal Distribution
We say that a complex-valued random variable is distributed according to a standard complex normal, which we denote by , iff . The probability density function (over ) for this distribution is given by
[TABLE]
Fact 2**.**
If , then for integers we have
[TABLE]
Proof.
The distribution of is circularly symmetric, i.e. for with , we have . This means that
[TABLE]
Therefore, unless , we have . When , we have . If we let , then the probability density function of is given by . Therefore we have
[TABLE]
where we used integration by parts. We can finally derive
[TABLE]
∎
Fact 3**.**
If , then
[TABLE]
where is Euler’s constant.
Proof.
Note that . Since , the random variable is distributed according to a -distribution with degrees of freedom, which is identical to a distribution [Cha93]. Therefore we have
[TABLE]
where is the digamma function [Cha93]. This implies that , and the latter is equal to [AS64]. ∎
We say that a random vector is distributed according to a standard complex normal, which we denote by , iff are independent standard complex normals.
Fact 4**.**
If , and is a unit vector, i.e., , then .
Proof.
Note that are linear combinations of the real and imaginary parts of ; as such, this -dimensional vector is distributed according to for some and .
The distribution of is circularly symmetric; i.e., if is such that , then is distributed the same way as . This is true because , and has the same distribution as . Being circularly symmetric implies that and for some constant . On the other hand, we have
[TABLE]
Therefore or in other words, . ∎
2.3 Permanent and Loewner Order
For a matrix , its permanent is defined as
[TABLE]
Permanent is a monotone function on the space of PSD matrices w.r.t. the Loewner order. For completeness we sketch the proof given in [Bap07] here.
Lemma 1**.**
For any matrix , there is a vector such that
[TABLE]
Proof.
The vector is constructed in the following way: Index each of the coordinates by in the usual way (so that the indices respect the Kronecker product); we can think of as a function from to . Then let the -th coordinate of be iff is a permutation on , and let it be [math] otherwise. Then, for a matrix we have
[TABLE]
∎
Corollary 1**.**
If are hermitian and , then
[TABLE]
Proof.
The statement of the lemma follows, because implies that by 1. So, by lemma 1,
[TABLE]
as desired. ∎
Corollary 2**.**
For any hermitian PSD matrix , .
Proof.
This follows from corollary 1 by setting . ∎
There is another way to show nonnegativity of the permanent over the PSD cone with the help of the complex normal distribution. For a vector define
[TABLE]
Then with the help of we can express the permanent of a PSD matrix as an expectation of a nonnegative value.
Lemma 2**.**
Let be arbitrary and let be a random vector distributed according to the standard complex normal . Then
[TABLE]
Lemma 2 is a sepcial case of the relationship between the so-called -norm and the quantum permanent shown in [Gur03]. In particular if the rows of are , then
[TABLE]
and therefore is the same as the -norm of the polynomial . In [Gur03] this is shown to be equal to the quantum permanent of the linear operator with Choi form given by the matrices . It can be further shown that in this special case, the quantum permanent reduces to . For exact definitions and further details see [Gur03].
For the sake of completeness, we give a self-contained proof of lemma 2 below.
Proof of lemma 2.
We will use the fact that the expression is a polynomial in and ; therefore we can evaluate its expectation with the help of 2. We have
[TABLE]
If we define
[TABLE]
then . Note that is a polynomial in terms of . We can expand as follows:
[TABLE]
where the sum is taken over all functions . For a function , let be where is the number of such that . Then we can alternatively write
[TABLE]
For , by 2 we have . Therefore we can write
[TABLE]
where we used that by 2. Note that when , there is a permutation such that . In fact if , then the number of for which is exactly equal to . Therefore we can rewrite the above sum as
[TABLE]
∎
3 Approximation of Permanent on the PSD Cone
In this section we prove theorem 1. Recall the definition of from definition 1. Our first step is to prove that for every hermitian PSD matrix :
[TABLE]
where .
In order to prove eq. 2, we also introduce a lower bound on . We find a vector such that . By corollary 1, . So in order to prove eq. 2 it suffices to prove:
Theorem 2**.**
For a hermitian PSD matrix , there exists such that and
[TABLE]
where .
Note that the above shows that for every hermitian PSD matrix , there exists a diagonal matrix and a rank 1 matrix such that
[TABLE]
and for . Thus is sandwiched between and , two quantities that differ by at most a simply exponential factor.
It is also worth noting that there is no additional loss in approximating by the permanent of a rank one matrix. In section 4, we will show that the constant is not only asymptotically tight in theorem 2, but also in eq. 2.
Another interesting corollary of theorem 2 is that that instead of we can use as an approximation of , with the same approximation factor:
[TABLE]
Moreover, is easily computable.
Fact 5**.**
For a vector , we have .
Proof.
For any permutation we have
[TABLE]
Since is the sum of the above quantity for all , we get that . ∎
Even though has a closed form, we do not have an efficient way of computing the in eq. 3, whereas, as we show in section 3.2, can be computed efficiently.
The next section is dedicated to proving theorem 2. To finish up the proof of theorem 1 we need to design an algorithm to compute for a given PSD matrix .
Theorem 3**.**
There is an algorithm that outputs an -approximation of for any hermitian PSD in time , where represents the bit complexity of .
We will prove the above theorem in section 3.2. Theorems 2 and 3 together complete the proof of theorem 1. In section 4 we show that the constant in eq. 2 is asymptotically tight.
3.1 Proof of the Main Result
In order to prove theorem 2, we use a seemingly unrelated quantity about distributions on unit vectors . Let us define this quantity below.
Definition 2**.**
For a discrete distribution supported on the sphere , define
[TABLE]
where is the span of the support of , i.e., the set of vectors for which the denominator is nonzero.
We will prove theorem 2 by showing that there exists such that and
[TABLE]
where is an appropriately constructed distribution on unit vectors. The expression is lower bounded by . Thus if we show that , the above inequality would imply the multiplicative factor of desired in theorem 2.
To gain some intuition about , note that by Jensen’s inequality, applied to the concave function , it is easy to see that :
[TABLE]
On the other hand, we will show that for all , .
Proposition 1**.**
For all discrete distributions supported on the sphere ,
[TABLE]
This universal lower bound is independent of the dimension or the size of the support of . We defer the proof of proposition 1 to the end of this section.
Let us now prove theorem 2, assuming correctness of proposition 1.
Proof of theorem 2.
Let us break down the proof into a series of claims, and then prove them one by one.
Claim 1**.**
The infimum in eq. 1 is achieved by some diagonal matrix . In other words there exists a diagonal matrix such that .
Claim 2**.**
We may assume without loss of generality that .
Claim 3**.**
The first-order optimality condition of implies that there exists a correlation matrix , i.e., a hermitian PSD matrix with s on its main diagonal, such that .
We may use the Cholesky decomposition to write where for .
Claim 4**.**
For any the vector satisfies
[TABLE]
Naturally we may want to choose so as to maximize .
Claim 5**.**
We have
[TABLE]
where is the uniform distribution on the columns of .
And now the statement of theorem 2 follows, because when ; we have found such that and
[TABLE]
Let us now prove the claims one by one.
Proof of 1.
We divide the proof into two cases. First assume that for all . Let be larger than the maximum eigenvalue of . Then . This proves that . Note that implies for all . If any entry of satisfies
[TABLE]
then
[TABLE]
This effectively eliminates such a as a candidate for the in eq. 1. Therefore we may take of over the set of all diagonal matrices which in addition to satisfy
[TABLE]
for all . This is a compact set, and is a continuous function. Therefore the is achieved by some matrix .
For the second case, assume that for some . Then since is PSD, the -th row and the -th column of are both zero. Let be larger than the largest eigenvalue of . Define by and for . It is easy to see that and . Therefore and it is achieved at . ∎
Proof of 2.
First note that without loss of generality we may assume , since otherwise and the conclusion of theorem 2 is trivial.
Now let be an arbitrary positive vector and define by
[TABLE]
Note that respects the Loewner order and maps diagonal matrices to diagonal matrices. It is one-to-one and surjective on the space of diagonal matrices. The matrix is obtained from by multiplying column by for and then row by for . Therefore
[TABLE]
This implies that
[TABLE]
It is also easy to see that the above also implies . In particular if is set so that , then . So we can replace by and continue the proof of theorem 2 to find satisfying
[TABLE]
and with . Let . Then . This implies that
[TABLE]
and
[TABLE]
∎
Proof of 3.
We use the first-order optimality condition of at . Let us change to where is a diagonal matrix. Then if is small enough . More precisely, we have
[TABLE]
If then for all . If , then for small enough , which contradicts the fact that . This implies that the optimal solution of the following SDP is [math]:
[TABLE]
The dual of this SDP has variables , corresponding to the constraint , and for , corresponding to the constraint :
[TABLE]
Because of strong duality, the optimum of this SDP is [math]. The optimal satisfies and for , i.e., is a correlation matrix. We also have . But since and , this implies that or in other words . ∎
Proof of 4.
We have with and . This implies that is invertible. Now we have
[TABLE]
This together with implies that
[TABLE]
In other words, is an eigenvector of with eigenvalue . This means that is also such an eigenvector. So and . We conclude that . ∎
Proof of 5.
Let us compute . By 5 we have
[TABLE]
Let the columns of be . Then , and note that . We can rewrite as
[TABLE]
Now if we let be the uniform distribution on , we can rewrite the above as
[TABLE]
Therefore
[TABLE]
∎
This concludes the proof of theorem 2. ∎
It only remains to prove proposition 1.
Proof of proposition 1.
Without loss of generality we may assume that ; if that is not the case, we can identify with for some using a unitary transformation and nothing changes.
Let be a -dimensional standard complex normal. Let
[TABLE]
Then our goal is to prove that or equivalently . To this end, we will prove that , and the conclusion follows.
By 4, for each fixed in the support of , . Therefore we have
[TABLE]
On the other hand by 3 we have
[TABLE]
where the inequality is an application of Jensen’s to the convex function . Putting these together we get that as desired. ∎
3.2 Computing the Approximation
In this section we show how to approximately compute . The main result of this section will be theorem 3.
The main ingredient of the proof is transforming to the objective of a convex program. The original optimization problem that computes is the following:
[TABLE]
The objective is not concave, even if we apply to it. The trick is to change from the variables to . If we have the Cholesky decomposition for some , then if and only if
[TABLE]
So we can turn the optimization problem into the following by identifying with .
[TABLE]
If the objective of the above program is , then . Note that is convex over , so the above is a valid convex program.
Proof of theorem 3.
We can detect whether by checking whether any of ’s main diagonal entries are [math]. See the proof of 1.
When all of the main diagonal entries of are strictly positive, similar to the proof of 1, we can determine upper and lower bounds on the optimum . In particular if is a number larger than the largest eigenvalue of , for the optimum we have
[TABLE]
Thus, we can restrict the domain of the convex program in eq. 4 to a compact bounded domain. We can compute the Cholesky decomposition of and then use our favorite convex programming technique, such as the ellipsoid method, to find the optimum value of eq. 4 to within accuracy in time . This gives us a approximation of which by eq. 2 is a approximation of for .
As a final remark, we note that the approximation factor in eq. 2 can in fact be slightly strengthened to
[TABLE]
if one carefully reviews the proof. The term is at most , but the difference allows us to absorb into the approximation factor for an appropriately chosen . This allows us to state an -free result: We can find an approximation to in time . ∎
4 Asymptotically Tight Examples
In this section we show that the constant cannot be replaced by anything smaller in eq. 2. In other words we will construct hermitian PSD matrices such that
[TABLE]
The construction will begin with a distribution that is uniform over unit vectors . We will later show how we can construct so that is arbitrarily close to .
Lemma 3**.**
For any there exists a distribution that is uniform over unit vectors for some and that satisfies
[TABLE]
We postpone the proof of lemma 3 to the end of this section. For now we use it to show the following. The following proposition together with lemma 3 show that cannot be improved in eq. 2.
Proposition 2**.**
Given a distribution that is uniform over a finite number of unit vectors , we can construct a sequence of matrices of sizes such that
[TABLE]
Proof.
Our goal is to construct a PSD matrix and relate to . We will assume without loss of generality that ; otherwise, we use a unitary transformation to map onto a lower dimensional space and would not change.
Consider the matrix whose columns are . Note that and has s on the main diaognal. In other words is a correlation matrix of rank . Since , the matrix is invertible and we can define
[TABLE]
and
[TABLE]
We will study and and relate them to .
As observed in the proof of 3, correlation matrices can be used as optimality certificates for , albeit in that context first order optimality was just a necessary condition. We now make a formal claim by certifying that using as the certificate.
Claim 6**.**
If is constructed as above, then
[TABLE]
Proof.
We clearly have . This implies that . Now consider a diagonal matrix . We need to show that . Without loss of generality, by adding a small multiple of if necessary, we may assume that . Now implies that
[TABLE]
which in turn implies
[TABLE]
By taking the trace we get
[TABLE]
Since has s on the diagonal and is diagonal the above becomes
[TABLE]
By using the AM-GM inequality we get
[TABLE]
This means that . ∎
Next we study . This is where the term appears.
Claim 7**.**
If is constructed as above, then
[TABLE]
Before proving 7, let us show why it suffices to finish the proof of proposition 2. By 6 and 7 we have
[TABLE]
This is not quite the same as yet. However we have one degree of freedom we have not used. Initially we assumed was a uniform distribution over unit vectors. But we might have as well assumed that it is a uniform distribution over unit vectors for any integer , by simply repeating the vectors in the support of . Therefore we may make as large as we would like without changing or . As , by Stirling’s formula we have
[TABLE]
and by a simple bound for large enough
[TABLE]
Therefore as we have
[TABLE]
It only remains to prove 7.
Proof of 7.
We will use lemma 2 to write down . Let be distributed according to a -dimensional standard complex normal . Then according to lemma 2 we have
[TABLE]
Our goal is to use to bound . According to the definition of , for the vector we have
[TABLE]
Note that
[TABLE]
This means that . We also have
[TABLE]
Putting these together we get
[TABLE]
Let us now compute . We have
[TABLE]
According to 2, we have . Therefore
[TABLE]
where in the last equality we used the fact the number of ways to write as a sum of nonnegative integers is . We conclude by getting
[TABLE]
∎
This finishes the proof of proposition 2. ∎
Now we switch gears and construct the distribution promised by lemma 3.
Proof of lemma 3.
The idea is to make be close to the uniform distribution on the sphere for some large . If we were allowed to pick to be uniform over the sphere, then intuitively all choices of in the definition of would yield the same value and we would be able to argue about this common value using the same tricks as in the proof of proposition 1. Instead we use the uniform distribution on a large number of samples from the sphere to serve as the proxy for the uniform distribution on the sphere itself. We further need the dimension to grow, to make the uniform distribution on the sphere similar to a (scaled) normal distribution. We now make these formal.
Let us fix some and let denote the uniform distribution on the sphere . For any fixed distance we can cover the sphere by a finite number of balls where are unit vectors and
[TABLE]
Let be a large number and draw random points from . We will let be the uniform distribution over . We would like to argue that is with high probability close to . Because the sphere was covered by the balls around ’s, for each unit vector we have for some . This implies that
[TABLE]
On the other hand by the law of large numbers for each we have with high probability as
[TABLE]
Let us condition on the event that the LHS of the above are sufficiently close to the RHS for all . This event happens with high probability as . Note that because of symmetry, the RHS of the above are independent of the choice of . Under this condition we have for all unit vectors
[TABLE]
where is any arbitrary vector and as . The above bounds the LHS for unit vectors . However note that the LHS does not change if we scale by any constant. Therefore is bounded by the RHS. As we take the limit with and we get with asymptotically bounded by .
Now it only remains to show that as the dimension grows . Let be an arbitrary point with such as where is the first element of the standard basis. When is a random point on the sphere, we would like to argue that is almost distributed like . If this were the case we would have
[TABLE]
where in the last equality we used 3.
To make this approximation rigorous, let us generate the random point on the sphere by the following process: We sample a standard -dimensional complex normal and then we let . We have . Therefore
[TABLE]
The random variable is distributed according to a -scaled -distribution with degrees of freedom which is the same as . We can therefore write
[TABLE]
where is the digamma function [Cha93, AS64]. We therefore have .
For we observe that
[TABLE]
The random variables are identically distributed for different . As such we have
[TABLE]
Therefore
[TABLE]
This shows that as and concludes the proof. ∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[Aar] S. Aaronson See point (4) of Comment #84 at http://www.scottaaronson.com/blog/?p=2408
- 2[Aar 11] Scott Aaronson “A linear-optical proof that the permanent is# P-hard” In Proc. R. Soc. A 467.2136 , 2011, pp. 3393–3405 The Royal Society
- 3[AS 64] Milton Abramowitz and Irene A Stegun “Handbook of mathematical functions: with formulas, graphs, and mathematical tables” Courier Corporation, 1964
- 4[Bap 07] RB Bapat “Recent Developments and Open Problems in the Theory of Permanents” In The Mathematics student 76.1 , 2007, pp. 55
- 5[CCG 16] L Chakhmakhchyan, NJ Cerf and R Garcia-Patron “A quantum-inspired algorithm for estimating the permanent of positive semidefinite matrices” In ar Xiv preprint ar Xiv:1609.02416 , 2016
- 6[Cha 93] Shing Ping Chan “A statistical study of log-gamma distribution”, 1993
- 7[GS 16] Daniel Grier and Luke Schaeffer “New Hardness Results for the Permanent Using Linear Optics” Electronic Colloquium on Computational Complexity (ECCC), 2016 URL: http://eccc.hpi-web.de/report/2016/159
- 8[Gur 03] Leonid Gurvits “Classical deterministic complexity of Edmonds’ problem and quantum entanglement” In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing , 2003, pp. 10–19 ACM
