Simply Exponential Approximation of the Permanent of Positive Semidefinite Matrices

Nima Anari; Leonid Gurvits; Shayan Oveis Gharan; Amin Saberi

arXiv:1704.03486·math.CO·April 13, 2017


TL;DR

This paper introduces a deterministic polynomial-time algorithm that approximates the permanent of an $n\times n$ positive semidefinite matrix within a factor of $c^{n}$, where $c = e^{\gamma+1} \approx 4.84$, using a natural convex relaxation.

Contribution

It presents the first polynomial-time algorithm with a provable simply exponential approximation factor for the permanent of positive semidefinite matrices.

Findings

01

The algorithm achieves a $c^n$ approximation with $c \approx 4.84$.

02

The convex relaxation used is natural and effective.

03

The approximation factor is shown to be asymptotically tight.

Abstract

We design a deterministic polynomial time $c^{n}$ approximation algorithm for the permanent of positive semidefinite matrices where $c=e^{\gamma+1}\simeq 4.84$. We write a natural convex relaxation and show that its optimum solution gives a $c^{n}$ approximation of the permanent. We further show that this factor is asymptotically tight by constructing a family of positive semidefinite matrices.

Equations

$$\operatorname{per}(A)=\sum_{\sigma\in S_{n}}\prod_{i=1}^{n}A_{i,\sigma(i)},$$

$$\prod_{i=1}^{n}A_{i,i}\leq\operatorname{per}(A)\leq n!\prod_{i=1}^{n}A_{i,i}.$$

$$\operatorname{rel}(A)\geq\operatorname{per}(A)\geq c^{-n}\operatorname{rel}(A)$$

$$\operatorname{per}(D)=D_{11}D_{22}\dots D_{nn}$$

$$\operatorname{rel}(A):=\inf\{\operatorname{per}(D):D\text{ is diagonal and }D\succeq A\}.$$

$$\operatorname{diag}(v):=\begin{bmatrix}v_{1}&0&\dots&0\\0&v_{2}&\dots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\dots&v_{n}\end{bmatrix}.$$

$$A\otimes B:=\begin{bmatrix}A_{11}B&\dots&A_{1m}B\\\vdots&\ddots&\vdots\\A_{n1}B&\dots&A_{nm}B\end{bmatrix}.$$

$$\frac{1}{\pi}e^{-(\operatorname{Re}(g)^{2}+\operatorname{Im}(g)^{2})}=\frac{1}{\pi}e^{-\lvert g\rvert^{2}}.$$

$$\mathbb{E}[g^{n}\overline{g}^{m}]=\begin{cases}0&\text{if }n\neq m,\\n!&\text{if }n=m.\end{cases}$$

$$\mathbb{E}[g^{n}\overline{g}^{m}]=\mathbb{E}[(ug)^{n}(\overline{ug})^{m}]=u^{n-m}\,\mathbb{E}[g^{n}\overline{g}^{m}].$$

$$\mathbb{E}[\lvert g\rvert^{2n}]=\int_{0}^{\infty}r^{2n}\cdot 2re^{-r^{2}}\,dr=n\int_{0}^{\infty}r^{2n-2}\cdot 2re^{-r^{2}}\,dr=n\cdot\mathbb{E}[\lvert g\rvert^{2n-2}],$$

$$\mathbb{E}[\lvert g\rvert^{2n}]=n\cdot\mathbb{E}[\lvert g\rvert^{2n-2}]=n(n-1)\cdot\mathbb{E}[\lvert g\rvert^{2n-4}]=\dots=n!\cdot\mathbb{E}[\lvert g\rvert^{0}]=n!.$$

$$\mathbb{E}[\ln(\lvert g\rvert^{2})]=-\gamma,$$

$$\mathbb{E}[\ln(2\lvert g\rvert^{2})]=\psi(1)+\ln(2),$$

$$2c=\mathbb{E}[\lvert u^{\dagger}v\rvert^{2}]=\mathbb{E}[u^{\dagger}vv^{\dagger}u]=u^{\dagger}\,\mathbb{E}[vv^{\dagger}]\,u=u^{\dagger}Iu=\lvert u\rvert^{2}=1.$$

$$\operatorname{per}(A):=\sum_{\sigma\in S_{n}}\prod_{i=1}^{n}A_{i,\sigma(i)}.$$

$$\operatorname{per}(M)=\frac{1}{n!}\,1_{S_{n}}^{\dagger}M^{\otimes n}1_{S_{n}}.$$

$$1_{S_{n}}^{\dagger}M^{\otimes n}1_{S_{n}}=\sum_{\sigma\in S_{n}}\sum_{\sigma^{\prime}\in S_{n}}\prod_{i=1}^{n}M_{\sigma(i),\sigma^{\prime}(i)}=\sum_{\sigma\in S_{n}}\operatorname{per}(M)=n!\cdot\operatorname{per}(M).$$

$$\operatorname{per}(A)\geq\operatorname{per}(B).$$

$$\operatorname{per}(A)=\frac{1}{n!}\,1_{S_{n}}^{\dagger}A^{\otimes n}1_{S_{n}}\geq\frac{1}{n!}\,1_{S_{n}}^{\dagger}B^{\otimes n}1_{S_{n}}=\operatorname{per}(B)$$

$$\lvert v\rvert_{\Pi}^{2}:=\prod_{i=1}^{n}\lvert v_{i}\rvert^{2}\geq 0.$$

$$\operatorname{per}(U^{\dagger}U)=\mathbb{E}_{x\sim\mathbb{C}\mathcal{N}(0,I)}\left[\lvert U^{\dagger}x\rvert_{\Pi}^{2}\right].$$

$$\lvert U^{\dagger}x\rvert_{\Pi}^{2}=\left\lvert\det\left(\sum_{i=1}^{d}x_{i}\operatorname{diag}(u_{i})\right)\right\rvert^{2},$$

$$\lvert U^{\dagger}x\rvert_{\Pi}^{2}=\left\lvert\prod_{i=1}^{n}\sum_{j=1}^{d}\overline{U_{ji}}\,x_{j}\right\rvert^{2}$$

$$p(x):=\prod_{i=1}^{n}\sum_{j=1}^{d}\overline{U_{ji}}\,x_{j},$$

$$p(x)=\sum_{\sigma:[n]\to[d]}\prod_{i=1}^{n}\overline{U_{\sigma(i),i}}\,x_{\sigma(i)},$$

$$p(x)=\sum_{\substack{k_{1}+\dots+k_{d}=n\\k_{1},\dots,k_{d}\geq 0}}x_{1}^{k_{1}}\dots x_{d}^{k_{d}}\sum_{\substack{\sigma:[n]\to[d]\\\operatorname{sig}(\sigma)=(k_{1},\dots,k_{d})}}\prod_{i=1}^{n}\overline{U_{\sigma(i),i}}.$$

$$\mathbb{E}_{x}[p(x)\overline{p(x)}]=\sum_{\substack{k_{1}+\dots+k_{d}=n\\k_{1},\dots,k_{d}\geq 0}}\left(k_{1}!\dots k_{d}!\sum_{\substack{\sigma:[n]\to[d]\\\operatorname{sig}(\sigma)=(k_{1},\dots,k_{d})}}\;\sum_{\substack{\sigma^{\prime}:[n]\to[d]\\\operatorname{sig}(\sigma^{\prime})=(k_{1},\dots,k_{d})}}\prod_{i=1}^{n}\overline{U_{\sigma(i),i}}\,U_{\sigma^{\prime}(i),i}\right),$$

$$\mathbb{E}_{x}[p(x)\overline{p(x)}]=\sum_{\pi\in S_{n}}\prod_{i=1}^{n}\sum_{j=1}^{d}(U^{\dagger})_{i,j}U_{j,\pi^{-1}(i)}=\sum_{\pi\in S_{n}}\prod_{i=1}^{n}(U^{\dagger}U)_{i,\pi^{-1}(i)}=\operatorname{per}(U^{\dagger}U).$$


Full text

Simply Exponential Approximation of the Permanent of Positive Semidefinite Matrices

Nima Anari

Stanford University, {anari, saberi}@stanford.edu

Leonid Gurvits

The City College of New York

Shayan Oveis Gharan

University of Washington

Amin Saberi

Stanford University, {anari, saberi}@stanford.edu

Abstract

We design a deterministic polynomial time $c^{n}$ approximation algorithm for the permanent of positive semidefinite matrices where $c=e^{\gamma+1}\simeq 4.84$ . We write a natural convex relaxation and show that its optimum solution gives a $c^{n}$ approximation of the permanent. We further show that this factor is asymptotically tight by constructing a family of positive semidefinite matrices.

1 Introduction

Given a matrix $A\in\mathbb{C}^{n\times n}$ , its permanent is defined as

$$\operatorname{per}(A)=\sum_{\sigma\in S_{n}}\prod_{i=1}^{n}A_{i,\sigma(i)},$$

where $S_{n}$ is the set of permutations on $\{1,\dots,n\}$. There is a very rich body of work on the permanent of matrices and its algebraic properties; see [Bap07] for a recent survey of several theorems and open problems in this area.
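As a concrete reading of this definition, the following minimal sketch evaluates the sum over all permutations directly. It is a brute-force illustration with $O(n!\cdot n)$ running time, not the algorithm of this paper, and the function name `permanent` is chosen here for illustration.

```python
from itertools import permutations
from math import prod


def permanent(A):
    """Permanent of a square matrix given as a list of rows (brute force)."""
    n = len(A)
    # One term per permutation s: the product of A[i][s[i]] over all rows i.
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(permanent(ones))                 # 3! = 6 perfect matchings of K_{3,3}
print(permanent([[1, -1], [-1, 1]]))   # 1*1 + (-1)*(-1) = 2
```

The second matrix is PSD with negative entries, and its permanent is still nonnegative.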

The problem has also been studied from the point of view of computational complexity. Valiant [Val79] showed that it is #P complete to compute the permanent of $\{0,1\}$-matrices. Aaronson [Aar11] gave a new proof of the #P hardness, using the model of linear optical quantum computing. In addition, he showed that it is #P hard to compute the sign of $\operatorname{per}(A)$, essentially ruling out a multiplicative approximation. Grier and Schaeffer [GS16] extended Aaronson’s proof and proved #P hardness of computing the permanent of real orthogonal matrices. They also showed by a simple polynomial interpolation argument that it is #P hard to compute the permanent of PSD matrices.

Given a general matrix $A\in\mathbb{R}^{n\times n}$, Gurvits [Gur05] designed a randomized algorithm that in time $O(n^{2}/\epsilon^{2})$ approximates $\operatorname{per}(A)$ within $\pm\epsilon\lvert A\rvert^{n}$ additive error, where $\lvert A\rvert$ is the largest singular value of $A$. Chakhmakhchyan, Cerf, and Garcia-Patron [CCG16] improve on Gurvits’s algorithm if the matrix $A$ is PSD and its eigenvalues satisfy a certain smoothness property.

If all entries of $A$ are nonnegative then $\operatorname{per}(A)\geq 0$ by definition. In particular, if $A\in\{0,1\}^{n\times n}$ , then $\operatorname{per}(A)$ is equal to the number of perfect matchings of the bipartite graph associated with $A$ . Jerrum, Sinclair, and Vigoda [JSV04] in a breakthrough obtained a fully polynomial time randomized approximation scheme (FPRAS) for the permanent of matrices with nonnegative entries. In other words, they designed a randomized algorithm that for any given $\epsilon>0$ , outputs a $1+\epsilon$ multiplicative approximation of the permanent, in time polynomial in $n$ and $1/\epsilon$ .

The focus of this paper is on the permanent of PSD matrices, which has received significant attention in the last decade because of its close connection to quantum optics. In particular, permanents of PSD matrices describe output probabilities of a boson sampling experiment in which the input is a tensor product of thermal states. They form the generating function of the quantum linear optical distribution [GS16].

It turns out that the permanent is a monotone function with respect to the Loewner order on the cone of PSD matrices and therefore the permanent of every PSD matrix is nonnegative (see corollaries 1 and 2). This fact is a priori not obvious considering that a PSD matrix can have negative entries. Since the permanent is nonnegative, unlike for general matrices, there is no difficulty in computing the sign. So, it may be possible to design a polynomial time approximation scheme for the permanent of PSD matrices. This question has been posed as an open problem in several sources [Aar, GS16]. Our main result can be seen as a first step along this line.

To this date, not much is known about multiplicative approximation of the permanent of PSD matrices. To the best of our knowledge, the only previous result is the work of Marcus [Mar63] which shows that the product of the diagonal entries of a PSD matrix gives an $n!$ approximation of the permanent. For any PSD matrix $A\in\mathbb{R}^{n\times n}$ ,

$$\prod_{i=1}^{n}A_{i,i}\leq\operatorname{per}(A)\leq n!\prod_{i=1}^{n}A_{i,i}.$$

This approximation can be slightly improved using Lieb’s permanent inequality [Lie02]. Using this inequality one can show that $\operatorname{per}(A)$ can be approximated to within a factor of $n!/m!^{n/m}$ in time $2^{O(m+\log n)}$ .
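To make the diagonal-product bound of Marcus concrete, here is a small numerical check on one PSD matrix with negative off-diagonal entries. The matrix `V` and the helper names are illustrative assumptions, and the brute-force `permanent` is only suitable for tiny $n$.

```python
from itertools import permutations
from math import factorial, prod


def permanent(A):
    """Brute-force permanent of a square matrix given as a list of rows."""
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


# Hypothetical example: A = V^T V is PSD by construction.
V = [[1, -2, 0], [0, 1, 3], [1, 1, 1]]
n = len(V)
A = [[sum(V[k][i] * V[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
# A == [[2, -1, 1], [-1, 6, 4], [1, 4, 10]]

diag_product = prod(A[i][i] for i in range(n))   # 2 * 6 * 10 = 120
per_A = permanent(A)                             # 160

# Marcus: prod_i A_ii <= per(A) <= n! * prod_i A_ii
assert diag_product <= per_A <= factorial(n) * diag_product
print(diag_product, per_A, factorial(n) * diag_product)
```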

In this paper we design a $c^{n}$ multiplicative approximation algorithm for computing the permanent of PSD matrices, where $c>0$ is a universal constant. Prior to our paper, no efficient algorithm (deterministic, randomized, or quantum) was known for simply exponential approximation of the permanent of general positive semidefinite matrices.

Theorem 1.

There is a deterministic polynomial time algorithm that for any given PSD matrix $A$ returns a number $\operatorname{rel}(A)$ such that

$$\operatorname{rel}(A)\geq\operatorname{per}(A)\geq c^{-n}\operatorname{rel}(A),$$

where $c=e^{\gamma+1}$ and $\gamma$ is Euler’s constant.

Our result uses a semidefinite relaxation. Because of the aforementioned monotonicity of the permanent with respect to the positive semidefinite order, a natural way to upper bound the permanent of a hermitian PSD matrix $A\in\mathbb{C}^{n\times n}$ is to find another matrix $D\succeq A$ whose permanent is easy to compute, and to use $\operatorname{per}(D)$ as the upper bound. For example, if $D\succeq A$ is a diagonal matrix, then

$$\operatorname{per}(D)=D_{11}D_{22}\dots D_{nn}$$

gives an easy-to-compute upper bound on $\operatorname{per}(A)$ . This motivates the following natural relaxation for the permanent of PSD matrices.

Definition 1.

For an $n\times n$ hermitian PSD matrix $A$ define

$$\operatorname{rel}(A):=\inf\{\operatorname{per}(D):D\text{ is diagonal and }D\succeq A\}.\tag{1}$$

Our main result is to prove that $\operatorname{rel}(A)$ also lower bounds $\operatorname{per}(A)$ up to a multiplicative factor. Additionally, we show that $\operatorname{rel}(A)$ can be efficiently computed using convex programming, thus giving a polynomial-time approximation algorithm for $\operatorname{per}(A)$ .
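For intuition, the relaxation can be worked out by hand when $n=2$. The closed form below is derived here for illustration and is not taken from the paper. Write $A$ with diagonal entries $p,q$ and off-diagonal entry $b$. A diagonal $D=\operatorname{diag}(d_{1},d_{2})$ satisfies $D\succeq A$ iff $d_{1}\geq p$, $d_{2}\geq q$ and $(d_{1}-p)(d_{2}-q)\geq\lvert b\rvert^{2}$. Minimizing $d_{1}d_{2}$ under these constraints (by the AM-GM inequality) gives $\operatorname{rel}(A)=(\sqrt{pq}+\lvert b\rvert)^{2}$, while $\operatorname{per}(A)=pq+\lvert b\rvert^{2}$. The sketch checks the sandwich of Theorem 1 on the all-ones matrix; all function names are hypothetical.

```python
from math import exp, sqrt

EULER_GAMMA = 0.5772156649015329
C = exp(EULER_GAMMA + 1)  # the constant c of Theorem 1, about 4.84


def per_2x2(p, b, q):
    """Permanent of the hermitian matrix [[p, b], [conj(b), q]]."""
    return p * q + abs(b) ** 2


def rel_2x2(p, b, q):
    """Closed form for rel(A) when n = 2 (derived in the lead-in, not from the paper)."""
    return (sqrt(p * q) + abs(b)) ** 2


def optimal_diagonal(p, b, q):
    """Minimizing diagonal (d1, d2); assumes p, q > 0 and b != 0."""
    s = abs(b) * sqrt(p / q)
    return p + s, q + abs(b) ** 2 / s


p, b, q = 1.0, 1.0, 1.0               # the all-ones 2x2 matrix
d1, d2 = optimal_diagonal(p, b, q)    # (2.0, 2.0)
print(d1 * d2, rel_2x2(p, b, q), per_2x2(p, b, q))
```

In this example $\operatorname{rel}(A)=4$ and $\operatorname{per}(A)=2$, so the relaxation overshoots by a factor of $2$, well within $c^{2}\approx 23.4$.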

2 Preliminaries

We denote the set $\{1,\dots,n\}$ by $[n]$ . We use $S_{n}$ to denote the set of permutations on $[n]$ .

2.1 Linear Algebra

We identify vectors $v\in\mathbb{C}^{n}$ with $n\times 1$ matrices. For a matrix $A\in\mathbb{C}^{n\times m}$ we let $A^{\dagger}\in\mathbb{C}^{m\times n}$ denote its conjugate transpose; in other words $(A^{\dagger})_{ij}=\overline{A_{ji}}$ . A matrix $A\in\mathbb{C}^{n\times n}$ is called hermitian iff $A=A^{\dagger}$ . A hermitian matrix $A$ is called positive semidefinite (PSD) iff $v^{\dagger}Av\geq 0$ for all $v\in\mathbb{C}^{n}$ . We let $\succeq$ denote the usual Loewner order on hermitian matrices, i.e., $A\succeq B$ iff $A-B$ is PSD. For a vector $v\in\mathbb{C}^{n}$ , we let $\operatorname{diag}(v)\in\mathbb{C}^{n\times n}$ denote the diagonal matrix with coordinates of $v$ as its main diagonal, i.e.,

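$$\operatorname{diag}(v):=\begin{bmatrix}v_{1}&0&\dots&0\\0&v_{2}&\dots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\dots&v_{n}\end{bmatrix}.$$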

For matrices $A\in\mathbb{C}^{n\times m}$ and $B\in\mathbb{C}^{p\times q}$ we let $A\otimes B$ denote the Kronecker product, i.e., the following block matrix:

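$$A\otimes B:=\begin{bmatrix}A_{11}B&\dots&A_{1m}B\\\vdots&\ddots&\vdots\\A_{n1}B&\dots&A_{nm}B\end{bmatrix}.$$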

For a matrix $A$ and $n\geq 0$ , we define $A^{\otimes n}$ as $\overbrace{A\otimes A\otimes\dots\otimes A}^{n}$ . The Kronecker product respects the Loewner order on hermitian PSD matrices:

Fact 1.

If $A\succeq B\succeq 0$ and $C\succeq D\succeq 0$ , then $A\otimes C\succeq B\otimes D\succeq 0$ .

2.2 Standard Complex Normal Distribution

We say that a complex-valued random variable $g=\operatorname{Re}(g)+i\operatorname{Im}(g)$ is distributed according to a standard complex normal, which we denote by $g\sim\mathbb{C}\mathcal{N}(0,1)$ , iff $(\operatorname{Re}(g),\operatorname{Im}(g))\sim\mathcal{N}(0,\frac{1}{2}I)$ . The probability density function (over $\mathbb{C}\simeq\mathbb{R}^{2}$ ) for this distribution is given by

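$$\frac{1}{\pi}e^{-(\operatorname{Re}(g)^{2}+\operatorname{Im}(g)^{2})}=\frac{1}{\pi}e^{-\lvert g\rvert^{2}}.$$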

Fact 2.

If $g\sim\mathbb{C}\mathcal{N}(0,1)$ , then for integers $n,m\geq 0$ we have

$$\mathbb{E}[g^{n}\overline{g}^{m}]=\begin{cases}0&\text{if }n\neq m,\\n!&\text{if }n=m.\end{cases}$$

Proof.

The distribution of $g$ is circularly symmetric, i.e. for $u\in\mathbb{C}$ with $\lvert u\rvert=1$ , we have $ug\sim\mathbb{C}\mathcal{N}(0,1)$ . This means that

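$$\mathbb{E}[g^{n}\overline{g}^{m}]=\mathbb{E}[(ug)^{n}(\overline{ug})^{m}]=u^{n-m}\,\mathbb{E}[g^{n}\overline{g}^{m}].$$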

Therefore, unless $n-m=0$ , we have $\mathbb{E}[g^{n}\overline{g}^{m}]=0$ . When $m=n$ , we have $g^{n}\overline{g}^{m}=\lvert g\rvert^{2n}$ . If we let $r=\lvert g\rvert\in\mathbb{R}_{\geq 0}$ , then the probability density function of $r$ is given by $2\pi r\frac{1}{\pi}e^{-r^{2}}=2re^{-r^{2}}$ . Therefore we have

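$$\mathbb{E}[\lvert g\rvert^{2n}]=\int_{0}^{\infty}r^{2n}\cdot 2re^{-r^{2}}\,dr=n\int_{0}^{\infty}r^{2n-2}\cdot 2re^{-r^{2}}\,dr=n\cdot\mathbb{E}[\lvert g\rvert^{2n-2}],$$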

where we used integration by parts. We can finally derive

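$$\mathbb{E}[\lvert g\rvert^{2n}]=n\cdot\mathbb{E}[\lvert g\rvert^{2n-2}]=n(n-1)\cdot\mathbb{E}[\lvert g\rvert^{2n-4}]=\dots=n!\cdot\mathbb{E}[\lvert g\rvert^{0}]=n!.$$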

∎

Fact 3.

If $g\sim\mathbb{C}\mathcal{N}(0,1)$ , then

$$\mathbb{E}[\ln(\lvert g\rvert^{2})]=-\gamma,$$

where $\gamma$ is Euler’s constant.

Proof.

Note that $\lvert g\rvert^{2}=\operatorname{Re}(g)^{2}+\operatorname{Im}(g)^{2}=\frac{1}{2}(2\operatorname{Re}(g)^{2}+2\operatorname{Im}(g)^{2})$ . Since $(\operatorname{Re}(g),\operatorname{Im}(g))\sim\mathcal{N}(0,\frac{1}{2}I)$ , the random variable $2\operatorname{Re}(g)^{2}+2\operatorname{Im}(g)^{2}$ is distributed according to a $\chi^{2}$ -distribution with $2$ degrees of freedom, which is identical to a $\Gamma(1,2)$ distribution [Cha93]. Therefore we have

$$\mathbb{E}[\ln(2\lvert g\rvert^{2})]=\psi(1)+\ln(2),$$

where $\psi$ is the digamma function [Cha93]. This implies that $\mathbb{E}[\ln(\lvert g\rvert^{2})]=\psi(1)$ , and the latter is equal to $-\gamma$ [AS64]. ∎
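Fact 3 can also be checked numerically. Since $r=\lvert g\rvert$ has density $2re^{-r^{2}}$, the variable $t=\lvert g\rvert^{2}$ has density $e^{-t}$ on $[0,\infty)$, so $\mathbb{E}[\ln\lvert g\rvert^{2}]=\int_{0}^{\infty}\ln(t)\,e^{-t}\,dt$. The sketch below substitutes $t=e^{s}$ to remove the logarithmic singularity and applies the trapezoid rule. The integration window and step count are ad hoc choices made here, not values from the paper.

```python
from math import exp

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def expected_log_abs_sq(lo=-40.0, hi=5.0, steps=45_000):
    """Trapezoid rule for the integral of s * exp(s - exp(s)) over [lo, hi].

    This equals the integral of ln(t) * exp(-t) over (0, inf) up to negligible tails.
    """
    h = (hi - lo) / steps

    def f(s):
        return s * exp(s - exp(s))

    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, steps))
    return total * h


value = expected_log_abs_sq()
print(value)                   # close to -0.5772...
print(exp(EULER_GAMMA + 1))    # the constant c, about 4.84
```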

We say that a random vector $v\in\mathbb{C}^{n}$ is distributed according to a standard complex normal, which we denote by $v\sim\mathbb{C}\mathcal{N}(0,I)$ , iff $v_{1},\dots,v_{n}$ are independent standard complex normals.

Fact 4.

If $v\sim\mathbb{C}\mathcal{N}(0,I)$ , and $u\in\mathbb{C}^{n}$ is a unit vector, i.e., $\lvert u\rvert^{2}=u^{\dagger}u=1$ , then $u^{\dagger}v\sim\mathbb{C}\mathcal{N}(0,1)$ .

Proof.

Note that $(\operatorname{Re}(u^{\dagger}v),\operatorname{Im}(u^{\dagger}v))$ are linear combinations of the real and imaginary parts of $v$ ; as such, this $2$ -dimensional vector is distributed according to $\mathcal{N}(\mu,\Sigma)$ for some $\mu\in\mathbb{R}^{2}$ and $\Sigma\in\mathbb{R}^{2\times 2}$ .

The distribution of $u^{\dagger}v$ is circularly symmetric; i.e., if $\phi\in\mathbb{C}$ is such that $\lvert\phi\rvert=1$ , then $\phi u^{\dagger}v$ is distributed the same way as $u^{\dagger}v$ . This is true because $\phi u^{\dagger}v=u^{\dagger}(\phi v)$ , and $\phi v$ has the same distribution as $v$ . Being circularly symmetric implies that $\mu=0$ and $\Sigma=cI$ for some constant $c$ . On the other hand, we have

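$$2c=\mathbb{E}[\lvert u^{\dagger}v\rvert^{2}]=\mathbb{E}[u^{\dagger}vv^{\dagger}u]=u^{\dagger}\,\mathbb{E}[vv^{\dagger}]\,u=u^{\dagger}Iu=\lvert u\rvert^{2}=1.$$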

Therefore $(\operatorname{Re}(u^{\dagger}v),\operatorname{Im}(u^{\dagger}v))\sim\mathcal{N}(0,\frac{1}{2}I)$ or in other words, $u^{\dagger}v\sim\mathbb{C}\mathcal{N}(0,1)$ . ∎

2.3 Permanent and Loewner Order

For a matrix $A\in\mathbb{C}^{n\times n}$ , its permanent is defined as

$$\operatorname{per}(A):=\sum_{\sigma\in S_{n}}\prod_{i=1}^{n}A_{i,\sigma(i)}.$$

Permanent is a monotone function on the space of PSD matrices w.r.t. the Loewner order. For completeness we sketch the proof given in [Bap07] here.

Lemma 1.

For any matrix $M\in\mathbb{C}^{n\times n}$ , there is a vector $1_{S_{n}}\in\mathbb{C}^{n^{n}}$ such that

$$\operatorname{per}(M)=\frac{1}{n!}\,1_{S_{n}}^{\dagger}M^{\otimes n}1_{S_{n}}.$$

Proof.

The vector $1_{S_{n}}\in\mathbb{C}^{n^{n}}$ is constructed in the following way: Index each of the $n^{n}$ coordinates by $\sigma\in[n]^{n}$ in the usual way (so that the indices respect the Kronecker product); we can think of $\sigma$ as a function from $[n]$ to $[n]$. Then let the $\sigma$-th coordinate of $1_{S_{n}}$ be $1$ iff $\sigma$ is a permutation on $[n]$, and let it be $0$ otherwise. Then, for a matrix $M$ we have

$$1_{S_{n}}^{\dagger}M^{\otimes n}1_{S_{n}}=\sum_{\sigma\in S_{n}}\sum_{\sigma^{\prime}\in S_{n}}\prod_{i=1}^{n}M_{\sigma(i),\sigma^{\prime}(i)}=\sum_{\sigma\in S_{n}}\operatorname{per}(M)=n!\cdot\operatorname{per}(M).$$

∎
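The identity of Lemma 1 can be verified directly for small $n$. The sketch below never materializes $M^{\otimes n}$ as an array. It instead uses the fact that the entry of $M^{\otimes n}$ indexed by multi-indices $(\sigma,\sigma^{\prime})\in[n]^{n}\times[n]^{n}$ is $\prod_{i}M_{\sigma(i),\sigma^{\prime}(i)}$. The test matrix and helper names are illustrative assumptions.

```python
from itertools import permutations, product
from math import factorial, prod


def permanent(M):
    """Brute-force permanent of a square matrix given as a list of rows."""
    n = len(M)
    return sum(prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def kron_power_entry(M, row, col):
    """Entry of the n-fold Kronecker power of M at multi-indices row, col in [n]^n."""
    return prod(M[r][c] for r, c in zip(row, col))


def quadratic_form(M):
    """Compute 1^dagger M^{(x)n} 1, where 1 is the indicator vector of permutations."""
    n = len(M)
    index = list(product(range(n), repeat=n))                  # all n^n multi-indices
    one = {s: 1 if len(set(s)) == n else 0 for s in index}     # 1 iff s is a permutation
    return sum(one[r] * kron_power_entry(M, r, c) * one[c] for r in index for c in index)


M = [[1, 2, 0], [3, -1, 4], [2, 5, 1]]   # hypothetical test matrix
assert quadratic_form(M) == factorial(len(M)) * permanent(M)
print(quadratic_form(M), permanent(M))
```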

Corollary 1.

If $A,B\in\mathbb{C}^{n\times n}$ are hermitian and $A\succeq B\succeq 0$ , then

$$\operatorname{per}(A)\geq\operatorname{per}(B).$$

Proof.

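The statement follows because $A\succeq B\succeq 0$ implies that $A^{\otimes n}\succeq B^{\otimes n}\succeq 0$ by Fact 1. So, by Lemma 1,

$$\operatorname{per}(A)=\frac{1}{n!}\,1_{S_{n}}^{\dagger}A^{\otimes n}1_{S_{n}}\geq\frac{1}{n!}\,1_{S_{n}}^{\dagger}B^{\otimes n}1_{S_{n}}=\operatorname{per}(B)$$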

as desired. ∎

Corollary 2.

For any hermitian PSD matrix $A\in\mathbb{C}^{n\times n}$ , $\operatorname{per}(A)\geq 0$ .

Proof.

This follows from corollary 1 by setting $B=0$ . ∎

There is another way to show nonnegativity of the permanent over the PSD cone with the help of the complex normal distribution. For a vector $v\in\mathbb{C}^{n}$ define

$$\lvert v\rvert_{\Pi}^{2}:=\prod_{i=1}^{n}\lvert v_{i}\rvert^{2}\geq 0.$$

Then with the help of $\lvert\cdot\rvert_{\Pi}$ we can express the permanent of a PSD matrix as an expectation of a nonnegative value.

Lemma 2.

Let $U\in\mathbb{C}^{d\times n}$ be arbitrary and let $x\in\mathbb{C}^{d}$ be a random vector distributed according to the standard complex normal $\mathbb{C}\mathcal{N}(0,I)$ . Then

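$$\operatorname{per}(U^{\dagger}U)=\mathbb{E}_{x\sim\mathbb{C}\mathcal{N}(0,I)}\left[\lvert U^{\dagger}x\rvert_{\Pi}^{2}\right].$$

Lemma 2 is a special case of the relationship between the so-called $G$-norm and the quantum permanent shown in [Gur03]. In particular if the rows of $U$ are $u_{1}^{\dagger},\dots,u_{d}^{\dagger}$, then

$$\lvert U^{\dagger}x\rvert_{\Pi}^{2}=\left\lvert\det\left(\sum_{i=1}^{d}x_{i}\operatorname{diag}(u_{i})\right)\right\rvert^{2},$$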

and therefore $\mathbb{E}_{x}[\lvert U^{\dagger}x\rvert_{\Pi}^{2}]$ is the same as the $G$ -norm of the polynomial $\det(\sum_{i=1}^{d}x_{i}\operatorname{diag}(u_{i}))$ . In [Gur03] this is shown to be equal to the quantum permanent of the linear operator with Choi form given by the matrices $\operatorname{diag}(u_{1}),\dots,\operatorname{diag}(u_{d})$ . It can be further shown that in this special case, the quantum permanent reduces to $\operatorname{per}(U^{\dagger}U)$ . For exact definitions and further details see [Gur03].

For the sake of completeness, we give a self-contained proof of lemma 2 below.

Proof of lemma 2.

We will use the fact that the expression $\lvert U^{\dagger}x\rvert_{\Pi}^{2}$ is a polynomial in $x_{1},\dots,x_{d}$ and $\overline{x_{1}},\dots,\overline{x_{d}}$; therefore we can evaluate its expectation with the help of Fact 2. We have

$$\lvert U^{\dagger}x\rvert_{\Pi}^{2}=\left\lvert\prod_{i=1}^{n}\sum_{j=1}^{d}\overline{U_{ji}}\,x_{j}\right\rvert^{2}$$

If we define

$$p(x):=\prod_{i=1}^{n}\sum_{j=1}^{d}\overline{U_{ji}}\,x_{j},$$

then $\lvert U^{\dagger}x\rvert_{\Pi}^{2}=p(x)\overline{p(x)}$ . Note that $p(x)$ is a polynomial in terms of $x_{1},\dots,x_{d}$ . We can expand $p(x)$ as follows:

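$$p(x)=\sum_{\sigma:[n]\to[d]}\prod_{i=1}^{n}\overline{U_{\sigma(i),i}}\,x_{\sigma(i)},$$

where the sum is taken over all $d^{n}$ functions $\sigma:[n]\to[d]$. For a function $\sigma:[n]\to[d]$, let $\operatorname{sig}(\sigma)$ be $(k_{1},\dots,k_{d})\in\mathbb{Z}^{d}$ where $k_{j}$ is the number of $i\in[n]$ such that $\sigma(i)=j$. Then we can alternatively write

$$p(x)=\sum_{\substack{k_{1}+\dots+k_{d}=n\\k_{1},\dots,k_{d}\geq 0}}x_{1}^{k_{1}}\dots x_{d}^{k_{d}}\sum_{\substack{\sigma:[n]\to[d]\\\operatorname{sig}(\sigma)=(k_{1},\dots,k_{d})}}\prod_{i=1}^{n}\overline{U_{\sigma(i),i}}.$$

For $(k_{1},\dots,k_{d})\neq(k_{1}^{\prime},\dots,k_{d}^{\prime})$, by Fact 2 we have $\mathbb{E}_{x}[x_{1}^{k_{1}}\dots x_{d}^{k_{d}}\overline{x_{1}^{k_{1}^{\prime}}\dots x_{d}^{k_{d}^{\prime}}}]=0$. Therefore we can write

$$\mathbb{E}_{x}[p(x)\overline{p(x)}]=\sum_{\substack{k_{1}+\dots+k_{d}=n\\k_{1},\dots,k_{d}\geq 0}}\left(k_{1}!\dots k_{d}!\sum_{\substack{\sigma:[n]\to[d]\\\operatorname{sig}(\sigma)=(k_{1},\dots,k_{d})}}\;\sum_{\substack{\sigma^{\prime}:[n]\to[d]\\\operatorname{sig}(\sigma^{\prime})=(k_{1},\dots,k_{d})}}\prod_{i=1}^{n}\overline{U_{\sigma(i),i}}\,U_{\sigma^{\prime}(i),i}\right),$$

where we used that $\mathbb{E}[x_{1}^{k_{1}}\dots x_{d}^{k_{d}}\overline{x_{1}^{k_{1}}\dots x_{d}^{k_{d}}}]=k_{1}!\dots k_{d}!$ by Fact 2. Note that when $\operatorname{sig}(\sigma)=\operatorname{sig}(\sigma^{\prime})$, there is a permutation $\pi\in S_{n}$ such that $\sigma^{\prime}=\sigma\circ\pi$. In fact if $\operatorname{sig}(\sigma)=\operatorname{sig}(\sigma^{\prime})=(k_{1},\dots,k_{d})$, then the number of $\pi\in S_{n}$ for which $\sigma^{\prime}=\sigma\circ\pi$ is exactly equal to $k_{1}!\dots k_{d}!$. Therefore we can rewrite the above sum as

$$\mathbb{E}_{x}[p(x)\overline{p(x)}]=\sum_{\pi\in S_{n}}\prod_{i=1}^{n}\sum_{j=1}^{d}(U^{\dagger})_{i,j}U_{j,\pi^{-1}(i)}=\sum_{\pi\in S_{n}}\prod_{i=1}^{n}(U^{\dagger}U)_{i,\pi^{-1}(i)}=\operatorname{per}(U^{\dagger}U).$$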

∎
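Lemma 2 also lends itself to a quick Monte Carlo sanity check. The sketch below samples $x\sim\mathbb{C}\mathcal{N}(0,I)$, averages $\prod_{i}\lvert(U^{\dagger}x)_{i}\rvert^{2}$, and compares the result with $\operatorname{per}(U^{\dagger}U)$. The matrix `U`, the seed, and the sample count are arbitrary choices made here, and the estimate is only accurate up to sampling error.

```python
import random
from math import prod, sqrt


def complex_normal(rng):
    """One sample of CN(0, 1): real and imaginary parts are N(0, 1/2)."""
    s = sqrt(0.5)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def estimate(U, samples, rng):
    """Monte Carlo estimate of E[ prod_i |(U^dagger x)_i|^2 ] for x ~ CN(0, I)."""
    d, n = len(U), len(U[0])
    total = 0.0
    for _ in range(samples):
        x = [complex_normal(rng) for _ in range(d)]
        # (U^dagger x)_i = sum_j conj(U[j][i]) * x[j]
        y = [sum(U[j][i].conjugate() * x[j] for j in range(d)) for i in range(n)]
        total += prod(abs(yi) ** 2 for yi in y)
    return total / samples


U = [[1, 1], [0, 1]]            # hypothetical 2x2 example
# U^dagger U = [[1, 1], [1, 2]], whose permanent is 1*2 + 1*1 = 3.
exact = 3
approx = estimate(U, 200_000, random.Random(0))
print(exact, approx)
```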

3 Approximation of Permanent on the PSD Cone

In this section we prove theorem 1. Recall the definition of $\operatorname{rel}(A)$ from definition 1. Our first step is to prove that for every $n\times n$ hermitian PSD matrix $A\succeq 0$ :

$$\operatorname{rel}(A)\geq\operatorname{per}(A)\geq c^{-n}\operatorname{rel}(A),\tag{2}$$

where $c=e^{\gamma+1}$ .

In order to prove eq. 2, we also introduce a lower bound on $\operatorname{per}(A)$ . We find a vector $v\in\mathbb{C}^{n}$ such that $A\succeq vv^{\dagger}$ . By corollary 1, $\operatorname{per}(A)\geq\operatorname{per}(vv^{\dagger})$ . So in order to prove eq. 2 it suffices to prove:

Theorem 2.

For a hermitian PSD matrix $A\in\mathbb{C}^{n\times n}$ , there exists $v\in\mathbb{C}^{n}$ such that $A\succeq vv^{\dagger}$ and

$$\operatorname{per}(vv^{\dagger})\geq c^{-n}\operatorname{rel}(A),$$

where $c=e^{\gamma+1}$ .

Note that the above shows that for every hermitian PSD matrix $A\in\mathbb{C}^{n\times n}$ , there exists a diagonal matrix $D$ and a rank 1 matrix $vv^{\dagger}$ such that

$$D\succeq A\succeq vv^{\dagger}$$

and $\operatorname{per}(D)\leq c^{n}\operatorname{per}(vv^{\dagger})$ for $c=e^{\gamma+1}$ . Thus $\operatorname{per}(A)$ is sandwiched between $\operatorname{per}(D)$ and $\operatorname{per}(vv^{\dagger})$ , two quantities that differ by at most a simply exponential factor.

It is also worth noting that there is no additional loss in approximating $\operatorname{per}(A)$ by the permanent of a rank one matrix. In section 4, we will show that the constant $e^{\gamma+1}$ is not only asymptotically tight in theorem 2, but also in eq. 2.

Another interesting corollary of theorem 2 is that instead of $\operatorname{rel}(A)$ we can use $\operatorname{per}(vv^{\dagger})$ as an approximation of $\operatorname{per}(A)$, with the same $e^{n(\gamma+1)}$ approximation factor:

[TABLE]

Moreover, $\operatorname{per}(vv^{\dagger})$ is easily computable.

Fact 5.

For a vector $v\in\mathbb{C}^{n}$ , we have $\operatorname{per}(vv^{\dagger})=n!\cdot\prod_{i=1}^{n}\lvert v_{i}\rvert^{2}$ .

Proof.

For any permutation $\sigma\in S_{n}$ we have

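$$\prod_{i=1}^{n}(vv^{\dagger})_{i,\sigma(i)}=\prod_{i=1}^{n}v_{i}\overline{v_{\sigma(i)}}=\left(\prod_{i=1}^{n}v_{i}\right)\left(\prod_{i=1}^{n}\overline{v_{i}}\right)=\prod_{i=1}^{n}\lvert v_{i}\rvert^{2}.$$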

Since $\operatorname{per}(vv^{\dagger})$ is the sum of the above quantity for all $\sigma\in S_{n}$ , we get that $\operatorname{per}(vv^{\dagger})=n!\cdot\prod_{i=1}^{n}\lvert v_{i}\rvert^{2}$ . ∎
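The closed form of Fact 5 is easy to confirm on a small complex vector. The vector `v` below is an arbitrary illustrative choice, and `permanent` is the same brute-force helper used only for tiny matrices.

```python
from itertools import permutations
from math import factorial, prod


def permanent(A):
    """Brute-force permanent of a square matrix given as a list of rows."""
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


v = [1 + 1j, 2, -1j]                       # hypothetical vector in C^3
# Rank-one matrix v v^dagger: entry (i, j) is v_i * conj(v_j).
outer = [[vi * complex(vj).conjugate() for vj in v] for vi in v]

lhs = permanent(outer)
rhs = factorial(len(v)) * prod(abs(vi) ** 2 for vi in v)   # 3! * (2 * 4 * 1) = 48
print(lhs, rhs)
```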

Even though $\operatorname{per}(vv^{\dagger})$ has a closed form, we do not have an efficient way of computing the $\sup$ in eq. 3, whereas, as we show in section 3.2, $\operatorname{rel}(A)$ can be computed efficiently.

The next section is dedicated to proving theorem 2. To finish up the proof of theorem 1 we need to design an algorithm to compute $\operatorname{rel}(A)$ for a given PSD matrix $A$ .

Theorem 3.

There is an algorithm that outputs an $e^{n(\gamma+1)}$ -approximation of $\operatorname{per}(A)$ for any hermitian PSD $A\in\mathbb{C}^{n\times n}$ in time $\operatorname{poly}(n+\langle A\rangle)$ , where $\langle A\rangle$ represents the bit complexity of $A$ .

We will prove the above theorem in section 3.2. Theorems 2 and 3 together complete the proof of theorem 1. In section 4 we show that the constant $c=e^{\gamma+1}$ in eq. 2 is asymptotically tight.

3.1 Proof of the Main Result

In order to prove theorem 2, we use a seemingly unrelated quantity about distributions on unit vectors $\{u\in\mathbb{C}^{d}:\lvert u\rvert^{2}=u^{\dagger}u=1\}$ . Let us define this quantity below.

Definition 2.

For a discrete distribution $\mathcal{U}$ supported on the sphere $\{u\in\mathbb{C}^{d}:\lvert u\rvert^{2}=u^{\dagger}u=1\}$ , define

[TABLE]

where $\operatorname{span}(\mathcal{U})$ is the span of the support of $\mathcal{U}$ , i.e., the set of vectors for which the denominator is nonzero.

We will prove theorem 2 by showing that there exists $v\in\mathbb{C}^{n}$ such that $A\succeq vv^{\dagger}$ and

[TABLE]

where $\mathcal{U}$ is an appropriately constructed distribution on unit vectors. The expression $n!/n^{n}$ is lower bounded by $e^{-n}$ . Thus if we show that $f(\mathcal{U})\geq e^{-\gamma}$ , the above inequality would imply the multiplicative factor of $e^{n(\gamma+1)}$ desired in theorem 2.
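The bound $n!/n^{n}\geq e^{-n}$ used here follows from $e^{n}=\sum_{k\geq 0}n^{k}/k!\geq n^{n}/n!$. As a sanity check, the sketch below verifies it numerically in log space for a range of $n$. The helper name `log_ratio` is introduced here for illustration.

```python
from math import exp, lgamma, log


def log_ratio(n):
    """ln(n! / n^n), computed in log space to avoid overflow."""
    return lgamma(n + 1) - n * log(n)


for n in (1, 2, 5, 10, 100):
    assert log_ratio(n) >= -n
    # (n!/n^n) / e^{-n} is always at least 1 and grows only polynomially in n.
    print(n, exp(log_ratio(n) + n))
```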

To gain some intuition about $f(\mathcal{U})$ , note that by Jensen’s inequality, applied to the concave function $\ln$ , it is easy to see that $f(\mathcal{U})\leq 1$ :

[TABLE]

On the other hand, we will show that for all $\mathcal{U}$ , $f(\mathcal{U})\geq e^{-\gamma}$ .

Proposition 1.

For all discrete distributions $\mathcal{U}$ supported on the sphere $\{u\in\mathbb{C}^{d}:\lvert u\rvert^{2}=u^{\dagger}u=1\}$ ,

$$f(\mathcal{U})\geq e^{-\gamma}.$$

This universal lower bound is independent of the dimension $d$ or the size of the support of $\mathcal{U}$ . We defer the proof of proposition 1 to the end of this section.

Let us now prove theorem 2, assuming correctness of proposition 1.

Proof of theorem 2.

Let us break down the proof into a series of claims, and then prove them one by one.

Claim 1.

The infimum in eq. 1 is achieved by some diagonal matrix $\hat{D}=\hat{D}(A)$ . In other words there exists a diagonal matrix $\hat{D}\succeq A$ such that $\operatorname{per}(\hat{D})=\operatorname{rel}(A)$ .

Claim 2.

We may assume without loss of generality that $\hat{D}=I$ .

Claim 3.

The first-order optimality condition of $\hat{D}$ implies that there exists a correlation matrix $B\in\mathbb{C}^{n\times n}$ , i.e., a hermitian PSD matrix with $1$ s on its main diagonal, such that $AB=B$ .

We may use the Cholesky decomposition to write $B=U^{\dagger}U$ where $U\in\mathbb{C}^{d\times n}$ for $d=\operatorname{rank}(B)$ .

Claim 4.

For any $x\in\mathbb{C}^{d}$ the vector $v=U^{\dagger}x/\lvert U^{\dagger}x\rvert$ satisfies

$$A\succeq vv^{\dagger}.$$

Naturally we may want to choose $x$ so as to maximize $\operatorname{per}(vv^{\dagger})$ .

Claim 5.

We have

[TABLE]

where $\mathcal{U}$ is the uniform distribution on the columns of $U$ .

And now the statement of theorem 2 follows, because $\operatorname{rel}(A)=\operatorname{per}(\hat{D})=1$ when $\hat{D}=I$ ; we have found $v\in\mathbb{C}^{n}$ such that $A\succeq vv^{\dagger}$ and

[TABLE]

Let us now prove the claims one by one.

Proof of Claim 1.

We divide the proof into two cases. First assume that $A_{ii}>0$ for all $i\in[n]$ . Let $\lambda\geq 0$ be larger than the maximum eigenvalue of $A$ . Then $\lambda I\succeq A$ . This proves that $\operatorname{rel}(A)\leq\lambda^{n}$ . Note that $D\succeq A$ implies $D_{ii}\geq A_{ii}$ for all $i\in[n]$ . If any entry $D_{ii}$ of $D$ satisfies

[TABLE]

then

[TABLE]

This effectively eliminates such a $D$ as a candidate for the $\inf$ in eq. 1. Therefore we may take $\inf$ of $\operatorname{per}(D)$ over the set of all diagonal matrices $D$ which in addition to $D\succeq A$ satisfy

[TABLE]

for all $i\in[n]$ . This is a compact set, and $\operatorname{per}(D)$ is a continuous function. Therefore the $\inf$ is achieved by some matrix $\hat{D}$ .

For the second case, assume that $A_{ii}=0$ for some $i$ . Then since $A$ is PSD, the $i$ -th row and the $i$ -th column of $A$ are both zero. Let $\lambda$ be larger than the largest eigenvalue of $A$ . Define $\hat{D}$ by $\hat{D}_{ii}=0$ and $\hat{D}_{jj}=\lambda$ for $j\neq i$ . It is easy to see that $\hat{D}\succeq A$ and $\operatorname{per}(\hat{D})=0$ . Therefore $\operatorname{rel}(A)=0$ and it is achieved at $\hat{D}$ . ∎

Proof of Claim 2.

First note that without loss of generality we may assume $\hat{D}(A)\succ 0$ , since otherwise $\operatorname{rel}(A)=0$ and the conclusion of theorem 2 is trivial.

Now let $\lambda\in\mathbb{R}_{>0}^{n}$ be an arbitrary positive vector and define $T_{\lambda}:\mathbb{C}^{n\times n}\to\mathbb{C}^{n\times n}$ by

$$T_{\lambda}(M):=\operatorname{diag}(\lambda)\,M\operatorname{diag}(\lambda).$$

Note that $T_{\lambda}$ respects the Loewner order and maps diagonal matrices to diagonal matrices. It is one-to-one and surjective on the space of diagonal matrices. The matrix $T_{\lambda}(M)$ is obtained from $M$ by multiplying column $i$ by $\lambda_{i}$ for $i\in[n]$ and then row $i$ by $\lambda_{i}$ for $i\in[n]$ . Therefore

$$\operatorname{per}(T_{\lambda}(M))=\left(\prod_{i=1}^{n}\lambda_{i}\right)^{2}\operatorname{per}(M).$$

This implies that

[TABLE]

It is also easy to see that the above also implies $\hat{D}(T_{\lambda}(A))=T_{\lambda}(\hat{D}(A))$ . In particular if $\lambda$ is set so that $\lambda_{i}=1/\sqrt{\hat{D}_{ii}}$ , then $\hat{D}(T_{\lambda}(A))=I$ . So we can replace $A$ by $T_{\lambda}(A)$ and continue the proof of theorem 2 to find $v\in\mathbb{C}^{n}$ satisfying

[TABLE]

and $c^{n}\operatorname{per}(vv^{\dagger})\geq\operatorname{rel}(T_{\lambda}(A))=1$ with $c=e^{\gamma+1}$ . Let $w=\operatorname{diag}(\lambda)^{-1}v$ . Then $T_{\lambda}(ww^{\dagger})=vv^{\dagger}$ . This implies that

[TABLE]

and

[TABLE]

∎
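The scaling step in the proof of Claim 2 can be illustrated numerically. The sketch assumes $T_{\lambda}(M)$ multiplies column $i$ and then row $i$ of $M$ by $\lambda_{i}$, as described in the proof, so that entry $(i,j)$ becomes $\lambda_{i}M_{ij}\lambda_{j}$ and the permanent picks up a factor $\prod_{i}\lambda_{i}^{2}$. The matrix, the vector `lam`, and the helper names are hypothetical.

```python
from itertools import permutations
from math import prod


def permanent(A):
    """Brute-force permanent of a square matrix given as a list of rows."""
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def scale(M, lam):
    """T_lambda(M): entry (i, j) becomes lam[i] * M[i][j] * lam[j]."""
    n = len(M)
    return [[lam[i] * M[i][j] * lam[j] for j in range(n)] for i in range(n)]


M = [[2, -1, 1], [-1, 6, 4], [1, 4, 10]]   # a PSD example, permanent 160
lam = [2, 3, 5]

lhs = permanent(scale(M, lam))
rhs = prod(l * l for l in lam) * permanent(M)   # 900 * 160
print(lhs, rhs)
```

Every permutation term contains each row index and each column index exactly once, which is why the factor $\prod_{i}\lambda_{i}^{2}$ comes out of every term.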

Proof of Claim 3.

We use the first-order optimality condition of $\operatorname{per}(D)$ at $D=I$ . Let us change $I$ to $I+X$ where $X$ is a diagonal matrix. Then if $X$ is small enough $\operatorname{per}(I+X)\simeq 1+\operatorname{tr}(X)$ . More precisely, we have

[TABLE]

If $I+X\succeq A$ then $I+tX\succeq A$ for all $t\in[0,1]$, because $I\succeq A$. If $\operatorname{tr}(X)<0$, then for small enough $t>0$, $\operatorname{per}(I+tX)<\operatorname{per}(I)$, which contradicts the fact that $\hat{D}(A)=I$. This implies that the optimal value of the following SDP is $0$:

[TABLE]

The dual of this SDP has variables $B\succeq 0$ , corresponding to the constraint $I+X\succeq A$ , and $\mu_{ij}$ for $i\neq j$ , corresponding to the constraint $X_{ij}=0$ :

[TABLE]

Because of strong duality, the optimum of this SDP is $0$. The optimal $B$ satisfies $B\succeq 0$ and $B_{ii}=1$ for $i\in[n]$, i.e., $B$ is a correlation matrix. We also have $\operatorname{tr}((I-A)B)=0$. But since $I-A\succeq 0$ and $B\succeq 0$, this implies that $(I-A)B=0$ or in other words $AB=B$. ∎

Proof of Claim 4.

We have $B=U^{\dagger}U$ with $U\in\mathbb{C}^{d\times n}$ and $\operatorname{rank}(B)=d$ . This implies that $UU^{\dagger}\in\mathbb{C}^{d\times d}$ is invertible. Now we have

[TABLE]

This together with $AB=B$ implies that

[TABLE]

In other words, $U^{\dagger}x$ is an eigenvector of $A$ with eigenvalue $1$ . This means that $v=U^{\dagger}x/\lvert U^{\dagger}x\rvert$ is also such an eigenvector. So $Av=v$ and $\lvert v\rvert=1$ . We conclude that $A\succeq vv^{\dagger}$ . ∎
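The last implication can be spelled out as follows. This is a short sketch of one standard argument, not necessarily the one the authors had in mind. Since $A$ is hermitian and $Av=v$ with $\lvert v\rvert=1$, we may extend $v$ to an orthonormal eigenbasis $w_{1}=v,w_{2},\dots,w_{n}$ of $A$ with eigenvalues $\lambda_{1}=1,\lambda_{2},\dots,\lambda_{n}\geq 0$. Then

$$A=\sum_{k=1}^{n}\lambda_{k}w_{k}w_{k}^{\dagger}=vv^{\dagger}+\sum_{k=2}^{n}\lambda_{k}w_{k}w_{k}^{\dagger},\qquad\text{so}\qquad A-vv^{\dagger}=\sum_{k=2}^{n}\lambda_{k}w_{k}w_{k}^{\dagger}\succeq 0.$$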

Proof of Claim 5.

Let us compute $\operatorname{per}(vv^{\dagger})$. By Fact 5 we have

[TABLE]

Let the columns of $U$ be $u_{1},\dots,u_{n}\in\mathbb{C}^{d}$ . Then $v_{i}=u_{i}^{\dagger}x/\lvert U^{\dagger}x\rvert$ , and note that $\lvert U^{\dagger}x\rvert^{2}=\sum_{i=1}^{n}\lvert u_{i}^{\dagger}x\rvert^{2}$ . We can rewrite $\operatorname{per}(vv^{\dagger})$ as

[TABLE]

Now if we let $\mathcal{U}$ be the uniform distribution on $u_{1},\dots,u_{n}$ , we can rewrite the above as

[TABLE]

Therefore

[TABLE]

∎

This concludes the proof of theorem 2. ∎

It only remains to prove proposition 1.

Proof of proposition 1.

Without loss of generality we may assume that $\operatorname{span}(\mathcal{U})=\mathbb{C}^{d}$ ; if that is not the case, we can identify $\operatorname{span}(\mathcal{U})$ with $\mathbb{C}^{d^{\prime}}$ for some $d^{\prime}<d$ using a unitary transformation and nothing changes.

Let $x\sim\mathbb{C}\mathcal{N}(0,I)$ be a $d$ -dimensional standard complex normal. Let

[TABLE]

Then our goal is to prove that $\mathbb{P}_{x}[g(x)/h(x)\geq e^{-\gamma}]>0$ or equivalently $\mathbb{P}_{x}[g(x)-e^{-\gamma}h(x)\geq 0]>0$ . To this end, we will prove that $\mathbb{E}_{x}[g(x)-e^{-\gamma}h(x)]\geq 0$ , and the conclusion follows.

By fact 4, for each fixed $u$ in the support of $\mathcal{U}$, $u^{\dagger}x\sim\mathbb{C}\mathcal{N}(0,1)$. Therefore we have

[TABLE]

On the other hand, by fact 3 we have

[TABLE]

where the inequality is an application of Jensen’s inequality to the convex function $\exp$. Putting these together we get that $\mathbb{E}_{x}[g(x)-e^{-\gamma}h(x)]\geq e^{-\gamma}-e^{-\gamma}=0$, as desired. ∎
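The two moments of a standard complex normal that drive this argument, $\mathbb{E}[\ln\lvert z\rvert^{2}]=-\gamma$ and $\mathbb{E}[\lvert z\rvert^{2}]=1$, can be sanity-checked by simulation. The following is a minimal Monte Carlo sketch, assuming the convention that $z\sim\mathbb{C}\mathcal{N}(0,1)$ has independent real and imaginary parts of variance $1/2$; the function names, sample size and seed are illustrative choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_abs_sq_complex_normal(rng):
    """Return |z|^2 for z ~ CN(0,1): real and imaginary parts are N(0, 1/2)."""
    re = rng.gauss(0.0, math.sqrt(0.5))
    im = rng.gauss(0.0, math.sqrt(0.5))
    return re * re + im * im


def estimate_moments(num_samples=200_000, seed=0):
    """Estimate E[ln |z|^2] and E[|z|^2] by plain Monte Carlo averaging."""
    rng = random.Random(seed)
    total_log = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        s = sample_abs_sq_complex_normal(rng)
        total_log += math.log(s)
        total_sq += s
    return total_log / num_samples, total_sq / num_samples


mean_log, mean_sq = estimate_moments()
print("E[ln|z|^2] estimate:", mean_log, " target:", -EULER_GAMMA)
print("E[|z|^2]   estimate:", mean_sq, " target: 1")
```

The estimates fluctuate with the seed, so they illustrate the two facts rather than prove them.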

3.2 Computing the Approximation

In this section we show how to approximately compute $\operatorname{rel}(A)$ . The main result of this section will be theorem 3.

The main ingredient of the proof is transforming $\operatorname{rel}(A)$ to the objective of a convex program. The original optimization problem that computes $\operatorname{rel}(A)$ is the following:

$$\inf\ \prod_{i=1}^{n}D_{ii}\quad\text{subject to}\quad D\succeq A,\quad D\text{ is diagonal}.$$

The objective is not convex, even if we apply $\ln$ to it. The trick is to change from the variables $D_{11},\dots,D_{nn}$ to $D_{11}^{-1},\dots,D_{nn}^{-1}$. If we have the Cholesky decomposition $A=V^{\dagger}V$ for some $V\in\mathbb{C}^{d\times n}$, then $D\succeq A$ if and only if

$$I\succeq VD^{-1}V^{\dagger}.$$

So we can turn the optimization problem into the following by identifying $D^{-1}$ with $\operatorname{diag}(x)$ .

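$$\min_{x\in\mathbb{R}_{\geq 0}^{n}}\ -\ln(x_{1}\cdots x_{n})\quad\text{subject to}\quad V\operatorname{diag}(x)V^{\dagger}\preceq I.\tag{4}$$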

If the optimum value of the above program is $\operatorname{OPT}$, then $\operatorname{rel}(A)=e^{\operatorname{OPT}}$. Note that $-\ln(x_{1}\dots x_{n})$ is convex over $\mathbb{R}_{\geq 0}^{n}$, so the above is a valid convex program.
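To see the change of variables at work in the smallest nontrivial case, here is a minimal sketch for a real $2\times 2$ PSD matrix $A$ with rows $(a,b)$ and $(b,c)$, where $a,c>0$. In that case $\operatorname{diag}(d_{1},d_{2})\succeq A$ holds exactly when $d_{1}\geq a$, $d_{2}\geq c$ and $(d_{1}-a)(d_{2}-c)\geq b^{2}$, so for each $x_{1}=1/d_{1}$ the best feasible $x_{2}$ is explicit and the remaining one-dimensional objective is convex. The helper names and the ternary search are illustrative choices, not the method used in the proof of theorem 3, which relies on a general convex solver.

```python
import math


def per_2x2(a, b, c):
    """Permanent of the real symmetric matrix [[a, b], [b, c]]."""
    return a * c + b * b


def rel_2x2(a, b, c, iters=100):
    """Approximate rel(A) for real PSD A = [[a, b], [b, c]] with a, c > 0.

    Works in the variable x1 = 1/D_11. For fixed x1 the smallest feasible
    D_22 is c + b^2 / (D_11 - a), and -ln(x1) - ln(x2) is convex in x1.
    """
    if b == 0:
        return a * c  # A is already diagonal, so D = A is optimal

    def objective(x1):
        d1 = 1.0 / x1
        if d1 <= a:
            return math.inf  # infeasible: D_11 must exceed a when b != 0
        d2 = c + b * b / (d1 - a)
        return -math.log(x1) + math.log(d2)  # equals -ln(x1) - ln(x2)

    lo, hi = 0.0, 1.0 / a
    for _ in range(iters):  # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return math.exp(objective(0.5 * (lo + hi)))


a, b, c = 2.0, 1.0, 3.0
rel_value = rel_2x2(a, b, c)
per_value = per_2x2(a, b, c)
print("rel(A) ~", rel_value, " per(A) =", per_value)
```

For this special case a short calculation gives the closed form $\operatorname{rel}(A)=(\sqrt{ac}+\lvert b\rvert)^{2}$, which the search should reproduce, and the ratio $\operatorname{rel}(A)/\operatorname{per}(A)$ never exceeds $2$ here, far below $c^{2}$.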

Proof of theorem 3.

We can detect whether $\operatorname{rel}(A)=0$ by checking whether any of $A$’s main diagonal entries are $0$. See the proof of claim 1.

When all of the main diagonal entries of $A$ are strictly positive, similar to the proof of claim 1, we can determine upper and lower bounds on the optimum $x_{i}$. In particular, if $\lambda$ is a number larger than the largest eigenvalue of $A$, for the optimum $x_{i}$ we have

[TABLE]

Thus, we can restrict the domain of the convex program in eq. 4 to a compact domain. We can compute the Cholesky decomposition of $A$ and then use our favorite convex programming technique, such as the ellipsoid method, to find the optimum value of eq. 4 to within accuracy $\epsilon$ in time $\operatorname{poly}(n+\langle A\rangle+\log(1/\epsilon))$. This gives us a $1+\epsilon$ approximation of $\operatorname{rel}(A)$, which by eq. 2 is a $(1+\epsilon)c^{n}$ approximation of $\operatorname{per}(A)$ for $c=e^{\gamma+1}$.

As a final remark, we note that the approximation factor $e^{n(\gamma+1)}$ in eq. 2 can in fact be slightly strengthened to

$$\frac{n^{n}}{n!}e^{n\gamma}$$

if one carefully reviews the proof. The term $n^{n}/n!$ is at most $e^{n}$ , but the difference allows us to absorb $1+\epsilon$ into the approximation factor for an appropriately chosen $\epsilon$ . This allows us to state an $\epsilon$ -free result: We can find an $e^{n(\gamma+1)}$ approximation to $\operatorname{per}(A)$ in time $\operatorname{poly}(n+\langle A\rangle)$ . ∎
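How much room the gap between $n^{n}/n!$ and $e^{n}$ leaves can be made concrete. The following is a minimal sketch that evaluates the slack $e^{n}\,n!/n^{n}$ in log space and compares it with the Stirling lower bound $\sqrt{2\pi n}$; the function name `slack` is an illustrative choice.

```python
import math


def slack(n):
    """Return e^n * n! / n^n, computed via logarithms to avoid overflow."""
    return math.exp(n + math.lgamma(n + 1) - n * math.log(n))


# Each row: (n, slack, Stirling lower bound sqrt(2*pi*n))
rows = [(n, slack(n), math.sqrt(2 * math.pi * n)) for n in (1, 10, 100, 1000)]
for n, value, bound in rows:
    print(n, value, bound)
```

Since the slack is at least $\sqrt{2\pi n}>2$ for every $n\geq 1$, any $\epsilon\leq 1$ can be absorbed.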

4 Asymptotically Tight Examples

In this section we show that the constant $c=e^{\gamma+1}$ cannot be replaced by anything smaller in eq. 2. In other words, we will construct $n\times n$ Hermitian PSD matrices $A$ such that

[TABLE]

The construction will begin with a distribution $\mathcal{U}$ that is uniform over $n$ unit vectors $u_{1},\dots,u_{n}\in\mathbb{C}^{d}$ . We will later show how we can construct $\mathcal{U}$ so that $f(\mathcal{U})$ is arbitrarily close to $e^{-\gamma}$ .

Lemma 3.

For any $\epsilon>0$ there exists a distribution $\mathcal{U}$ that is uniform over $n$ unit vectors $u_{1},\dots,u_{n}\in\mathbb{C}^{d}$ for some $n$ and $d$ that satisfies

[TABLE]

We postpone the proof of lemma 3 to the end of this section. For now, we use it together with the following proposition to show that $e^{\gamma+1}$ cannot be improved in eq. 2.

Proposition 2.

Given a distribution $\mathcal{U}$ that is uniform over a finite number of unit vectors $u_{1},\dots,u_{n}$ , we can construct a sequence of matrices $A_{1},A_{2},\dots$ of sizes $n_{1}\times n_{1},n_{2}\times n_{2},\dots$ such that

[TABLE]

Proof.

Our goal is to construct a PSD matrix $A$ and relate $\operatorname{rel}(A)/\operatorname{per}(A)$ to $f(\mathcal{U})$ . We will assume without loss of generality that $\operatorname{span}\{u_{1},\dots,u_{n}\}=\mathbb{C}^{d}$ ; otherwise, we use a unitary transformation to map $u_{1},\dots,u_{n}$ onto a lower dimensional space and $f(\mathcal{U})$ would not change.

Consider the matrix $U\in\mathbb{C}^{d\times n}$ whose columns are $u_{1},\dots,u_{n}$. Note that $\operatorname{rank}(U)=d$ and $U^{\dagger}U\succeq 0$ has $1$s on the main diagonal. In other words, $U^{\dagger}U$ is a correlation matrix of rank $d$. Since $\operatorname{rank}(U)=d$, the matrix $UU^{\dagger}$ is invertible and we can define

$$V:=(UU^{\dagger})^{-1/2}U$$

and

$$A:=V^{\dagger}V=U^{\dagger}(UU^{\dagger})^{-1}U.$$

We will study $\operatorname{rel}(A)$ and $\operatorname{per}(A)$ and relate them to $f(\mathcal{U})$ .

As observed in the proof of claim 3, correlation matrices can be used as optimality certificates for $\operatorname{rel}$, albeit in that context first-order optimality was just a necessary condition. We now make a formal claim by certifying that $\operatorname{rel}(A)=1$ using $U^{\dagger}U$ as the certificate.

Claim 6.

If $A$ is constructed as above, then

$$\operatorname{rel}(A)=1.$$

Proof.

We clearly have $I\succeq U^{\dagger}(UU^{\dagger})^{-1}U=V^{\dagger}V$ . This implies that $\operatorname{rel}(A)\leq 1$ . Now consider a diagonal matrix $D\succeq A=V^{\dagger}V$ . We need to show that $\operatorname{per}(D)\geq 1$ . Without loss of generality, by adding a small multiple of $I$ if necessary, we may assume that $D\succ 0$ . Now $D\succeq V^{\dagger}V$ implies that

$$I\succeq VD^{-1}V^{\dagger}=(UU^{\dagger})^{-1/2}UD^{-1}U^{\dagger}(UU^{\dagger})^{-1/2},$$

which in turn implies

$$UU^{\dagger}\succeq UD^{-1}U^{\dagger}.$$

By taking the trace we get

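$$\operatorname{tr}(U^{\dagger}U)=\operatorname{tr}(UU^{\dagger})\geq\operatorname{tr}(UD^{-1}U^{\dagger})=\operatorname{tr}(D^{-1}U^{\dagger}U).$$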

Since $U^{\dagger}U$ has $1$ s on the diagonal and $D$ is diagonal the above becomes

$$n\geq\sum_{i=1}^{n}D_{ii}^{-1}.$$

By using the AM-GM inequality we get

$$\left(\prod_{i=1}^{n}D_{ii}^{-1}\right)^{1/n}\leq\frac{1}{n}\sum_{i=1}^{n}D_{ii}^{-1}\leq 1.$$

This means that $\operatorname{per}(D)=D_{11}\dots D_{nn}\geq 1$ . ∎
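The construction can be tried on a tiny instance. The following is a minimal sketch, assuming $A=U^{\dagger}(UU^{\dagger})^{-1}U$ as in the proof above and using three real unit vectors in the plane (so $\dagger$ is just the transpose, $d=2$, $n=3$); the angles and helper names are illustrative. It checks that $A$ is an orthogonal projection of rank $d$, which gives $I\succeq A$, and that the certificate $B=U^{\dagger}U$ satisfies $AB=B$.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inverse_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


angles = [0.0, 1.0, 2.5]                      # three unit vectors spanning R^2
U = [[math.cos(t) for t in angles],
     [math.sin(t) for t in angles]]           # 2 x 3, columns are u_1, u_2, u_3
Ut = transpose(U)
B = matmul(Ut, U)                             # correlation matrix U^T U
A = matmul(matmul(Ut, inverse_2x2(matmul(U, Ut))), U)

trace_A = sum(A[i][i] for i in range(3))      # equals d for a rank-d projection
idempotent_error = max_abs_diff(matmul(A, A), A)
certificate_error = max_abs_diff(matmul(A, B), B)
print(trace_A, idempotent_error, certificate_error)
```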

Next we study $\operatorname{per}(A)$ . This is where the term $f(\mathcal{U})$ appears.

Claim 7.

If $A$ is constructed as above, then

[TABLE]

Before proving claim 7, let us show why it suffices to finish the proof of proposition 2. By claims 6 and 7 we have

[TABLE]

This is not quite the same as $ef(\mathcal{U})^{-1}$ yet. However, we have one degree of freedom that we have not used. Initially we assumed $\mathcal{U}$ was a uniform distribution over $n$ unit vectors. But we might as well have assumed that it is a uniform distribution over $nk$ unit vectors for any positive integer $k$, by simply repeating the vectors in the support of $\mathcal{U}$. Therefore we may make $n$ as large as we would like without changing $d$ or $f(\mathcal{U})$. As $n\to\infty$, by Stirling’s formula we have

[TABLE]

and by a simple bound for large enough $n$

[TABLE]

Therefore as $n\to\infty$ we have

[TABLE]
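The asymptotics invoked here can be checked numerically. The following is a minimal sketch, assuming the two quantities of interest are $(n^{n}/n!)^{1/n}$, which Stirling’s formula sends to $e$, and $\binom{n+d-1}{d-1}^{1/n}$, which tends to $1$ for fixed $d$; the function names and the choice $d=5$ are illustrative.

```python
import math


def nth_root_of_ratio(n):
    """(n^n / n!)^(1/n), computed via logarithms."""
    return math.exp((n * math.log(n) - math.lgamma(n + 1)) / n)


def nth_root_of_binomial(n, d):
    """binom(n + d - 1, d - 1)^(1/n), computed via log-gamma."""
    log_binom = math.lgamma(n + d) - math.lgamma(d) - math.lgamma(n + 1)
    return math.exp(log_binom / n)


d = 5
for n in (10, 1000, 10 ** 6):
    print(n, nth_root_of_ratio(n), nth_root_of_binomial(n, d))
```

With $d$ held fixed, the binomial factor contributes nothing to the exponential growth rate, which is why only $e$ and $f(\mathcal{U})$ survive in the limit.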

It only remains to prove claim 7.

Proof of claim 7.

We will use lemma 2 to write down $\operatorname{per}(A)=\operatorname{per}(V^{\dagger}V)$ . Let $x\in\mathbb{C}^{d}$ be distributed according to a $d$ -dimensional standard complex normal $\mathbb{C}\mathcal{N}(0,I)$ . Then according to lemma 2 we have

[TABLE]

Our goal is to use $f(\mathcal{U})$ to bound $\lvert V^{\dagger}x\rvert_{\Pi}$ . According to the definition of $f(\mathcal{U})$ , for the vector $y=(UU^{\dagger})^{-1/2}x$ we have

[TABLE]

Note that

[TABLE]

This means that $\prod_{i=1}^{n}\lvert u_{i}^{\dagger}y\rvert^{2}=\lvert V^{\dagger}x\rvert_{\Pi}^{2}$ . We also have

[TABLE]

Putting these together we get

[TABLE]

Let us now compute $\mathbb{E}_{x}[\lvert x\rvert^{2n}]$ . We have

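$$\mathbb{E}_{x}[\lvert x\rvert^{2n}]=\mathbb{E}_{x}\left[\left(\lvert x_{1}\rvert^{2}+\dots+\lvert x_{d}\rvert^{2}\right)^{n}\right]=\sum_{k_{1}+\dots+k_{d}=n}\frac{n!}{k_{1}!\cdots k_{d}!}\,\mathbb{E}_{x}\left[\lvert x_{1}\rvert^{2k_{1}}\cdots\lvert x_{d}\rvert^{2k_{d}}\right].$$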

According to fact 2, we have $\mathbb{E}_{x}[\lvert x_{1}\rvert^{2k_{1}}\dots\lvert x_{d}\rvert^{2k_{d}}]=k_{1}!\dots k_{d}!$. Therefore

$$\mathbb{E}_{x}[\lvert x\rvert^{2n}]=\sum_{k_{1}+\dots+k_{d}=n}n!=n!\binom{n+d-1}{d-1},$$

where in the last equality we used the fact that the number of ways to write $n$ as an ordered sum of $d$ nonnegative integers is $\binom{n+d-1}{d-1}$. We conclude by getting
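The counting argument can be verified by brute force for small parameters. The following is a minimal sketch, assuming the multinomial expansion of $(\lvert x_{1}\rvert^{2}+\dots+\lvert x_{d}\rvert^{2})^{n}$ combined with the moment formula $k_{1}!\cdots k_{d}!$; it enumerates all ways of writing $n$ as an ordered sum of $d$ nonnegative integers and compares the result with $n!\binom{n+d-1}{d-1}$. The function names are illustrative.

```python
import math


def compositions(n, d):
    """Yield every tuple of d nonnegative integers that sums to n."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, d - 1):
            yield (first,) + rest


def moment_by_expansion(n, d):
    """Sum of multinomial(n; k) * k_1! ... k_d! over all compositions k of n."""
    total = 0
    for ks in compositions(n, d):
        multinomial = math.factorial(n)
        for k in ks:
            multinomial //= math.factorial(k)  # exact at every step
        total += multinomial * math.prod(math.factorial(k) for k in ks)
    return total


def moment_closed_form(n, d):
    """n! * binom(n + d - 1, d - 1)."""
    return math.factorial(n) * math.comb(n + d - 1, d - 1)


for n, d in [(3, 2), (4, 3), (5, 4)]:
    print(n, d, moment_by_expansion(n, d), moment_closed_form(n, d))
```

Each term of the sum collapses to $n!$, so the total is $n!$ times the number of compositions.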

[TABLE]

∎

This finishes the proof of proposition 2. ∎

Now we switch gears and construct the distribution $\mathcal{U}$ promised by lemma 3.

Proof of lemma 3.

The idea is to make $\mathcal{U}$ be close to the uniform distribution on the sphere $\{u\in\mathbb{C}^{d}:\lvert u\rvert=1\}$ for some large $d$ . If we were allowed to pick $\mathcal{U}$ to be uniform over the sphere, then intuitively all choices of $x$ in the definition of $f(\mathcal{U})$ would yield the same value and we would be able to argue about this common value using the same tricks as in the proof of proposition 1. Instead we use the uniform distribution on a large number of samples from the sphere to serve as the proxy for the uniform distribution on the sphere itself. We further need the dimension $d$ to grow, to make the uniform distribution on the sphere similar to a (scaled) normal distribution. We now make these formal.

Let us fix some $d$ and let $\mathcal{S}$ denote the uniform distribution on the sphere $\{u\in\mathbb{C}^{d}:\lvert u\rvert=1\}$ . For any fixed distance $\epsilon$ we can cover the sphere by a finite number of balls $B(o_{1},\epsilon),\dots,B(o_{m},\epsilon)$ where $o_{1},\dots,o_{m}$ are unit vectors and

[TABLE]

Let $n$ be a large number and draw $n$ random points $u_{1},\dots,u_{n}$ from $\mathcal{S}$ . We will let $\mathcal{U}$ be the uniform distribution over $u_{1},\dots,u_{n}$ . We would like to argue that $f(\mathcal{U})$ is with high probability close to $f(\mathcal{S})$ . Because the sphere was covered by the balls around $o_{i}$ ’s, for each unit vector $x$ we have $\lvert x-o_{i}\rvert\leq\epsilon$ for some $i$ . This implies that

[TABLE]

On the other hand by the law of large numbers for each $o_{i}$ we have with high probability as $n\to\infty$

[TABLE]

Let us condition on the event that the LHS of the above are sufficiently close to the RHS for all $o_{i}$ . This event happens with high probability as $n\to\infty$ . Note that because of symmetry, the RHS of the above are independent of the choice of $o_{i}$ . Under this condition we have for all unit vectors $x$

[TABLE]

where $o$ is any arbitrary vector and $\delta\to 0$ as $n\to\infty$ . The above bounds the LHS for unit vectors $x$ . However note that the LHS does not change if we scale $x$ by any constant. Therefore $f(\mathcal{U})$ is bounded by the RHS. As we take the limit with $\epsilon\to 0$ and $\delta\to 0$ we get $\mathcal{U}$ with $f(\mathcal{U})$ asymptotically bounded by $f(\mathcal{S})$ .

Now it only remains to show that as the dimension $d$ grows $f(\mathcal{S})\to e^{-\gamma}$ . Let $o$ be an arbitrary point with $\lvert o\rvert^{2}=d$ such as $\sqrt{d}e_{1}$ where $e_{1}$ is the first element of the standard basis. When $u\sim\mathcal{S}$ is a random point on the sphere, we would like to argue that $u^{\dagger}o$ is almost distributed like $\mathbb{C}\mathcal{N}(0,1)$ . If this were the case we would have

[TABLE]

where in the last equality we used fact 3.

To make this approximation rigorous, let us generate the random point $u$ on the sphere by the following process: we sample a standard $d$-dimensional complex normal $v\sim\mathbb{C}\mathcal{N}(0,I)$ and then let $u=v/\lvert v\rvert$. For $o=\sqrt{d}e_{1}$ we have $u^{\dagger}o=\overline{v_{1}}\frac{\sqrt{d}}{\lvert v\rvert}$. Therefore

[TABLE]

The random variable $\lvert v\rvert^{2}$ is distributed according to a $\frac{1}{2}$ -scaled $\chi^{2}$ -distribution with $2d$ degrees of freedom which is the same as $\Gamma(d,1)$ . We can therefore write

[TABLE]

where $\psi$ is the digamma function [Cha93, AS64]. We therefore have $\mathbb{E}_{u}[\ln(\lvert u^{\dagger}o\rvert^{2})]=-\gamma+o(1)$ .
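The rate of convergence can be inspected numerically. The following is a minimal sketch, assuming the expectation evaluates to $-\gamma+\ln d-\psi(d)$ (the combination of $\mathbb{E}[\ln\lvert v_{1}\rvert^{2}]=-\gamma$, the constant $\ln d$, and $\mathbb{E}[\ln\lvert v\rvert^{2}]=\psi(d)$ for a $\Gamma(d,1)$ variable) and using the identity $\psi(d)=H_{d-1}-\gamma$ for positive integers $d$, where $H_{m}$ is the $m$-th harmonic number. The function name is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_log_inner_product(d):
    """Return -gamma + ln(d) - psi(d) for an integer dimension d >= 1."""
    harmonic = sum(1.0 / k for k in range(1, d))  # H_{d-1}
    psi = harmonic - EULER_GAMMA                  # digamma at the integer d
    return -EULER_GAMMA + math.log(d) - psi


for d in (1, 10, 1000, 100_000):
    value = expected_log_inner_product(d)
    print(d, value, value + EULER_GAMMA)  # last column is the gap to -gamma
```

The gap to $-\gamma$ shrinks roughly like $1/(2d)$, which is the $o(1)$ term in the text.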

For $\mathbb{E}_{u}[\lvert u^{\dagger}o\rvert^{2}]$ we observe that

[TABLE]

The random variables $\lvert v_{i}\rvert^{2}/\lvert v\rvert^{2}$ are identically distributed for different $i$ . As such we have

[TABLE]

Therefore

[TABLE]

This shows that $f(\mathcal{S})\to e^{-\gamma}$ as $d\to\infty$ and concludes the proof. ∎

Bibliography

[Aar] S. Aaronson. See point (4) of Comment #84 at http://www.scottaaronson.com/blog/?p=2408
[Aar11] Scott Aaronson. “A linear-optical proof that the permanent is #P-hard”. In Proc. R. Soc. A 467.2136, 2011, pp. 3393–3405. The Royal Society.
[AS64] Milton Abramowitz and Irene A. Stegun. “Handbook of mathematical functions: with formulas, graphs, and mathematical tables”. Courier Corporation, 1964.
[Bap07] R. B. Bapat. “Recent Developments and Open Problems in the Theory of Permanents”. In The Mathematics Student 76.1, 2007, pp. 55.
[CCG16] L. Chakhmakhchyan, N. J. Cerf and R. Garcia-Patron. “A quantum-inspired algorithm for estimating the permanent of positive semidefinite matrices”. In arXiv preprint arXiv:1609.02416, 2016.
[Cha93] Shing Ping Chan. “A statistical study of log-gamma distribution”, 1993.
[GS16] Daniel Grier and Luke Schaeffer. “New Hardness Results for the Permanent Using Linear Optics”. Electronic Colloquium on Computational Complexity (ECCC), 2016. URL: http://eccc.hpi-web.de/report/2016/159
[Gur03] Leonid Gurvits. “Classical deterministic complexity of Edmonds’ problem and quantum entanglement”. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, 2003, pp. 10–19. ACM.
