Gaussian maximizers for quantum Gaussian observables and ensembles

A. S. Holevo

arXiv:1908.03038·quant-ph·August 31, 2020·IEEE Trans. Inf. Theory

TL;DR

This paper proves that for multimode bosonic systems with gauge symmetry, the classical capacity of Gaussian observables and the accessible information of Gaussian ensembles are achieved by Gaussian states and measurements, extending known single-mode results.

Contribution

It establishes that Gaussian ensembles optimize classical capacity and accessible information in multimode bosonic systems, generalizing single-mode findings.

Findings

01

Classical capacity of Gaussian observables is attained on Gaussian ensembles.

02

Accessible information of Gaussian ensembles is achieved by multimode heterodyne measurement.

03

Results extend single-mode Gaussian optimization to multimode systems.

Abstract

In this paper we prove two results related to the Gaussian optimizers conjecture for a multimode bosonic system with gauge symmetry. First, we argue that the classical capacity of a Gaussian observable is attained on a Gaussian ensemble of coherent states. This generalizes results previously known for the heterodyne measurement in one mode. By using this fact and a continuous-variable version of the ensemble-observable duality, we prove an old conjecture that the accessible information of a Gaussian ensemble is attained on the multimode generalization of the heterodyne measurement.

Equations

$$\mathcal{M}:\rho\rightarrow p_{\rho}$$

$$\bar{\rho}_{\mathcal{E}}=\int_{\mathcal{X}}\rho_{x}\,\pi(dx)$$

$$P(A\times B)=\int_{A}\pi(dx)\,\mathrm{Tr}\,\rho_{x}M(B)=\int_{A}\int_{B}p_{\rho_{x}}(y)\,\pi(dx)\,\mu(dy)$$

$$I(\mathcal{E},\mathcal{M})=\int\!\!\int\pi(dx)\,\mu(dy)\,p_{\rho_{x}}(y)\log\frac{p_{\rho_{x}}(y)}{p_{\bar{\rho}_{\mathcal{E}}}(y)}$$

$$h(p)=-\int p(x)\log p(x)\,\mu(dx)$$

$$h(p)=h(\tilde{p})-d\log c$$

$$I(\mathcal{E},\mathcal{M})=h(p_{\bar{\rho}_{\mathcal{E}}})-\int h(p_{\rho_{x}})\,\pi(dx)$$

$$z=\begin{bmatrix}z_{1}\\ \vdots\\ z_{s}\end{bmatrix},\quad z^{\ast}=\begin{bmatrix}\bar{z}_{1}&\dots&\bar{z}_{s}\end{bmatrix}$$

$$\tilde{f}(w)=\int\exp(z^{\ast}w-w^{\ast}z)f(z)\frac{d^{2s}z}{\pi^{s}}=\int\exp(2i\,\mathrm{Im}\,z^{\ast}w)f(z)\frac{d^{2s}z}{\pi^{s}}$$

$$a=\begin{bmatrix}a_{1}\\ \vdots\\ a_{s}\end{bmatrix},\quad a^{\dagger}=\begin{bmatrix}a_{1}^{\dagger}&\dots&a_{s}^{\dagger}\end{bmatrix}$$

$$a_{j}a_{k}^{\dagger}-a_{k}^{\dagger}a_{j}=\delta_{jk}I$$

$$D(z)D(w)=\exp(-i\,\mathrm{Im}\,z^{\ast}w)\,D(z+w),\quad z,w\in\mathbb{C}^{s}$$

$$U_{\varphi}^{\ast}a_{j}U_{\varphi}=a_{j}e^{-i\varphi},\quad U_{\varphi}^{\ast}a_{j}^{\dagger}U_{\varphi}=a_{j}^{\dagger}e^{i\varphi}$$

$$\mathrm{Tr}\,\rho D(w)$$

$$\mathrm{Tr}\,\rho\sigma^{\ast}=\int\mathrm{Tr}\,\rho D(w)\,\overline{\mathrm{Tr}\,\sigma D(w)}\,\frac{d^{2s}w}{\pi^{s}}$$

$$\mathrm{Tr}\,\rho_{\Lambda}D(w)=\exp\left[-w^{\ast}\left(\Lambda+\frac{I_{s}}{2}\right)w\right]$$

$$\rho_{\Lambda}=\int|z\rangle\langle z|\exp\left(-z^{\ast}\Lambda^{-1}z\right)\frac{d^{2s}z}{\pi^{s}\det\Lambda}$$

$$\rho_{\Lambda,z}=D(z)\rho_{\Lambda}D(z)^{\dagger}=\int|w\rangle\langle w|\exp\left(-(w-z)^{\ast}\Lambda^{-1}(w-z)\right)\frac{d^{2s}w}{\pi^{s}\det\Lambda}$$

$$\mathrm{Tr}\,\rho_{\Lambda,z}D(w)=\exp\left[2i\,\mathrm{Im}\,z^{\ast}w-w^{\ast}\left(\Lambda+\frac{I_{s}}{2}\right)w\right]$$

$$\tilde{M}(d^{2s}z)=D(Kz)\rho_{N}D(Kz)^{\dagger}\frac{|\det K|^{2}d^{2s}z}{\pi^{s}};\quad z\in\mathbb{C}^{s}$$

$$M(d^{2s}z)=D(z)\rho_{N}D(z)^{\dagger}\frac{d^{2s}z}{\pi^{s}};\quad z\in\mathbb{C}^{s}$$

$$M_{\ast}(d^{2s}z)=D(z)\rho_{0}D(z)^{\dagger}\frac{d^{2s}z}{\pi^{s}}=|z\rangle\langle z|\frac{d^{2s}z}{\pi^{s}}$$

$$p_{\rho}(z)=\frac{1}{\pi^{s}}\mathrm{Tr}\,\rho D(z)\rho_{N}D(z)^{\dagger}$$

$$h(\tilde{p}_{\rho})=h(p_{\rho})-2\log|\det K|$$

$$\mathrm{Tr}\,a\rho a^{\dagger}=\Sigma$$

$$p_{\rho_{\Sigma}}(z)=\frac{1}{\pi^{s}}\mathrm{Tr}\,\rho_{\Sigma}D(z)\rho_{N}D(z)^{\dagger}=\frac{1}{\pi^{s}\det(\Sigma+N+I_{s})}\exp\left[-z^{\ast}(\Sigma+N+I_{s})^{-1}z\right]$$

$$h(p_{\rho})\leq h(p_{\rho_{\Sigma}}),\quad\rho\in\mathfrak{S}(\Sigma)$$

$$\rho_{gi}=\int_{0}^{2\pi}U_{\varphi}^{\ast}\rho U_{\varphi}\frac{d\varphi}{2\pi}$$

$$C_{\chi}(\mathcal{M};\Sigma)=\sup_{\mathcal{E}:\,\bar{\rho}_{\mathcal{E}}\in\mathfrak{S}(\Sigma)}I(\mathcal{E},\mathcal{M})$$

$$C_{\chi}(\mathcal{M};\Sigma)=\log\det\left(I_{s}+(N+I_{s})^{-1}\Sigma\right)$$

Full text

Steklov Mathematical Institute

Gubkina 8, 119991 Moscow, Russia

1 Introduction

In this paper we prove two results related to the Gaussian optimizers conjecture for a multimode bosonic system with global gauge symmetry (in the quantum communication literature such systems are called phase-insensitive). In theorem 1 of sec. 3 we argue that the classical capacity of an arbitrary gauge-covariant Gaussian observable, considered as a communication channel with quantum input and classical output, is attained on a Gaussian ensemble of coherent states. This generalizes a result previously known for the heterodyne measurement [1]. In the difficult part of the argument, the minimization of the output differential entropy, we rely upon our previous result [2], obtained as a limiting case of the general solution of the Gaussian optimizers conjecture for quantum Gaussian channels [3]. Let us stress that it is not possible to apply that solution directly to a Gaussian observable, because there is no way to embed a continuously-valued observable (as distinct from discretely-valued observables) into a channel with quantum output [4]. The classical capacity of an observable is the most important quantity characterizing the ultimate information-processing performance of the measurement (see e.g. [1], [5], [4]).

By using theorem 1 and an infinite-dimensional version of the ensemble-observable duality developed in sec. 4, we prove the main result of this work, theorem 2, concerning the accessible information of a Gaussian ensemble. In particular, it settles an old conjecture [6], [7], [8] that the accessible information of a Gaussian ensemble is attained by the multimode generalization of the heterodyne measurement. As in the other Gaussian optimizer problems, the difficulty here lies in finding the *global maximum* of a convex functional, when the optimal solution turns out to be highly non-unique and the standard tools of convex analysis become inefficient.

2 Preliminaries

Let $\mathcal{M}=\{M(dx)\}$ be an observable (POVM) in a separable Hilbert space $\mathcal{H}$ with the outcome space $\mathcal{X}$, which is a complete separable metric space. The corresponding measurement channel is defined as the transformation $\mathcal{M}:\rho\longrightarrow\mathrm{Tr}\,\rho M(dx)$ of density operators (d.o.) $\rho$ to probability distributions on $\mathcal{X}$. In [9] the existence of a $\sigma$-finite measure $\mu(dx)$ was shown such that for any d.o. $\rho$ the probability measure $\mathrm{Tr}\,\rho M(dx)$ is absolutely continuous w.r.t. $\mu(dx)$, thus having the probability density (p.d.) $p_{\rho}(x)$. Therefore the *measurement channel* can be defined as the transformation

$$\mathcal{M}:\rho\rightarrow p_{\rho}$$

mapping affinely d.o. on $\mathcal{H}$ into p.d. on $\left(\mathcal{X},\mu\right)$. Notice that $\mu(dx)$ is defined uniquely only up to the class of mutually absolutely continuous measures.

A (generalized) ensemble $\mathcal{E}=\left\{\pi(dx),\rho_{x}\right\}$ consists of a probability measure $\pi(dx)$ on the input space $\mathcal{X}$ and a measurable family of d.o. $\rho_{x}$ on $\mathcal{H}$. The average state of the ensemble is the barycenter of this measure

$$\bar{\rho}_{\mathcal{E}}=\int_{\mathcal{X}}\rho_{x}\,\pi(dx),$$

the integral existing in the strong sense in the Banach space of trace class operators. Let $\mathcal{M}=\{M(dy)\}$ be an observable with the outcome space $\mathcal{Y}$ and $\rho\rightarrow p_{\rho}$ the corresponding measurement channel. The joint probability distribution of $x,y$ on $\mathcal{X\times Y}$ is uniquely defined by the relation

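$$P(A\times B)=\int_{A}\pi(dx)\,\mathrm{Tr}\,\rho_{x}M(B)=\int_{A}\int_{B}p_{\rho_{x}}(y)\,\pi(dx)\,\mu(dy),$$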

where $A$ is an arbitrary Borel subset of $\mathcal{X}$ and $B$ is that of $\mathcal{Y}.$

The classical Shannon information between $x,y$ is equal to (cf. [10])

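$$I(\mathcal{E},\mathcal{M})=\int\!\!\int\pi(dx)\,\mu(dy)\,p_{\rho_{x}}(y)\log\frac{p_{\rho_{x}}(y)}{p_{\bar{\rho}_{\mathcal{E}}}(y)}.$$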

We will use the differential entropy

$$h(p)=-\int p(x)\log p(x)\,\mu(dx)$$

of a p.d. $p(x)$. There is a special class $\mathcal{D}$ of p.d.’s we will be using for which the differential entropy is well-defined. Let $\mathcal{X}$ be a $d$-dimensional vector space and $p(x)$ a bounded p.d. on $\mathcal{X}$ such that $p(x)\leq c^{d}$ (mod $\mu$) for some $c>0$. Then $h(p)$ is well-defined with values in $[-d\log c,+\infty]$, because in this case $\tilde{p}(x)=p(x/c)c^{-d}$ is a p.d. satisfying $\tilde{p}(x)\leq 1$ (mod $\mu$), hence $-\tilde{p}(x)\log\tilde{p}(x)\geq 0$. Thus $h(\tilde{p})$ is well-defined with values in $[0,+\infty]$ and, by the change of variable $\tilde{x}=cx$,

$$h(p)=h(\tilde{p})-d\log c$$

is also well-defined with values in $[-d\log c,+\infty]$.
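The effect of the rescaling $\tilde{p}(x)=p(x/c)c^{-d}$ on the differential entropy can be checked numerically. The following is a minimal sketch, assuming $d=1$, Lebesgue measure for $\mu$, and a centred Gaussian test density; the helper names `differential_entropy` and `gaussian` and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def differential_entropy(pdf, lo, hi, n=100_000):
    """Midpoint-rule estimate of h(p) = -integral of p(x) log p(x) dx over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0.0:
            total -= p * math.log(p) * dx
    return total

def gaussian(sigma):
    """Density of a centred 1-D Gaussian with standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * (x / sigma) ** 2)

# Assumed test values: p is bounded by c**d (its maximum is about 1.33 < 2).
sigma, c, d = 0.3, 2.0, 1
p = gaussian(sigma)
p_tilde = lambda x: p(x / c) / c**d   # rescaled density, bounded by 1

h_p = differential_entropy(p, -10.0, 10.0)
h_p_tilde = differential_entropy(p_tilde, -20.0, 20.0)

# The rescaled density is more spread out, so its entropy is larger by d*log(c).
print(h_p, h_p_tilde, h_p_tilde - h_p, d * math.log(c))
```

In this example $h(p)$ is finite but smaller than $h(\tilde{p})\geq 0$, which illustrates why boundedness of $p$ is what guarantees a finite lower bound on the entropy.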

If observable $\mathcal{M}$ is such that $p_{\rho}\in\mathcal{D}$ for any d.o. $\rho$ and ensemble $\mathcal{E}$ is such that $h(p_{\bar{\rho}_{\mathcal{E}}})<\infty$ , then the Shannon information between $x,y$ is equal to

$$I(\mathcal{E},\mathcal{M})=h(p_{\bar{\rho}_{\mathcal{E}}})-\int h(p_{\rho_{x}})\,\pi(dx). \tag{2}$$

This quantity is well-defined with values in $[0,+\infty]$ due to Jensen’s inequality.

In what follows $\mathcal{H}$ will be the space of a strongly continuous irreducible projective unitary representation $z\rightarrow D(z)$ of the canonical commutation relations (CCR) (see e.g. [11], [12] for a detailed account) describing quantization of a linear classical system with $s$ degrees of freedom, such as a finite number of physically relevant electromagnetic modes in a receiver’s cavity.

The classical linear system with the preferred complex structure (gauge) is described by the phase space $\mathbb{C}^{s}$ equipped with the symplectic form $2\,\mathrm{Im}\,z^{\ast}w$, where

$$z=\begin{bmatrix}z_{1}\\ \vdots\\ z_{s}\end{bmatrix},\quad z^{\ast}=\begin{bmatrix}\bar{z}_{1}&\dots&\bar{z}_{s}\end{bmatrix}.$$

We will use the symplectic Fourier transform

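$$\tilde{f}(w)=\int\exp(z^{\ast}w-w^{\ast}z)f(z)\frac{d^{2s}z}{\pi^{s}}=\int\exp(2i\,\mathrm{Im}\,z^{\ast}w)f(z)\frac{d^{2s}z}{\pi^{s}}.$$

Notice that $\tilde{\tilde{f}}=f$, i.e. the inverse transform has the same form.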

The quantization gives a bosonic system described by the collection of annihilation-creation operators, in the vector form

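$$a=\begin{bmatrix}a_{1}\\ \vdots\\ a_{s}\end{bmatrix},\quad a^{\dagger}=\begin{bmatrix}a_{1}^{\dagger}&\dots&a_{s}^{\dagger}\end{bmatrix},$$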

where the lower index of a component refers to the number of the mode. The CCR including the nonvanishing commutator

$$a_{j}a_{k}^{\dagger}-a_{k}^{\dagger}a_{j}=\delta_{jk}I,$$

are conveniently written in terms of displacement operators $D(z)=\exp\left(a^{\dagger}z-z^{\ast}a\right),$ namely

$$D(z)D(w)=\exp(-i\,\mathrm{Im}\,z^{\ast}w)\,D(z+w),\quad z,w\in\mathbb{C}^{s}. \tag{3}$$

The (global) gauge group acts as $z\rightarrow e^{i\varphi}z,$ ( $\varphi$ is real phase) in the classical space, and via the unitary group $\varphi\rightarrow U_{\varphi}=\exp\left(-i\varphi\,a^{\dagger}a\right)$ in $\mathcal{H}$ ( $a^{\dagger}a$ is the total number operator), so that

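$$U_{\varphi}^{\ast}a_{j}U_{\varphi}=a_{j}e^{-i\varphi},\quad U_{\varphi}^{\ast}a_{j}^{\dagger}U_{\varphi}=a_{j}^{\dagger}e^{i\varphi}. \tag{4}$$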

The quantum Fourier transform of a trace class operator $\rho$ is defined as

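$$\mathrm{Tr}\,\rho D(w).$$

The quantum Parseval formula holds:

$$\mathrm{Tr}\,\rho\sigma^{\ast}=\int\mathrm{Tr}\,\rho D(w)\,\overline{\mathrm{Tr}\,\sigma D(w)}\,\frac{d^{2s}w}{\pi^{s}}. \tag{5}$$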

An operator $\rho$ is gauge-invariant if $U_{\varphi}^{\ast}\,\rho U_{\varphi}=\rho$ for all values of the phase $\varphi$. A gauge-invariant Gaussian d.o. is defined by the quantum characteristic function (we denote by $I_{s}$ the unit $s\times s$ matrix, as distinct from the unit operator $I$ in a Hilbert space)

$$\mathrm{Tr}\,\rho_{\Lambda}D(w)=\exp\left[-w^{\ast}\left(\Lambda+\frac{I_{s}}{2}\right)w\right], \tag{6}$$

where $\Lambda=\mathrm{Tr}\,a\rho_{\Lambda}a^{\dagger}$ is the complex covariance matrix satisfying $\Lambda\geq 0$ . Notice that $\rho_{\Lambda}^{\top}=\rho_{\bar{\Lambda}},$ where ⊤ denotes the transposition operation defined in (54) (see Appendix A).

In the Hilbert space $\mathcal{H}$ of an irreducible representation of CCR there is a unique unit vacuum vector $|0\rangle$ such that $a|0\rangle=0.$ The case $\Lambda=0$ in (6) corresponds to the vacuum d.o. $\rho_{0}=|0\rangle\langle 0|.$ The coherent state vectors are $|z\rangle=D(z)|0\rangle.$

We will use the P-representation in the case of nondegenerate $\Lambda:$

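$$\rho_{\Lambda}=\int|z\rangle\langle z|\exp\left(-z^{\ast}\Lambda^{-1}z\right)\frac{d^{2s}z}{\pi^{s}\det\Lambda}. \tag{7}$$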

Another important Gaussian d.o. is obtained by action of the displacement operators

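$$\rho_{\Lambda,z}=D(z)\rho_{\Lambda}D(z)^{\dagger}=\int|w\rangle\langle w|\exp\left(-(w-z)^{\ast}\Lambda^{-1}(w-z)\right)\frac{d^{2s}w}{\pi^{s}\det\Lambda}; \tag{8}$$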

it has the quantum characteristic function

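$$\mathrm{Tr}\,\rho_{\Lambda,z}D(w)=\exp\left[2i\,\mathrm{Im}\,z^{\ast}w-w^{\ast}\left(\Lambda+\frac{I_{s}}{2}\right)w\right]. \tag{9}$$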

3 Gaussian observables and ensembles

In this Section we will consider Gaussian observables with the outcome space $\mathbb{C}^{s},$ described by POVM

$$\tilde{M}(d^{2s}z)=D(Kz)\rho_{N}D(Kz)^{\dagger}\frac{|\det K|^{2}d^{2s}z}{\pi^{s}};\quad z\in\mathbb{C}^{s}, \tag{10}$$

where $K$ is a nondegenerate complex $s\times s$ matrix, and the d.o. $\rho_{N}$ is defined by (6) with $\Lambda=N$. This is a special (gauge-covariant) case of the general Gaussian observables considered in [12]. Particularly important is the case $K=I_{s}$, where

$$M(d^{2s}z)=D(z)\rho_{N}D(z)^{\dagger}\frac{d^{2s}z}{\pi^{s}};\quad z\in\mathbb{C}^{s}. \tag{11}$$

In Appendix A we recall an alternative description of such observables via extension to a spectral measure in a composite system including an ancillary system (going back to [6]). By taking $N=0$, so that $\rho_{0}=|0\rangle\langle 0|$ is the vacuum state, we obtain the multimode version of the “heterodyne measurement”

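$$M_{\ast}(d^{2s}z)=D(z)\rho_{0}D(z)^{\dagger}\frac{d^{2s}z}{\pi^{s}}=|z\rangle\langle z|\frac{d^{2s}z}{\pi^{s}}, \tag{12}$$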

see [13]. Thus the POVM (11) corresponds to a noisy generalization of the multimode heterodyne measurement.

Let $\rho$ be an input d.o. then by using (5), (9) and real-valuedness of the quadratic form under the exponent, the output p.d. of the observable (11) is

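$$p_{\rho}(z)=\frac{1}{\pi^{s}}\mathrm{Tr}\,\rho D(z)\rho_{N}D(z)^{\dagger}=\frac{1}{\pi^{s}}\int\mathrm{Tr}\,\rho D(w)\exp\left[-2i\,\mathrm{Im}\,z^{\ast}w-w^{\ast}\left(N+\frac{I_{s}}{2}\right)w\right]\frac{d^{2s}w}{\pi^{s}}, \tag{13}$$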

and that of observable (10) is $\tilde{p}_{\rho}(z)=p_{\rho}(Kz)\left|\det K\right|^{2}.$ Notice that all these p.d.’s belong to the class $\mathcal{D}$ because $0\leq\mathrm{Tr}\,\rho\sigma\leq 1$ for any two d.o. $\rho,\sigma$ . Thus the differential entropy of the output p.d. is well-defined and

$$h(\tilde{p}_{\rho})=h(p_{\rho})-2\log|\det K|. \tag{14}$$

Let $\Sigma$ be a nonnegative definite complex Hermitian $s\times s$ matrix. By $\mathfrak{S}(\Sigma)$ we denote the set of all d.o. with the complex covariance matrix

$$\mathrm{Tr}\,a\rho a^{\dagger}=\Sigma. \tag{15}$$

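There is a unique gauge-invariant Gaussian d.o. $\rho_{\Sigma}$ in $\mathfrak{S}(\Sigma)$. By using (6) with $\Lambda=\Sigma$ and (13), one obtains the output p.d.

$$p_{\rho_{\Sigma}}(z)=\frac{1}{\pi^{s}}\mathrm{Tr}\,\rho_{\Sigma}D(z)\rho_{N}D(z)^{\dagger}=\frac{1}{\pi^{s}\det(\Sigma+N+I_{s})}\exp\left[-z^{\ast}(\Sigma+N+I_{s})^{-1}z\right], \tag{16}$$

which is a complex Gaussian p.d. with the covariance matrix $\Sigma+N+I_{s}=\tilde{\Sigma}+I_{s}$, where $\tilde{\Sigma}=\Sigma+N$.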
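As a sanity check on the Gaussian form of the output density, one can verify numerically in the single-mode case $s=1$ that a complex Gaussian density with variance $v=\Sigma+N+1$ is normalized and has differential entropy $\log(\pi e v)$, the standard value for a circularly symmetric complex Gaussian. This is a minimal sketch; the function names and the test values of $\Sigma$ and $N$ are assumptions made for illustration.

```python
import math

def output_density(r, sigma, n):
    """Single-mode output density as a function of r = |z|, with variance v = sigma + n + 1."""
    v = sigma + n + 1.0
    return math.exp(-r * r / v) / (math.pi * v)

def radial_integral(f, r_max, steps=100_000):
    """Midpoint rule for the plane integral of a radial function: integral of f(r) * 2*pi*r dr."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += f(r) * 2.0 * math.pi * r * dr
    return total

# Assumed test values for the signal and noise variances.
sigma, n = 2.0, 0.5
v = sigma + n + 1.0

mass = radial_integral(lambda r: output_density(r, sigma, n), 40.0)
entropy = radial_integral(
    lambda r: -output_density(r, sigma, n) * math.log(output_density(r, sigma, n)), 40.0
)

print(mass, entropy, math.log(math.pi * math.e * v))
```

The entropy depends on $\Sigma$ and $N$ only through $v$, which is what makes the later capacity formula a difference of two log-determinants.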

We have

$$h(p_{\rho})\leq h(p_{\rho_{\Sigma}}),\quad\rho\in\mathfrak{S}(\Sigma). \tag{17}$$

Indeed, for any $\rho\in\mathfrak{S}(\Sigma)$ define the gauge-invariant d.o.

$$\rho_{gi}=\int_{0}^{2\pi}U_{\varphi}^{\ast}\rho U_{\varphi}\frac{d\varphi}{2\pi}.$$

By the concavity of the differential entropy and Jensen’s inequality, $h(p_{\rho})\leq h(p_{\rho_{gi}})$. By using (4) it is not difficult to check that $\rho_{gi}$ has zero first moments, finite second moments $\mathrm{Tr}\,a_{j}\rho a_{k}^{\dagger}$ given by (15), and the other second moments, such as $\mathrm{Tr}\,a_{j}\rho a_{k}$ and $\mathrm{Tr}\,a_{j}^{\dagger}\rho a_{k}^{\dagger}$, vanishing. Thus it has all the first and second moments the same as the Gaussian d.o. $\rho_{\Sigma}$. By the classical maximum entropy principle [16], we have $h(p_{\rho_{gi}})\leq h(p_{\rho_{\Sigma}})$, which proves (17).

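We will be interested in the following constrained $\chi$-capacity of the channel $\mathcal{M}$

$$C_{\chi}(\mathcal{M};\Sigma)=\sup_{\mathcal{E}:\,\bar{\rho}_{\mathcal{E}}\in\mathfrak{S}(\Sigma)}I(\mathcal{E},\mathcal{M}), \tag{18}$$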

where $I(\mathcal{E},\mathcal{M})$ is the classical information quantity defined in (2).

Theorem 1

Let $\widetilde{\mathcal{M}}$ be the measurement channel corresponding to the Gaussian observable (10), then the supremum in (18) is equal to

$$C_{\chi}(\mathcal{M};\Sigma)=\log\det\left(I_{s}+(N+I_{s})^{-1}\Sigma\right) \tag{19}$$

and is attained on the Gaussian ensemble of coherent states

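$$\mathcal{E}=\left\{\pi(d^{2s}z),\,|z\rangle\langle z|\right\},\qquad\pi(d^{2s}z)=\exp\left(-z^{\ast}\Sigma^{-1}z\right)\frac{d^{2s}z}{\pi^{s}\det\Sigma}.$$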

The relation (19) can be considered as a multimode version of a formula obtained in [1] by “information exclusion” argument.
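To make the capacity formula (19) concrete, the sketch below evaluates $\log\det\left(I_{s}+(N+I_{s})^{-1}\Sigma\right)$ for small Hermitian matrices, using the identity $\det\left(I_{s}+(N+I_{s})^{-1}\Sigma\right)=\det(\Sigma+N+I_{s})/\det(N+I_{s})$. It uses only the Python standard library; the helper names and the example matrices are illustrative assumptions, and the result is in nats.

```python
import math

def det(m):
    """Determinant of a small square complex matrix (Gaussian elimination, partial pivoting)."""
    a = [[complex(x) for x in row] for row in m]
    n, result = len(a), 1.0 + 0.0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result

def add(x, y):
    """Entrywise sum of two square matrices of the same size."""
    return [[x[i][j] + y[i][j] for j in range(len(x))] for i in range(len(x))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def chi_capacity(sigma, noise):
    """log det(I + (N+I)^{-1} Sigma), computed as log det(Sigma+N+I) - log det(N+I)."""
    n_plus_i = add(noise, identity(len(noise)))
    return math.log(det(add(sigma, n_plus_i)).real) - math.log(det(n_plus_i).real)

# Single mode, ideal heterodyne (N = 0): the capacity reduces to log(1 + Sigma).
one_mode = chi_capacity([[3.0]], [[0.0]])

# Two modes with a Hermitian signal covariance and noise in the first mode only.
two_mode = chi_capacity([[2.0, 0.5j], [-0.5j, 1.0]], [[0.5, 0.0], [0.0, 0.0]])

print(one_mode, two_mode)
```

For Hermitian positive definite inputs both determinants are real and positive, which is why taking `.real` before the logarithm is safe here.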

Proof. The channel $\mathcal{M}$ defined by (11) is covariant with respect to the irreducible action of the displacement operators $D(z)$ which means

[TABLE]

for any Borel subset $B\subseteq\mathbb{C}^{s}$ or, equivalently,

[TABLE]

By adapting the argument from [14] for irreducibly covariant quantum channel to our case of quantum-classical channel, we obtain

[TABLE]

where

[TABLE]

is the minimal output differential entropy. Indeed, from (2) it follows that the right-hand side (with ‘sup’ in place of ‘max’ and ‘inf’ in place of ‘min’) is an upper bound for $C_{\chi}(\mathcal{M};\Sigma)$ . Its achievability follows from Proposition 2 of the recent paper [15], since all the assumptions of that result are fulfilled in the Gaussian gauge-invariant case under consideration here.

To be explicit, first, by the proof of a generalization of the Wehrl conjecture to measurements of the form (11) obtained in [2], the minimum (22) is attained on the vacuum state $\rho_{0}=|0\rangle\langle 0|$, to which corresponds the output p.d.

$$p_{\rho_{0}}(z)=\frac{1}{\pi^{s}\det(N+I_{s})}\exp\left[-z^{\ast}(N+I_{s})^{-1}z\right],$$

so that

[TABLE]

Second, take an ensemble $\mathcal{E}$ such that $\bar{\rho}_{\mathcal{E}}\in\mathfrak{S}(\Sigma).$ Then $p_{\bar{\rho}_{\mathcal{E}}}$ has the covariance matrix $\Sigma+N+I_{s}$ (see Appendix A). By the maximum entropy principle (17), the maximum of $h(p_{\bar{\rho}_{\mathcal{E}}})$ is attained on the Gaussian d.o. $\rho_{\Sigma}$ and is equal to

$$h(p_{\rho_{\Sigma}})=\log\det\left[\pi e\,(\Sigma+N+I_{s})\right].$$

And finally, by (7) the d.o. $\rho_{\Sigma}$ is the average state of the Gaussian ensemble $\mathcal{E}$ of coherent states obtained from the vacuum state $\rho_{0}=|0\rangle\langle 0|$ by the action of the displacement operators $D(z)$ , which is thus the optimal ensemble achieving the upper bound for (18).

Thus we obtain the value

$$C_{\chi}(\mathcal{M};\Sigma)=\log\det\left(I_{s}+(N+I_{s})^{-1}\Sigma\right). \tag{24}$$

By taking into account (14) we also obtain that

$$C_{\chi}(\widetilde{\mathcal{M}};\Sigma)=C_{\chi}(\mathcal{M};\Sigma),$$

i.e. rescaling the observable by a nondegenerate $K$ has no effect on the $\chi$-capacity of the measurement (11) or on the optimal ensemble. $\square$
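The last step of the proof can be spelled out as a short worked computation. It is a sketch that assumes the standard value $\log\det(\pi e A)$ for the differential entropy of a complex Gaussian p.d. with covariance matrix $A$, and it uses the fact from (14) that rescaling by $K$ shifts every output entropy by the same constant:

```latex
\begin{aligned}
C_{\chi}(\widetilde{\mathcal{M}};\Sigma)
 &= \max_{\rho\in\mathfrak{S}(\Sigma)} h(\tilde{p}_{\rho}) - \min_{\rho} h(\tilde{p}_{\rho}) \\
 &= \bigl[h(p_{\rho_{\Sigma}}) - 2\log|\det K|\bigr] - \bigl[h(p_{\rho_{0}}) - 2\log|\det K|\bigr] \\
 &= \log\det\bigl[\pi e\,(\Sigma+N+I_{s})\bigr] - \log\det\bigl[\pi e\,(N+I_{s})\bigr] \\
 &= \log\det\bigl(I_{s}+(N+I_{s})^{-1}\Sigma\bigr).
\end{aligned}
```

The term $2\log|\det K|$ cancels between the maximal and the minimal output entropy, which is why the capacity and the optimal ensemble do not depend on $K$.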

The importance of the quantity (18) is apparent: it is the key to computing the energy-constrained classical capacity of the channel $\mathcal{M}$, which is an important quantity characterizing the information-processing performance of the measurement. Indeed, let

$$H=\sum_{j,k=1}^{s}\epsilon_{jk}a_{j}^{\dagger}a_{k}$$

be a quadratic gauge-invariant Hamiltonian, where $\epsilon=\left[\epsilon_{jk}\right]$ is a positive definite Hermitian matrix, so that the mean energy of the input d.o. $\rho$ is equal to

$$\mathrm{Tr}\,\rho H=\mathrm{Sp}\,\epsilon\Sigma,$$

where $\mathrm{Sp}$ denotes the trace of $s\times s$ matrices, as distinct from the trace of operators. Then the energy constraint has the form $\mathrm{Sp}\,\epsilon\Sigma\leq E$, where $E$ is a positive number, and the energy-constrained classical capacity of the channel $\mathcal{M}$ is equal to

[TABLE]

Notice that the additivity issue does not arise here because measurement channels are entanglement breaking [9], [12]. Given an explicit expression for $C_{\chi}(\mathcal{M};\Sigma)$ such as (19), computation of the last supremum is a separate optimization problem which can be solved analytically in some special cases. For example, if $H=\sum_{j=1}^{s}\hbar\omega_{j}a_{j}^{\dagger}a_{j}$, so that $\epsilon$ is diagonal, and $N=\mathrm{diag}\left[n_{j}\right]$, then the optimal $\Sigma$ is also diagonal and its entries $s_{j}$ can be found with a simple generalization of the “water-filling solution”, cf. [16], namely

[TABLE]

where $\nu$ is found from the equation $\sum_{j=1}^{s}\hbar\omega_{j}s_{j}=E$, and

[TABLE]
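A water-filling computation of this kind can be sketched as follows. The sketch assumes the diagonal case described above, so that by (19) the objective is $\sum_{j}\log\left(1+s_{j}/(n_{j}+1)\right)$ under the constraint $\sum_{j}\hbar\omega_{j}s_{j}=E$, and it uses the stationarity condition $s_{j}=\max\left(0,\nu/\hbar\omega_{j}-n_{j}-1\right)$ derived here from a Lagrange multiplier argument rather than copied from the paper. The function name `water_filling`, the units ($\hbar=1$), and the example values are illustrative assumptions.

```python
import math

def water_filling(omega, n, energy, iters=200):
    """Bisection for the water level nu such that sum_j omega_j * s_j = energy,
    with s_j = max(0, nu / omega_j - n_j - 1) (units with hbar = 1)."""
    def used(nu):
        # Energy spent at level nu: sum_j max(0, nu - omega_j * (n_j + 1)).
        return sum(max(0.0, nu - w * (k + 1.0)) for w, k in zip(omega, n))

    lo = 0.0
    hi = max(w * (k + 1.0) for w, k in zip(omega, n)) + energy  # used(hi) >= energy
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used(mid) < energy:
            lo = mid
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    s = [max(0.0, nu / w - (k + 1.0)) for w, k in zip(omega, n)]
    capacity = sum(math.log(1.0 + sj / (k + 1.0)) for sj, k in zip(s, n))
    return nu, s, capacity

# Assumed example: three modes; the third is too expensive to be used at this energy.
omega = [1.0, 2.0, 5.0]
noise = [0.0, 0.5, 0.0]
nu, s, capacity = water_filling(omega, noise, energy=3.0)

print(nu, s, capacity)
```

Modes with $\hbar\omega_{j}(n_{j}+1)\geq\nu$ receive no signal energy, which is the usual water-filling picture with the "floor" of mode $j$ at height $\hbar\omega_{j}(n_{j}+1)$.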

The following result (for observable (12)) was conjectured in the early seventies. In [6] it was observed that the measurement (12) for the Gaussian ensemble (25), (26) below gives the information amount (24), which is thus a lower bound for the accessible information of the ensemble defined as

$$A(\mathcal{E})=\sup_{\mathcal{M}}I(\mathcal{E},\mathcal{M}),$$

where the supremum is over all observables $\mathcal{M}$. The conjecture was that the observable (12) gives the global maximum. In [7] the authors verified the necessary local extremality condition for information based on the first variation derived in [17], and in [8] the second variation was shown to be nonpositive (the English versions of these articles were also posted as arXiv:quant-ph/0511042 and arXiv:quant-ph/0511043). However, to our knowledge the question of the global maximum was open until now.

Theorem 2

Let $\mathcal{E}$ be the Gaussian ensemble $\left\{\pi(d^{2s}z),\rho_{N,z}\right\}$ (for clarity of the proofs we assume that the covariance matrices $\Sigma,\,N$ are nondegenerate, although this restriction can be relaxed by using more abstract computations with the quantum characteristic functions), where

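$$\pi(d^{2s}z)=\exp\left(-z^{\ast}\Sigma^{-1}z\right)\frac{d^{2s}z}{\pi^{s}\det\Sigma}, \tag{25}$$

$$\rho_{N,z}=D(z)\rho_{N}D(z)^{\dagger} \tag{26}$$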

is d.o. (8) with $\Lambda=N.$

Then the accessible information $A(\mathcal{E})$ of this ensemble is equal to (24) and is attained on any Gaussian observable of the form

$$\tilde{M}(d^{2s}z)=|Kz\rangle\langle Kz|\frac{|\det K|^{2}d^{2s}z}{\pi^{s}}, \tag{27}$$

where $\det K\neq 0,$ in particular, on the observable (12).

Proof. By using (8) and convolution of Gaussian densities, we obtain the average state of the ensemble (25), (26)

$$\bar{\rho}_{\mathcal{E}}=\rho_{\Sigma+N}. \tag{28}$$

Computation using (2) and (23) gives

$$I(\mathcal{E},\mathcal{M}_{\ast})=\log\det\left(I_{s}+(N+I_{s})^{-1}\Sigma\right) \tag{29}$$

for the ensemble $\mathcal{E}$ and observable $\mathcal{M}_{\ast}$ defined by (12), thus giving the lower bound for the accessible information $A(\mathcal{E}).$ Any observable (27) gives the same value by (14).
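The lower bound can be written out as a short worked computation. It is a sketch that assumes the prior $\pi$ is the centred complex Gaussian with covariance matrix $\Sigma$, so that the average state is $\rho_{\Sigma+N}$, and the standard entropy $\log\det(\pi e A)$ of a complex Gaussian p.d. with covariance matrix $A$; the outcome of the heterodyne measurement is denoted $w$:

```latex
\begin{aligned}
p_{\rho_{N,z}}(w) &= \frac{1}{\pi^{s}}\langle w|\rho_{N,z}|w\rangle
  = \frac{\exp\bigl[-(w-z)^{\ast}(N+I_{s})^{-1}(w-z)\bigr]}{\pi^{s}\det(N+I_{s})},\\
p_{\bar{\rho}_{\mathcal{E}}}(w) &= \frac{\exp\bigl[-w^{\ast}(\Sigma+N+I_{s})^{-1}w\bigr]}{\pi^{s}\det(\Sigma+N+I_{s})},\\
I(\mathcal{E},\mathcal{M}_{\ast}) &= h(p_{\bar{\rho}_{\mathcal{E}}})-\int h(p_{\rho_{N,z}})\,\pi(d^{2s}z)\\
 &= \log\det\bigl[\pi e\,(\Sigma+N+I_{s})\bigr]-\log\det\bigl[\pi e\,(N+I_{s})\bigr]
  = \log\det\bigl(I_{s}+(N+I_{s})^{-1}\Sigma\bigr).
\end{aligned}
```

The conditional entropy does not depend on $z$, because a displacement only shifts the centre of the conditional density.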

We now use the general upper bound from the next section:

$$A(\mathcal{E})\leq\sup_{\mathcal{E}^{\prime}:\,\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}}I(\mathcal{E}^{\prime},\mathcal{M}^{\prime}), \tag{30}$$

where $\mathcal{M}^{\prime}$ is the observable dual to the ensemble $\mathcal{E}$ (defined by Eq. (41) in proposition 3 below), and the supremum is taken over all ensembles $\mathcal{E}^{\prime}$ satisfying the condition $\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}$. In the case we are considering, this observable will turn out to be Gaussian, so that we can apply theorem 1 to it to compute the right-hand side of the inequality (30). According to Eq. (41) the dual observable is given by the relation

[TABLE]

By using the decomposition in the normal modes associated with the orthonormal basis of eigenvectors of the matrix

[TABLE]

we obtain (see (58) in Appendix A for detail)

[TABLE]

Substituting this into (31), we get

[TABLE]

By making change of variables

[TABLE]

and denoting

[TABLE]

we obtain, by arranging the terms in the quadratic form under the exponent in (34),

[TABLE]

which has the same Gaussian form as $\mathcal{\tilde{M}}$ in theorem 1.

We now compute the supremum in the right-hand side of (30) by using theorem 1 with $N$ replaced by $\tilde{N}$ and $\Sigma$ replaced by $\tilde{\Sigma}=\Sigma+N$ from the average state $\bar{\rho}_{\mathcal{E}}$ given by (28). Theorem 1 then implies

[TABLE]

A computation below shows that

[TABLE]

where $T=\sqrt{\tilde{\Sigma}\left(\tilde{\Sigma}+I_{s}\right)}.$ This gives the upper estimate for $A(\mathcal{E})$ which coincides with the lower estimate (29), thus proving the theorem.

To prove (37), we obtain from (35)

[TABLE]

then

[TABLE]

Substituting

[TABLE]

into (38), we obtain

[TABLE]

where $T=\sqrt{\tilde{\Sigma}\left(\tilde{\Sigma}+I_{s}\right)},$ hence (37) follows. $\square$

4 Ensemble-observable duality

Duality between ensembles and observables proved to be an efficient tool in quantum information theory (see [1], [5], [18] or [4]). In this section we provide a rigorous infinite-dimensional and continuous-variables version of this duality used in the proof of theorem 2.

Proposition 3

Let $\mathcal{E}=\left\{\pi(dx),\rho_{x}\right\}$ be an ensemble and $\mathcal{M}=\left\{M(dy)\right\}$ an observable such that

[TABLE]

where $\mu(dy)$ is a $\sigma$-finite measure, $m(y)$ is a weakly measurable function with values in the cone of bounded positive operators in $\mathcal{H}$, and the integral converges weakly ($B$ is an arbitrary Borel subset of $\mathcal{Y}$).

Define the dual pair ensemble-observable $(\mathcal{E}^{\prime},\mathcal{M}^{\prime})$ by the relations

[TABLE]

for $\psi\in\mathrm{ran\,}\bar{\rho}_{\mathcal{E}}^{1/2}\oplus\mathcal{H}_{0}$, where $\mathcal{H}_{0}=\mathrm{ker\,}\bar{\rho}_{\mathcal{E}}^{1/2}$ (we use the generalized inverse for $\bar{\rho}_{\mathcal{E}}^{-1/2}$). Then the average states of both ensembles coincide

$$\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}. \tag{42}$$

Moreover, the joint distribution of $x,y$ is the same for both pairs $(\mathcal{E},\mathcal{M})$ and $(\mathcal{E}^{\prime},\mathcal{M}^{\prime})$ so that

$$I(\mathcal{E},\mathcal{M})=I(\mathcal{E}^{\prime},\mathcal{M}^{\prime}). \tag{43}$$

Proof. From (40) it follows

[TABLE]

The definition (41) implies

[TABLE]

for a dense domain of $\psi$, implying that $M^{\prime}(A)$ are bounded positive operators with $M^{\prime}(\mathcal{X})=I$. The definition via the integral also implies $\sigma$-additivity, hence $\mathcal{M}^{\prime}$ is an observable.

Notice the identity

[TABLE]

Then the joint distribution of $x,y$

[TABLE]

via (44) is equal to

[TABLE]

hence (43) holds. $\square$
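A finite-dimensional toy version of this duality may help fix ideas. The sketch below assumes the standard finite-dimensional form of the dual pair, $M^{\prime}_{x}=\bar{\rho}^{-1/2}\pi_{x}\rho_{x}\bar{\rho}^{-1/2}$, $\pi^{\prime}_{y}=\mathrm{Tr}\,\bar{\rho}M_{y}$, $\rho^{\prime}_{y}=\bar{\rho}^{1/2}M_{y}\bar{\rho}^{1/2}/\pi^{\prime}_{y}$, and checks that the joint distribution and the mutual information coincide for the two pairs. The qubit ensemble and the two-outcome observable are hypothetical examples, chosen so that the average state is diagonal and its square roots are trivial to compute.

```python
import math

def matmul(a, b):
    """Product of two 2x2 real matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def scaled(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

# Hypothetical qubit ensemble E = {pi_x, rho_x}: projectors on |0>, |+>, |->.
pi = [0.5, 0.25, 0.25]
rho = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, -0.5], [-0.5, 0.5]],
]
# Hypothetical two-outcome observable M = {M_y}, with M_0 + M_1 = I.
M = [
    [[0.7, 0.2], [0.2, 0.4]],
    [[0.3, -0.2], [-0.2, 0.6]],
]

# Average state; for this choice it is diag(0.75, 0.25), so its square
# roots can be taken entrywise on the diagonal.
rho_bar = [[sum(p * r[i][j] for p, r in zip(pi, rho)) for j in range(2)] for i in range(2)]
root = [[math.sqrt(rho_bar[0][0]), 0.0], [0.0, math.sqrt(rho_bar[1][1])]]
inv_root = [[1.0 / root[0][0], 0.0], [0.0, 1.0 / root[1][1]]]

# Dual observable (indexed by x) and dual ensemble (indexed by y).
M_dual = [matmul(matmul(inv_root, scaled(p, r)), inv_root) for p, r in zip(pi, rho)]
pi_dual = [trace(matmul(rho_bar, m)) for m in M]
rho_dual = [scaled(1.0 / q, matmul(matmul(root, m), root)) for q, m in zip(pi_dual, M)]

# Joint distributions of (x, y) for the original pair and for the dual pair.
joint = [[p * trace(matmul(r, m)) for m in M] for p, r in zip(pi, rho)]
joint_dual = [[q * trace(matmul(rd, md)) for q, rd in zip(pi_dual, rho_dual)] for md in M_dual]

def mutual_information(P):
    """Shannon mutual information (in nats) of a joint distribution P[x][y]."""
    px = [sum(row) for row in P]
    py = [sum(P[x][y] for x in range(len(P))) for y in range(len(P[0]))]
    return sum(P[x][y] * math.log(P[x][y] / (px[x] * py[y]))
               for x in range(len(P)) for y in range(len(P[0])) if P[x][y] > 0.0)

print(mutual_information(joint), mutual_information(joint_dual))
```

Note that the roles of $x$ and $y$ are exchanged in the dual pair: the dual ensemble is indexed by the outcomes of the original observable, and the dual observable by the labels of the original ensemble.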

The equality (43) implies an estimate for the accessible information of the ensemble $\mathcal{E}$

[TABLE]

where the supremum is over all observables $\mathcal{M}$ .

Proposition 4

Let $\mathcal{E}$ be a fixed ensemble and $\mathcal{M}^{\prime}$ be the dual observable, then

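$$A(\mathcal{E})=\sup_{\mathcal{E}^{\prime}:\,\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}}I(\mathcal{E}^{\prime},\mathcal{M}^{\prime}), \tag{45}$$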

where the supremum in the right-hand side is taken over all ensembles $\mathcal{E}^{\prime}$ satisfying the condition $\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}$ .

Proof. We first prove the inequality (30) which was used in the proof of theorem 2. We repeat it here for convenience:

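$$A(\mathcal{E})\leq\sup_{\mathcal{E}^{\prime}:\,\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}}I(\mathcal{E}^{\prime},\mathcal{M}^{\prime}). \tag{46}$$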

For this it is sufficient to show that

[TABLE]

where on the right the supremum is taken over observables $\mathcal{M}$ which satisfy (39) with respect to some measure $\mu$ . Then by using the proposition 3 we obtain

[TABLE]

where in the right-hand side the supremum is taken over ensembles $\mathcal{E}^{\prime}$ that can be written in the form (40) for suitable $\mathcal{M}$, whence (46) will follow.

Proof of the equality (47) is based on two facts. First, we show that any observable $\mathcal{M}$ can be approximated by a sequence of observables $\left\{\mathcal{M}_{n}\right\}$ satisfying (39) for some measures $\mu_{n}$ . Second, we observe that the information quantity $I(\mathcal{E},\mathcal{M})$ is lower semicontinuous in this approximation.

Let $\mathcal{M}=\{M(dy)\}$ be an observable, and let $\left\{P_{n}\right\}$ be a nondecreasing sequence of projections in $\mathcal{H}$ such that $P_{n}\uparrow I$ as $n\rightarrow\infty.$ Define the measure $\mu_{n}(B)=\mathrm{Tr}P_{n}M(B)$ and the sequence of observables

[TABLE]

Then $\mathcal{M}_{n}=\{M_{n}(dy)\}$ satisfies (39) with the measure $\mu_{n}(B)$. Indeed, $0\leq P_{n}M(B)P_{n}\leq P_{n}\mu_{n}(B)$, and the second term in the direct sum (48) is dominated by $\left(I-P_{n}\right)\mu_{n}(B)$. Hence, by an operator version of the Radon-Nikodym theorem, $M_{n}(B)=\int_{B}m_{n}(y)\mu_{n}(dy)$, with $\left\|m_{n}(y)\right\|\leq 1$ (mod $\mu_{n}$).

For arbitrary d.o. $\rho$ and arbitrary Borel $B\subseteq\mathcal{Y}$

[TABLE]

as $n\rightarrow\infty.$

Now let $\mathcal{V}=\left\{B_{k}\right\}$ be a finite decomposition of the space $\mathcal{Y}$ into Borel subsets $B_{k}$. Define the finitely valued “coarse-grained” observable $\mathcal{M}_{\mathcal{V}}=\left\{M(B_{k})\right\}$. A general result of classical information theory (cf. [10]) implies

[TABLE]

where the supremum is taken over all the decompositions $\mathcal{V}$. We will prove that for a fixed $\mathcal{V}$ the functional $I(\mathcal{E},\mathcal{M}_{\mathcal{V}})$ is continuous with respect to the approximation (49); it will then follow that $I(\mathcal{E},\mathcal{M})$ is lower semicontinuous. Denoting $P_{\rho}(B)=\mathrm{Tr}\,\rho\,M(B)$, we have

[TABLE]

When we approximate $\mathcal{M}$ by $\mathcal{M}_{n}$, the first term converges by (49) and by continuity of the Shannon entropy. In the second term the integrand converges pointwise by (49) and it is uniformly bounded because $-e^{-1}\log e\leq P\log P\leq 0$ for $0\leq P\leq 1$. This finishes the proof of (47) and hence of (46).

Let us now prove the stronger result, the equality (45), by showing that any ensemble $\mathcal{E}^{\prime}=\left\{\pi(dy),\rho_{y}\right\}$ with fixed average state $\bar{\rho}_{\mathcal{E}^{\prime}}=\bar{\rho}_{\mathcal{E}}$ can be approximated by ensembles of the form (40). First, if $\bar{\rho}_{\mathcal{E}}$ has finite rank, the problem reduces to a finite-dimensional one, which is easily solved. Therefore assume that the rank of $\bar{\rho}_{\mathcal{E}}$ is infinite (for simplicity we can assume that $\bar{\rho}_{\mathcal{E}}$ is nondegenerate). Let $P_{n}$ be the projection onto the eigenspace of $\bar{\rho}_{\mathcal{E}}$ corresponding to the $n$ largest eigenvalues. Let

[TABLE]

then

[TABLE]

where $\lambda_{n}$ is the smallest eigenvalue for eigenvectors in the range of $P_{n}.$ Then

[TABLE]

hence $M(B)=\int_{B}m_{n}(y)\,\pi(dy)$ is an observable. Moreover, $\mathrm{Tr\,}\bar{\rho}_{\mathcal{E}}\,m_{n}(y)=1$ (mod $\pi$ ). Define ensemble $\mathcal{E}_{n}^{\prime}=\left\{\pi_{n}(dy),\rho_{y}^{n}\right\}$ by taking $\pi_{n}(dy)=\pi(dy)$ and

[TABLE]

then by (50) the average state of $\mathcal{E}_{n}^{\prime}$ is $\bar{\rho}_{\mathcal{E}}.$ Ensemble $\mathcal{E}_{n}^{\prime}$ has the required form (40). Moreover, for any observable $\mathcal{M}^{\prime}=\left\{M^{\prime}(dx)\right\}$ the joint probability

[TABLE]

Indeed,

[TABLE]

pointwise, remaining uniformly bounded by 1. For any finite decompositions $\mathcal{V}=\left\{A_{k}\right\}$ of the space $\mathcal{X}$ and $\mathcal{V}^{\prime}=\left\{B_{k}\right\}$ of the space $\mathcal{Y}$, the “coarse-grained” mutual information is continuous and the mutual information is lower semicontinuous by the argument in the proof above, hence

[TABLE]

It implies finally the equality (45). $\square$

5 Conclusion

We have considered a quantum Gaussian multimode system with global gauge symmetry and obtained an explicit formula for the classical capacity of a Gaussian observable which describes the statistics of a noisy heterodyne measurement in such a system. We have shown that the capacity is attained by a Gaussian ensemble of coherent states. The condition of gauge covariance was relaxed in our recent paper [15], where the formula was generalized to the case where only a certain “threshold condition” is fulfilled. Our second result gives an explicit expression for the accessible information of a gauge-invariant Gaussian ensemble, and shows that it is attained by the multimode generalization of the (ideal) heterodyne measurement, solving a conjecture going back to the seventies. Moreover, the same value is attained by any multimode scaling of the measurement, illustrating the high degeneracy of the maximum characteristic of this kind of “quantum Gaussian optimizer” problem. The natural question of extending this result to quantum Gaussian systems without gauge symmetry, or even without any “threshold condition”, remains open for investigation.

6 Appendix A

Let $\rho,\sigma$ be two d.o., then, generalizing (3), the relation

[TABLE]

defines a p.d. on $\mathbb{C}^{s}.$ Its classical characteristic function expressed via the symplectic Fourier transform is

[TABLE]

where in (51) we used (3) and the Parseval identity (5), and in (52) the transposition $\sigma\rightarrow\sigma^{\top}$ is defined by the relation

[TABLE]

The expression (53) can be rewritten as

[TABLE]

where $\alpha,\,\alpha^{\dagger}$ act in $\mathcal{H}\otimes\mathcal{H}_{0},$ and $\mathcal{H}_{0}\simeq\mathcal{H}$ is the Hilbert space of the ancillary system. The vectors $\alpha,\,\alpha^{\dagger}$ have the components

[TABLE]

where $a_{0j},\,a_{0j}^{\dagger}$ are annihilation-creation operators in $\mathcal{H}_{0}.$ These components are commuting normal operators, so that they have joint probability distribution with the classical characteristic function (55). Assuming that $\rho\in\mathfrak{S}(\Sigma),\,\sigma\in\mathfrak{S}(N),$ let us find the complex covariance matrix of this distribution. It has the components

[TABLE]

Thus the complex covariance matrix is

[TABLE]

Let $\left\{e_{k}\right\}$ be an orthonormal basis in $\mathbb{C}^{s},$ and let $z=\sum_{k=1}^{s}\zeta_{k}e_{k}$ be the decomposition of the vector $z$ in this basis. Then $a^{\dagger}z=\sum_{k=1}^{s}\zeta_{k}b_{k}^{\dagger},$ where $b_{k}^{\dagger}=a^{\dagger}e_{k}$ are the new creation operators, corresponding to the modes associated with the basis $\left\{e_{k}\right\}.$ Let $|n_{k}\rangle$ be the eigenvector of the $k$-th mode number operator $b_{k}^{\dagger}b_{k}$ corresponding to the eigenvalue $n_{k}$ $(=0,1,\dots).$ Then one has the tensor product decomposition of a coherent state vector

[TABLE]

If $\left\{e_{k}\right\}$ is the basis of eigenvectors of the covariance matrix $\Lambda$ of the Gaussian d.o. $\rho_{\Lambda},$ with the corresponding eigenvalues $\lambda_{k},$ then

[TABLE]

It follows that $\rho_{\Lambda}^{-1/2}|z\rangle$ is given by the expression

[TABLE]

The formula (33) is obtained by choosing the basis of eigenvectors of the covariance matrix $\Lambda=\tilde{\Sigma}$ and then using this expression.
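The single-mode ingredients of this computation can be checked numerically in a truncated Fock basis. The sketch below rests on assumptions that the elided formulas above do not confirm: the standard Fock expansion $\langle n|\zeta\rangle=e^{-|\zeta|^{2}/2}\zeta^{n}/\sqrt{n!}$ of a coherent state, and the convention that the eigenvalue $\lambda$ is the mean photon number of the mode, so that the single-mode factor of $\rho_{\Lambda}$ has the geometric spectrum $\lambda^{n}/(\lambda+1)^{n+1}$. Under these assumptions the squared norm of $\rho^{-1/2}|\zeta\rangle$ equals $(\lambda+1)\exp(|\zeta|^{2}/\lambda)$. The function names and the cutoff are hypothetical choices.

```python
import math
import numpy as np

def coherent_amplitudes(zeta: complex, cutoff: int) -> np.ndarray:
    """Fock amplitudes <n|zeta>, n = 0..cutoff-1, computed in log space."""
    r, phi = abs(zeta), float(np.angle(zeta))
    if r == 0.0:
        amps = np.zeros(cutoff, dtype=complex)
        amps[0] = 1.0  # the vacuum vector
        return amps
    n = np.arange(cutoff)
    log_fact = np.array([math.lgamma(k + 1) for k in range(cutoff)])
    log_mag = -0.5 * r**2 + n * math.log(r) - 0.5 * log_fact
    return np.exp(log_mag) * np.exp(1j * n * phi)

def thermal_spectrum(lam: float, cutoff: int) -> np.ndarray:
    """Eigenvalues lam^n / (lam + 1)^(n + 1) of the assumed single-mode state."""
    n = np.arange(cutoff)
    return (lam / (lam + 1.0)) ** n / (lam + 1.0)

def inv_sqrt_on_coherent(zeta: complex, lam: float, cutoff: int) -> np.ndarray:
    """Fock amplitudes of rho^{-1/2}|zeta> (rho is diagonal in the Fock basis)."""
    return coherent_amplitudes(zeta, cutoff) / np.sqrt(thermal_spectrum(lam, cutoff))

zeta, lam, cutoff = 0.7 + 0.4j, 1.5, 80  # illustrative values
v = inv_sqrt_on_coherent(zeta, lam, cutoff)
numeric = np.vdot(v, v).real
closed_form = (lam + 1.0) * math.exp(abs(zeta) ** 2 / lam)
print(numeric, closed_form)  # the two values should agree closely
```

In the multimode case the same check would apply factor by factor in the eigenbasis of $\Lambda$, since both the coherent vector and the state factorize over the modes.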

Acknowledgment. The work was supported by the grant of Russian Scientific Foundation (project No 19-11-00086). The author is grateful to M.E. Shirokov, G.G. Amosov, S.N. Filippov and anonymous referees for useful remarks.

Bibliography

[1] M. J. W. Hall, “Quantum information and correlation bounds,” Phys. Rev. A, vol. 55, pp. 1050-2947, 1997.
[2] V. Giovannetti, A. S. Holevo, A. Mari, “Majorization and additivity for multimode bosonic Gaussian channels,” Theor. Math. Phys., vol. 182, pp. 284–293, 2015.
[3] V. Giovannetti, A. S. Holevo, R. Garcia-Patron, “A Solution of Gaussian Optimizer Conjecture for Quantum Channels,” Commun. Math. Phys., vol. 334, pp. 1553-1571, 2015.
[4] A. S. Holevo, “Information capacity of quantum observable,” Probl. Inform. Transmission, vol. 48, pp. 1-10, 2012.
[5] M. Dall’Arno, G. M. D’Ariano, M. F. Sacchi, “Informational power of quantum measurements,” Phys. Rev. A, vol. 83, 062304, 2011.
[6] A. S. Holevo, “On the Mathematical Theory of Quantum Communication Channels,” Probl. Inform. Transmission, vol. 8, pp. 47-54, 1972.
[7] V. P. Belavkin, R. L. Stratonovich, “Optimization of Quantum Information Processing Maximizing Mutual Information,” Radio Eng. Electron. Phys., vol. 19, p. 1349, 1973 [trans. from Radiotekhnika i Electronika, vol. 19, pp. 1839-1844, 1973].
[8] V. P. Belavkin, A. G. Vantsyan, “On the Sufficient Optimality Condition for Quantum Information Processing,” Radio Eng. Electron. Phys., vol. 19, p. 39, 1974 [trans. from Radiotekhnika i Electronika, vol. 19, pp. 1391–1395, 1974].