Quantum Semiparametric Estimation

Mankei Tsang; Francesco Albarelli; Animesh Datta

arXiv:1906.09871·quant-ph·August 5, 2020


TL;DR

This paper develops a quantum semiparametric estimation theory that provides simple bounds for high-dimensional quantum systems with limited prior assumptions, applicable to practical quantum measurement scenarios.

Contribution

It introduces a new framework for quantum semiparametric estimation that overcomes high dimensionality and limited prior knowledge, linking bounds to Holevo's quantum Cramér-Rao bound.

Findings

1. Provides analytic bounds for high-dimensional quantum estimation problems.

2. Relates bounds to Holevo's quantum Cramér-Rao bound for asymptotic attainability.

3. Applicable to quantum state properties like fidelity, purity, and entropy.




Taxonomy

Topics: Quantum Information and Cryptography · Spectroscopy and Quantum Chemical Studies · Quantum Computing Algorithms and Architecture

Full text

Quantum Semiparametric Estimation

Mankei Tsang

https://blog.nus.edu.sg/mankei/ Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583

Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117551

Francesco Albarelli

Faculty of Physics, University of Warsaw, 02-093 Warszawa, Poland

Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom

Animesh Datta

Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom

Abstract

In the study of quantum limits to parameter estimation, the high dimensionality of the density operator and that of the unknown parameters have long been two of the most difficult challenges. Here we propose a theory of quantum semiparametric estimation that can circumvent both challenges and produce simple analytic bounds for a class of problems in which the dimensions are arbitrarily high, few prior assumptions about the density operator are made, but only a finite number of the unknown parameters are of interest. We also relate our bounds to Holevo’s version of the quantum Cramér-Rao bound, so that they can inherit the asymptotic attainability of the latter in many cases of interest. The theory is especially relevant to the estimation of a parameter that can be expressed as a function of the density operator, such as the expectation value of an observable, the fidelity to a pure state, the purity, or the von Neumann entropy. Potential applications include quantum state characterization for many-body systems, optical imaging, and interferometry, where full tomography of the quantum state is often infeasible and only a few select properties of the system are of interest.

I Introduction

The random nature of quantum mechanics has practical implications for the noise in sensing, imaging, and quantum-information applications Helstrom (1976); Demkowicz-Dobrzański et al. (2015); Paris (2009); *glm2011; *szczykulska16; *pirandola18; *braun18; *pezze18; *albarelli20a; Tsang et al. (2016); Tsang (2019a); Kolobov (1999); *kolobov07; *kolobov_fabre; *jezek; *hradil; *taylor16; *genovese16; *berchera19; *moreau19. To derive their fundamental quantum limits, one standard approach is to compute quantum versions of the Cramér-Rao bound Helstrom (1976); Demkowicz-Dobrzański et al. (2015); Paris (2009); Tsang et al. (2016); Tsang (2019a); Holevo (2011); Hayashi (2017, 2005). In addition to serving as rigorous limits to parameter estimation, the quantum bounds have inspired new sensing and imaging paradigms that go beyond conventional methods Paris (2009); Tsang et al. (2016); Tsang (2019a).

The study of quantum limits has grown into an active research field called quantum metrology in recent years, building on the pioneering work of Helstrom (1976) and Holevo (2011). A major current challenge is the computation of quantum bounds for high-dimensional density operators and high-dimensional parameters, as the brute-force method quickly becomes intractable for increasing dimensions; see Refs. Yuan and Fung (2017); *genoni19; *chabuda20; *fanizza19; Albarelli et al. (2019) for a sample of recent efforts to combat the so-called curse of dimensionality. Most of the existing methods, however, ultimately have to resort to numerics for high dimensions. While numerical methods are no doubt valuable, analytic solutions should be prized higher, as with any study in physics, for their simplicity and the insights they offer. Unfortunately, except for a few cases where one can exploit the special structures of the density-operator family Helstrom (1976); Holevo (2011); Tsang et al. (2011); *guta11; Ng et al. (2016); Zhou and Jiang (2019); Tsang (2019b), analytic results for high-dimensional problems remain rare in quantum metrology.

Here we propose a theory of quantum semiparametric estimation that can turn the problem on its head and deal with density operators with arbitrarily high dimensions and little assumed structure. The theory is especially relevant to the estimation of a parameter that can be expressed as a function of the density operator, such as the expectation value of an observable, the fidelity to a given pure state, the purity, or the von Neumann entropy. The density operator is assumed to come from an enormous family, its dimension can be arbitrarily high and possibly infinite, and the unknown “nuisance” parameters have a similar dimension to that of the density operator. Despite the seemingly bleak situation, our theory can yield surprisingly simple analytic results, precisely because of the absence of structure. Our results are ideally suited to scientific applications, such as quantum state characterization Gühne and Tóth (2009); Paris and Rehacek (2004); *lvovsky09; *horodecki; *filip02; *brun04; *flammia11; *enk12; Horodecki (2003), optical imaging Helstrom (1976); Tsang (2019a); Kolobov (1999); Zhou and Jiang (2019); Tsang (2019b), and interferometry Helstrom (1976); Holevo (2011); Demkowicz-Dobrzański et al. (2015); Paris (2009), where the dimensions can be high, the density operator is difficult to specify fully, and it is prudent to assume little prior information.

The theory set forth generalizes the deep and exquisite theory of semiparametric estimation in classical statistics Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), which has seen wide applications in fields such as biostatistics Tsiatis (2006), econometrics Newey (1990), astrostatistics Feigelson and Babu (2012), and, most recently, optical superresolution Tsang (2019c). By necessity, the classical theory involves infinite-dimensional spaces for random variables and makes extensive use of geometric and Hilbert-space concepts. As will be seen later, the operator Hilbert space introduced by Holevo Holevo (2011, 1977) turns out to be the right arena for the quantum case, and the geometric picture of quantum states Hayashi (2017, 2005); Amari and Nagaoka (2000); Uhlmann (1993); *braunstein; *bengtsson; *sidhu20 can provide illuminating insights.

Our formalism is primarily based on Helstrom’s version of the quantum Cramér-Rao bound Helstrom (1976). While this allows us to adapt the classical methods more easily, it is unable to account for the increased errors due to the incompatibility of quantum observables when multiple parameters are involved Holevo (2011); Demkowicz-Dobrzanski et al. (2020). We address this issue by studying also Holevo’s version of the quantum Cramér-Rao bound Holevo (2011) in the semiparametric setting and proving that the two versions turn out to be close. This result enables our bounds to inherit the asymptotic attainability of Holevo’s bound Kahn and Guţă (2009); Gill and Guţă (2013); *yamagata13; *yang19; Demkowicz-Dobrzanski et al. (2020) in many cases of interest.

II Preview of typical results

Before going into the formalism, we present some typical results of the theory to offer motivation.

Suppose that an experimenter has received $N$ quantum objects, such as atoms, electrons, photons, or optical pulses, each with the same quantum state $\rho$ . The experimenter would like to estimate a parameter $\beta$ as a function of $\rho$ . Without any knowledge or assumption about $\rho$ , what is the best measurement to perform for the estimation of $\beta$ , and what is the fundamental limit to the precision for any measurement?

The quantum semiparametric theory can provide simple answers to the above questions. For the simplest example, let $\beta=\operatorname{tr}\rho Y$ , where $Y$ is a given observable, and assume that the estimator is required to be unbiased. For example, one may wish to estimate

1. the mean position of photons or electrons in optical or electron microscopy,

2. the mean photon number in an optical mode in optical sensing, imaging, and communication Helstrom (1976),

3. the mean energy, momentum, or field of quantum particles in particle-physics, condensed-matter, or quantum-chemistry experiments,

4. a density-matrix element, the fidelity $\bra{\psi}\rho\ket{\psi}$ to a target pure state $\ket{\psi}$ , or an entanglement witness in quantum-information experiments Gühne and Tóth (2009); Paris and Rehacek (2004).

This problem appears in all areas of quantum mechanics Schwartz (2014); *chaikin; *haken; *bravyi19, as most quantum calculations offer predictions in terms of expectation values only, and experiments that aim to estimate the expectation values and verify the predictions with few assumptions about the density operator are in essence semiparametric estimation. The theory here shows that the optimal measurement is simply a von Neumann measurement of the observable $Y$ of each copy of the objects, followed by an average of the outcomes. For any measurement, the mean-square error of the estimation, denoted by the sans-serif $\mathsf{E}$ , has a quantum limit given by

$$\mathsf{E}\geq\frac{1}{N}\operatorname{tr}\rho(Y-\beta)^{2}. \tag{1}$$

Absent any information about $\rho$ , the separate measurements and the sample mean seem to be the most obvious procedure, but it is not at all obvious that it is optimal, given the infinite possibilities allowed by quantum mechanics.
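As a numerical illustration (a minimal sketch, not part of the original analysis), the snippet below assumes that the quantum limit for $\beta=\operatorname{tr}\rho Y$ takes the variance form $\operatorname{tr}\rho(Y-\beta)^{2}/N$ and checks that the sample mean of $N$ von Neumann measurements of $Y$ has exactly this variance. The random state, the random observable, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def random_density_matrix(d, rng):
    """Random full-rank density matrix (illustrative choice only)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_observable(d, rng):
    """Random Hermitian matrix standing in for the observable Y."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def semiparametric_bound(rho, Y, N):
    """Assumed form of the bound: (1/N) tr rho (Y - beta)^2, beta = tr rho Y."""
    beta = np.trace(rho @ Y).real
    dev = Y - beta * np.eye(len(Y))
    return np.trace(rho @ dev @ dev).real / N

def sample_mean_variance(rho, Y, N):
    """Variance of the average of N projective-measurement outcomes of Y."""
    evals, evecs = np.linalg.eigh(Y)
    # Born probabilities <y_j| rho |y_j> for each eigenvector of Y.
    probs = np.einsum("ij,ik,kj->j", evecs.conj(), rho, evecs).real
    mean = probs @ evals
    return (probs @ evals**2 - mean**2) / N

rng = np.random.default_rng(0)
rho = random_density_matrix(4, rng)
Y = random_observable(4, rng)
N = 100
bound = semiparametric_bound(rho, Y, N)
variance = sample_mean_variance(rho, Y, N)
print(bound, variance)
```

The two printed numbers agree because the outcome distribution of a von Neumann measurement of $Y$ has variance $\operatorname{tr}\rho Y^{2}-\beta^{2}$; the nontrivial content of the theory is that no other unbiased measurement can do better.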

While Eq. (1) has been derived before via a more conventional method for a finite-dimensional $\rho$ Watanabe et al. (2010); *watanabe11, our theory can also deal with infinite dimensions as well as more advanced examples in quantum information and quantum thermodynamics. For example, if the parameter of interest is the purity $\beta=\operatorname{tr}\rho^{2}$ , the bound is

$$\mathsf{E}\geq\frac{4}{N}\operatorname{tr}\rho(\rho-\beta)^{2}, \tag{2}$$

and if the parameter is the relative entropy $\beta=\operatorname{tr}\rho(\ln\rho-\ln\sigma)$ with respect to a target state $\sigma$ , the bound is

$$\mathsf{E}\geq\frac{1}{N}\operatorname{tr}\rho(\ln\rho-\ln\sigma-\beta)^{2}. \tag{3}$$

For these two examples, the bounds are asymptotically attainable in principle, at least when $\rho$ is finite-dimensional Kahn and Guţă (2009); Gill and Guţă (2013); *yamagata13; *yang19; Demkowicz-Dobrzanski et al. (2020).

The semiparametric theory is relevant to experiments on many-body quantum systems and quantum simulation Bloch et al. (2008); *georgescu14, because often there is no simple model for $\rho$ , full tomography of $\rho$ is infeasible, and only a few select properties of the system may be of interest. Although a significant literature in quantum information has been devoted to such semiparametric problems Gühne and Tóth (2009); Paris and Rehacek (2004); *lvovsky09; *horodecki; *filip02; *brun04; *flammia11; *enk12; Horodecki (2003), their connections to the classical theory have not yet been recognized. By generalizing the classical theory, this work establishes fundamental limits to the task, indicating the minimum amount of resources needed to achieve a desired precision and also offering a rigorous yardstick for experimental design. This work thus addresses a foundational question by Horodecki Horodecki (2003): “What kind of information (whatever it means) can be extracted from an unknown quantum state at a small measurement cost?” Our work shows that quantum metrology—and quantum semiparametric estimation in particular—offers a viable attack on the question via a statistical notion of efficiency.

An extension of the above scenario is the estimation of $\beta$ given a constraint on $\rho$ . For example, suppose that the quantum state is known to possess a mean energy $\operatorname{tr}\rho H=E$ , where $H$ is the Hamiltonian, or attain a fidelity of $\bra{\phi}\rho\ket{\phi}=F$ with respect to another pure state $\ket{\phi}$ . How may this new information affect the estimation? Write the constraint as $\operatorname{tr}\rho Z=\zeta$ , where $Z$ is an observable and $\zeta$ is a given constant. The quantum bound for the $\beta=\operatorname{tr}\rho Y$ example turns out to be

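$$\mathsf{E}\geq\frac{1}{N}\left\{\operatorname{tr}\rho(Y-\beta)^{2}-\frac{\left[\operatorname{tr}\rho(Y-\beta)\circ(Z-\zeta)\right]^{2}}{\operatorname{tr}\rho(Z-\zeta)^{2}}\right\}, \tag{4}$$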

where $A\circ B=(AB+BA)/2$ denotes the Jordan product. The bound is reduced by the correlation between $Y$ and $Z$ .

Another paradigmatic problem in quantum metrology is displacement estimation Helstrom (1976); Holevo (2011); Demkowicz-Dobrzański et al. (2015); Paris (2009), which can be modeled by

$$\rho=\exp(-iH\beta)\rho_{0}\exp(iH\beta),$$

where $\rho_{0}$ is the initial state, $H$ is a generator, such as the photon-number operator in optical interferometry, and $\beta$ is the displacement parameter to be estimated. Applications range from optical and atomic interferometry to atomic clocks, magnetometry, laser ranging, and localization microscopy Demkowicz-Dobrzański et al. (2015); Paris (2009); Kolobov (1999). If nothing is known about $\rho_{0}$ other than a constraint $\operatorname{tr}\rho_{0}Z=0$ , the quantum bound turns out to be

[TABLE]

where $[Z,H]\equiv ZH-HZ$ . Our theory can in fact give similarly simple results for a class of such semiparametric problems.

It must be stressed that, apart from the underlying Hilbert space and the constraints discussed above, the experimenter is assumed to know nothing about the density operator, and the bounds here are valid regardless of its dimension. The existing method of deriving such quantum limits is to model $\rho$ with many parameters Hayashi (2017, 2005); Kahn and Guţă (2009); Watanabe et al. (2010), compute a quantum version of the Fisher information matrix, and then invert it. This brute-force method is rarely feasible for problems with high or infinite dimensions. A new philosophy is needed.

In the next sections, we present the theory of quantum semiparametric estimation in increasing sophistication. Sections III and IV generalize the quantum Cramér-Rao bound proposed by Helstrom Helstrom (1976) in a geometric picture. While the picture is not new Hayashi (2005); Amari and Nagaoka (2000), it has so far remained an intellectual curiosity only. Sections III and IV show that it can in fact give simple solutions, such as Eqs. (1)–(3), to a class of semiparametric problems with arbitrary dimensions. Section III establishes the general formalism and also proves results that are valid for finite dimensions, while Sec. IV deals with the infinite-dimensional case via an elegant concept called parametric submodels. In the classical theory, the concept was first adumbrated by Charles Stein Stein (1956) and developed by Levit and many others Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006). Section V further develops the formalism to account for constraints on the density-operator family, in order to produce results such as Eq. (4). An example of entropy estimation in quantum thermodynamics is also discussed there. Section VI discusses some practical problems in optics and summarizes existing results on incoherent optical imaging Tsang (2019a) in the language of quantum semiparametrics, in order to provide a more concrete context for the formalism. Section VII considers semiparametric estimation in the presence of explicit nuisance parameters and studies in particular the problem of displacement estimation with a poorly characterized initial state, in order to produce results such as Eq. (7). To complete the formalism, Section VIII considers a vectoral parameter of interest and Holevo’s version of the quantum Cramér-Rao bound Holevo (2011). There we prove that the Helstrom and Holevo bounds are equal if the parameter of interest is a scalar, and they remain within a factor of two of each other in the vectoral case. 
The latter fact generalizes a recent result in the parametric setting Carollo et al. (2019); *carollo20. Thus the Helstrom version can inherit the asymptotic attainability of the latter Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020) to within a factor of two.

III Geometric picture of quantum estimation theory

This section is organized as follows. Section III.1 introduces the Helstrom bound in the conventional formulation. Section III.2 introduces some important Hilbert-space concepts, including the tangent space and the influence operators. Section III.3 generalizes the Helstrom bound in terms of a projection of an influence operator into the tangent space. Section III.4 shows how an influence operator can be derived for a given parameter of interest, while Sec. III.5 proves that the tangent space is simple if the density operator is assumed to be finite-dimensional but otherwise arbitrary. The projection is then straightforward, and Sec. III.5 demonstrates the derivation of Eqs. (1)–(3) as examples.

III.1 Helstrom bound

Let

$$\mathbf{F}\equiv\{\rho(\theta):\theta\in\Theta\subseteq\mathbb{R}^{p}\} \tag{8}$$

be a family of density operators parametrized by $\theta=(\theta_{1},\dots,\theta_{p})^{\top}$ , where the superscript $\top$ denotes the matrix transpose and $p$ denotes the dimension of the parameter space $\Theta$ . The operators are assumed to operate on a common Hilbert space $\mathcal{H}$ , with an orthonormal basis

$$\{\ket{j}:j\in\mathcal{Q},\ \braket{j|k}=\delta_{jk}\} \tag{9}$$

that does not depend on $\theta$ . Let

$$d\equiv\dim\mathcal{H} \tag{10}$$

be the dimension of $\mathcal{H}$ , which may be infinite. The family is assumed to be smooth enough so that any $\partial_{j}\equiv\partial/\partial\theta_{j}$ can be interchanged with the operator trace $\operatorname{tr}$ in any operation on $\rho(\theta)$ . Define $\partial\equiv(\partial_{1},\dots,\partial_{p})^{\top}$ , and define a vector of operators $S\equiv(S_{1},\dots,S_{p})^{\top}$ as solutions to

$$\partial\rho=\rho\circ S, \tag{11}$$

which is shorthand for the system of equations

$$\partial_{j}\rho(\theta)\big|_{\theta=\phi}=\rho(\phi)\circ S_{j}(\phi),\quad j=1,\dots,p. \tag{12}$$

$\phi$ is the true parameter value, and all functions of $\theta$ in this section are assumed to be evaluated implicitly at the same $\theta=\phi$ . Each $S_{j}$ is called a symmetric logarithmic derivative in the quantum metrology literature, but here we call it a score, in accordance with the statistics terminology Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006). All vectors are assumed to be column vectors in this paper.
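For a finite-dimensional $\rho$, the defining equation of a score can be solved entry by entry in the eigenbasis of $\rho$. The following is a minimal sketch, assuming the Jordan-product form $\partial_{j}\rho=(\rho S_{j}+S_{j}\rho)/2$; the function name `score`, the tolerance `tol`, and the qubit example are hypothetical choices for illustration.

```python
import numpy as np

def score(rho, drho, tol=1e-12):
    """Solve drho = (rho S + S rho)/2 for a self-adjoint S.

    In the eigenbasis of rho the equation reads
    drho_jk = (lam_j + lam_k) S_jk / 2.  Entries with lam_j + lam_k = 0
    are set to zero, which picks one representative when rho is not
    full rank.
    """
    lam, V = np.linalg.eigh(rho)
    d = V.conj().T @ drho @ V
    denom = lam[:, None] + lam[None, :]
    S = np.zeros(d.shape, dtype=complex)
    mask = denom > tol
    S[mask] = 2 * d[mask] / denom[mask]
    return V @ S @ V.conj().T

# Qubit example: rho = (I + 0.2 sx + 0.3 sz)/2, differentiated along sx.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
rho = (np.eye(2) + 0.2 * sx + 0.3 * sz) / 2
drho = sx / 2
S = score(rho, drho)
residual = (rho @ S + S @ rho) / 2 - drho
print(np.abs(residual).max())
```

The residual should vanish to machine precision, and $\operatorname{tr}\rho S=\operatorname{tr}\partial\rho=0$, so the computed score has zero mean.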

To model a measurement, define a positive operator-valued measure (POVM) $E$ on a measurable space $(\mathcal{X},\Sigma_{\mathcal{X}})$ , where $\Sigma_{\mathcal{X}}$ is the sigma algebra on the set $\mathcal{X}$ . Let the parameter of interest be a scalar $\beta(\theta)\in\mathbb{R}$ ; generalization for a vectoral $\beta$ will be done in Sec. VIII. Assume an estimator $\check{\beta}:\mathcal{X}\to\mathbb{R}$ that satisfies

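$$\int\check{\beta}(\lambda)\operatorname{tr}dE(\lambda)\rho=\beta,\qquad\int\check{\beta}(\lambda)\operatorname{tr}dE(\lambda)\partial\rho=\partial\beta. \tag{13}$$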

$(E,\check{\beta})$ is called a locally unbiased measurement, as we only require Eqs. (13) to hold at the true $\theta=\phi$ . Only local unbiasedness conditions are needed in this paper, and for brevity we will no longer explicitly describe them as local. Define the mean-square estimation error as

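$$\mathsf{E}\equiv\int\left[\check{\beta}(\lambda)-\beta\right]^{2}\operatorname{tr}dE(\lambda)\rho. \tag{14}$$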

If $p<\infty$ , a quantum version of the Cramér-Rao bound due to Helstrom Helstrom (1976), denoted by the sans-serif $\mathsf{H}$ , applies to any unbiased measurement and can be expressed as

$$\mathsf{E}\geq\mathsf{H}\equiv(\partial\beta)^{\top}K^{-1}\partial\beta, \tag{15}$$

where the Helstrom information matrix $K$ is defined as

$$K_{jk}\equiv\operatorname{tr}\rho(S_{j}\circ S_{k}). \tag{16}$$

The Helstrom bound sets a lower bound on the estimation error for any quantum measurement and any unbiased estimator Helstrom (1976); Holevo (2011); Hayashi (2017, 2005). The estimation of $\beta$ with an infinite-dimensional $\theta$ ( $p=\infty$ ) is called semiparametric estimation in statistics Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), although the methodology applies to arbitrary dimensions. If $\theta$ is partitioned into $(\beta,\eta_{1},\eta_{2},\dots)^{\top}$ , then the components of $\eta=(\eta_{1},\eta_{2},\dots)^{\top}$ are called nuisance parameters Tsiatis (2006); Suzuki et al. (2019).

III.2 Hilbert spaces for operators

We now follow Holevo Holevo (2011, 1977) and introduce operator Hilbert spaces in order to generalize the Helstrom bound for semiparametric estimation. The formalism may seem daunting at first sight, but the payoff is substantial, as it simplifies proofs, treats the infinite-dimensional case rigorously, and also enables one to avoid the explicit computation of $S$ and $K^{-1}$ for a large class of problems. In the following, we assume familiarity with the basic theory of Hilbert spaces and the mathematical treatment of quantum mechanics; see, for example, Refs. Holevo (2011); Debnath and Mikusiński (2005); Reed and Simon (1980).

All operators considered in this paper are self-adjoint. Consider $\rho$ in the diagonal form $\rho=\sum_{j}\lambda_{j}\ket{e_{j}}\bra{e_{j}}$ with $\lambda_{j}>0$ . The support of $\rho$ is $\operatorname{supp}(\rho)=\operatorname{\overline{span}}\{\ket{e_{j}}\}\subseteq\mathcal{H}$ , where $\operatorname{\overline{span}}$ denotes the closed linear span. $\rho$ is called full rank if $\operatorname{supp}(\rho)=\mathcal{H}$ . Define the weighted inner product between two operators $h$ and $g$ as

$$\langle h,g\rangle\equiv\operatorname{tr}\rho(h\circ g), \tag{17}$$

and a norm as

$$\lVert h\rVert\equiv\sqrt{\langle h,h\rangle}, \tag{18}$$

not to be confused with the operator norm $\lVert h\rVert_{\rm op}=\sup_{\ket{\psi}\in\mathcal{H}}\sqrt{\bra{\psi}h^{2}\ket{\psi}}\geq\lVert h\rVert$ . An operator is called bounded if $\lVert h\rVert_{\rm op}<\infty$ and square summable with respect to $\rho$ if $\lVert h\rVert<\infty$ , although all operators are bounded by definition if $d<\infty$ . For two vectors of operators $A$ and $B$ , it is convenient to use $\langle A,B\rangle$ to denote a matrix with entries

$$\langle A,B\rangle_{jk}=\langle A_{j},B_{k}\rangle, \tag{19}$$

for example, the Helstrom information matrix $K=\langle S,S\rangle$ is a Gram matrix.
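The weighted inner product, the norm, and the Gram-matrix notation translate directly into a few lines of linear algebra when $d<\infty$. The sketch below assumes the symmetrized form $\langle h,g\rangle=\operatorname{tr}\rho(h\circ g)$; the helper names and the random test operators are hypothetical.

```python
import numpy as np

def jordan(a, b):
    """Jordan product a o b = (ab + ba)/2."""
    return (a @ b + b @ a) / 2

def inner(rho, h, g):
    """Weighted inner product <h, g> = tr rho (h o g)."""
    return np.trace(rho @ jordan(h, g)).real

def norm(rho, h):
    """Weighted norm ||h|| = sqrt(<h, h>)."""
    return np.sqrt(inner(rho, h, h))

def op_norm(h):
    """Operator norm: largest singular value of h."""
    return np.linalg.norm(h, 2)

def gram(rho, A, B):
    """Matrix with entries <A_j, B_k> for two lists of operators."""
    return np.array([[inner(rho, a, b) for b in B] for a in A])

def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real
ops = [random_hermitian(3, rng) for _ in range(3)]
G = gram(rho, ops, ops)
print(G)
```

The Gram matrix of any list of operators is symmetric and positive semidefinite, and each weighted norm is bounded above by the corresponding operator norm, in line with the inequality quoted above.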

Define the real Hilbert space for square-summable operators with respect to the true $\rho$ as Holevo (2011, 1977)

$$\mathcal{Y}\equiv\{h:\lVert h\rVert<\infty\}. \tag{20}$$

To be precise, each Hilbert-space element is an equivalence class of operators with zero distance between them, viz., $\{\hat{h}_{j}:\lVert\hat{h}_{j}-\hat{h}_{k}\rVert=0\ \forall j,k\}$ . The distinction between an element and its operators is important only if $\rho$ is not full rank; we put a hat on an operator if the distinction is called for. Two important Hilbert-space elements are the identity element $I$ and the zero element $0$ ; sometimes we will substitute $I=1$ for brevity.

Define a subspace of zero-mean operators as

$$\mathcal{Z}\equiv\{h\in\mathcal{Y}:\langle h,I\rangle=0\}, \tag{21}$$

and the orthocomplement of $\mathcal{Z}$ in $\mathcal{Y}$ as

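$$\mathcal{Z}^{\perp}\equiv\{h\in\mathcal{Y}:\langle h,g\rangle=0\ \forall g\in\mathcal{Z}\}=\operatorname{span}\{I\}. \tag{22}$$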

In particular, the projection of any $h\in\mathcal{Y}$ into $\mathcal{Z}^{\perp}$ is simply $\Pi(h|\mathcal{Z}^{\perp})=\langle h,I\rangle$ , where $\Pi$ denotes the projection map, and

$$\Pi(h|\mathcal{Z})=h-\Pi(h|\mathcal{Z}^{\perp})=h-\langle h,I\rangle. \tag{23}$$
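The projection into the zero-mean subspace amounts to subtracting the mean $\langle h,I\rangle=\operatorname{tr}\rho h$ times the identity. A minimal numerical sketch follows, assuming the weighted inner product $\langle h,g\rangle=\operatorname{tr}\rho(h\circ g)$; the function names and the example matrices are hypothetical.

```python
import numpy as np

def inner(rho, h, g):
    """Weighted inner product <h, g> = tr rho (hg + gh)/2."""
    return np.trace(rho @ (h @ g + g @ h)).real / 2

def project_zero_mean(rho, h):
    """Projection of h into the zero-mean subspace: h - <h, I> I."""
    eye = np.eye(len(h))
    return h - inner(rho, h, eye) * eye

rho = np.diag([0.6, 0.4])
h = np.array([[1.0, 0.5], [0.5, -2.0]])
eye = np.eye(2)

mean = inner(rho, h, eye)            # <h, I> = tr rho h
h_z = project_zero_mean(rho, h)

# Pythagorean check: ||h||^2 = ||Pi(h|Z)||^2 + <h, I>^2, since <I, I> = 1.
lhs = inner(rho, h, h)
rhs = inner(rho, h_z, h_z) + mean**2
print(mean, lhs, rhs)
```

The same subtract-and-check pattern applies to the projection into the tangent space, with the span of $I$ replaced by whichever orthocomplement is easier to characterize.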

The most important Hilbert space in estimation theory is the tangent space spanned by the set of scores $\{S\}\equiv\{S_{1},\dots,S_{p}\}$ Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), generalized here as

$$\mathcal{T}\equiv\operatorname{\overline{span}}\{S\}\subseteq\mathcal{Z}. \tag{24}$$

$\{S\}$ is also known as the tangent set. The condition $\mathcal{T}\subseteq\mathcal{Z}$ requires the assumption $K_{jj}=\langle S_{j},S_{j}\rangle<\infty$ for all $j$ ; the zero-mean requirement is satisfied because $\langle S,I\rangle=\operatorname{tr}\partial\rho=\partial\operatorname{tr}\rho=0$ . A useful relation for any bounded operator $h$ is

$$\langle S_{j},h\rangle=\operatorname{tr}(\partial_{j}\rho)h \tag{25}$$

via Ref. (Holevo, 2011, Eq. (2.8.88)). Denote also the orthocomplement of $\mathcal{T}$ in $\mathcal{Z}$ as

$$\mathcal{T}^{\perp}\equiv\{h\in\mathcal{Z}:\langle S,h\rangle=0\}, \tag{26}$$

which is useful if a projection of $h\in\mathcal{Z}$ into $\mathcal{T}$ is desired and $\Pi(h|\mathcal{T}^{\perp})$ is easier to compute, since

$$\Pi(h|\mathcal{T})=h-\Pi(h|\mathcal{T}^{\perp}). \tag{27}$$

Another important concept in the classical theory is the influence functions Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), which we generalize by defining the set of influence operators as

$$\mathcal{D}\equiv\{\delta\in\mathcal{Z}:\langle S,\delta\rangle=\partial\beta\}. \tag{28}$$

These operators play a major role in Holevo’s formulation of quantum Cramér-Rao bounds Holevo (2011); Ragy et al. (2016), although their connection to the classical concept did not seem to be appreciated before.

III.3 Generalized Helstrom bound

Let the error operator with respect to an unbiased measurement be

$$\delta=\int\check{\beta}(\lambda)dE(\lambda)-\beta. \tag{29}$$

It can be shown (Holevo, 2011, Sec. 6.2) that $\delta\in\mathcal{D}$ (as long as $\lVert\delta\rVert<\infty$ ), and also that $\lVert\delta\rVert^{2}$ bounds the estimation error as

$$\mathsf{E}\geq\lVert\delta\rVert^{2}. \tag{30}$$

A generalized Helstrom bound (GHB) for any unbiased measurement, denoted by $\tilde{\mathsf{H}}$ , can then be expressed as

$$\mathsf{E}\geq\tilde{\mathsf{H}}\equiv\min_{\delta\in\mathcal{D}}\lVert\delta\rVert^{2}. \tag{31}$$

We call an unbiased measurement efficient if it has an error that achieves the GHB, following the common statistics terminology Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006).

Proofs that Eq. (31) is equal to Eq. (15) if $p<\infty$ and $K^{-1}$ exists can be found in Refs. Nagaoka (1989); Amari and Nagaoka (2000); Ragy et al. (2016). The following theorem gives a more general expression that is the cornerstone of quantum semiparametric estimation.

Theorem 1.

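$$\tilde{\mathsf{H}}=\min_{\delta\in\mathcal{D}}\lVert\delta\rVert^{2}=\lVert\delta_{\rm eff}\rVert^{2}, \tag{32}$$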

where $\delta_{\rm eff}$ , henceforth called the efficient influence, is the unique element in the influence-operator set $\mathcal{D}$ given by

$$\delta_{\rm eff}=\Pi(\delta|\mathcal{T}), \tag{33}$$

and $\Pi(\delta|\mathcal{T})$ denotes the projection of any influence operator $\delta\in\mathcal{D}$ into the tangent space $\mathcal{T}$ .

Proof.

The proof is similar to the classical one Bickel et al. (1993); Tsiatis (2006). First note that, since $\mathcal{D}\subseteq\mathcal{Z}=\mathcal{T}\oplus\mathcal{T}^{\perp}$ , any $\delta\in\mathcal{D}$ can always be decomposed into

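$$\delta=\delta_{\rm eff}+h,\quad\delta_{\rm eff}=\Pi(\delta|\mathcal{T})\in\mathcal{T},\quad h=\Pi(\delta|\mathcal{T}^{\perp})\in\mathcal{T}^{\perp}. \tag{34}$$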

This implies $\langle S,\delta_{\rm eff}\rangle=\langle S,\delta-h\rangle=\langle S,\delta\rangle=\partial\beta$ , and therefore $\delta_{\rm eff}\in\mathcal{D}$ . Now the Pythagorean theorem gives

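$$\lVert\delta\rVert^{2}=\lVert\delta_{\rm eff}\rVert^{2}+\lVert h\rVert^{2}\geq\lVert\delta_{\rm eff}\rVert^{2}, \tag{35}$$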

which results in Eq. (32).

To prove the uniqueness of $\delta_{\rm eff}$ in $\mathcal{D}$ , suppose that there exists another $\delta^{\prime}\in\mathcal{D}$ that gives $\lVert\delta^{\prime}\rVert=\lVert\delta_{\rm eff}\rVert$ . Define $g=\delta^{\prime}-\delta_{\rm eff}$ . Since $\langle S,g\rangle=\langle S,\delta^{\prime}\rangle-\langle S,\delta_{\rm eff}\rangle=\partial\beta-\partial\beta=0$ , $g\in\mathcal{T}^{\perp}$ , and the Pythagorean theorem yields $\lVert\delta^{\prime}\rVert^{2}=\lVert\delta_{\rm eff}\rVert^{2}+\lVert g\rVert^{2}$ . This implies that $\lVert g\rVert=0$ and $g=0$ , contradicting the assumption that $\delta^{\prime}\neq\delta_{\rm eff}$ . Hence $\delta_{\rm eff}$ must be unique, and $\Pi(\delta|\mathcal{T})$ for any $\delta\in\mathcal{D}$ results in the same $\delta_{\rm eff}$ . ∎

Figure 1 illustrates all the Hilbert-space concepts involved in Theorem 1.

Before we apply the theorem to examples, we list a couple of important corollaries. The first corollary reproduces the original Helstrom bound given by Eq. (15) and is expected from earlier derivations; see, for example, Ref. (Hayashi, 2005, Eq. (20) in Chap. 18) and Ref. (Amari and Nagaoka, 2000, Eq. (7.93)). Here we simply clarify that it is a special case of Theorem 1.

Corollary 1.

If $p<\infty$ and $K^{-1}=\langle S,S\rangle^{-1}$ exists, the GHB is equal to the original Helstrom bound given by Eq. (15).

Proof.

Delegated to Appendix A. ∎

Note that, unlike Eq. (15), which assumes that $S$ consists of linearly independent operators and $K$ is invertible, Theorem 1 works with no regard for any linear dependence in $S$ . This generalization is in fact indispensable to the semiparametric theory, especially when the concept of parametric submodels is introduced in Sec. IV.
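The equivalence between the projection picture and the matrix-inverse form of the Helstrom bound can be checked numerically on a small example. The sketch below uses a qubit in the Bloch parametrization $\rho=(I+\theta\cdot\sigma)/2$ and a hypothetical observable $Y$; both are assumptions chosen for illustration, as are all function names. It also repeats the projection with a linearly dependent set of scores, using a least-squares solve in place of $K^{-1}$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = [SX, SY, SZ]
I2 = np.eye(2, dtype=complex)

def inner(rho, h, g):
    """Weighted inner product <h, g> = tr rho (hg + gh)/2."""
    return np.trace(rho @ (h @ g + g @ h)).real / 2

def gram(rho, A, B):
    return np.array([[inner(rho, a, b) for b in B] for a in A])

def score(rho, drho):
    """Solve drho = (rho S + S rho)/2; rho is assumed full rank here."""
    lam, V = np.linalg.eigh(rho)
    d = V.conj().T @ drho @ V
    return V @ (2 * d / (lam[:, None] + lam[None, :])) @ V.conj().T

def project(rho, delta, scores):
    """Project delta onto span(scores); lstsq tolerates linear dependence."""
    K = gram(rho, scores, scores)
    b = np.array([inner(rho, s, delta) for s in scores])
    coeffs = np.linalg.lstsq(K, b, rcond=1e-10)[0]
    return sum(c * s for c, s in zip(coeffs, scores))

theta = np.array([0.2, -0.1, 0.3])
rho = (I2 + sum(t * s for t, s in zip(theta, PAULI))) / 2
Y = SX + 0.5 * SZ                      # hypothetical observable
beta = np.trace(rho @ Y).real
delta = Y - beta * I2                  # an influence operator for tr(rho Y)

scores = [score(rho, s / 2) for s in PAULI]
dbeta = np.array([np.trace((s / 2) @ Y).real for s in PAULI])

# Matrix-inverse form of the Helstrom bound.
K = gram(rho, scores, scores)
helstrom = dbeta @ np.linalg.solve(K, dbeta)

# Projection form: squared norm of the efficient influence.
delta_eff = project(rho, delta, scores)
ghb = inner(rho, delta_eff, delta_eff)

# Same projection with a redundant (linearly dependent) score added.
redundant = scores + [scores[0] + scores[1]]
delta_eff_red = project(rho, delta, redundant)
ghb_redundant = inner(rho, delta_eff_red, delta_eff_red)
print(helstrom, ghb, ghb_redundant)
```

All three numbers coincide. The redundant set makes the Gram matrix singular, so the matrix-inverse formula cannot be applied to it directly, while the projection is unaffected because it depends only on the span of the scores.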

The second corollary, which gives a scaling of the bound with the number of object copies and is easy to prove via $K^{-1}$ , requires more effort to prove if $K^{-1}$ is to be avoided.

Corollary 2.

For a family of density operators that model $N$ independent and identical quantum objects in the form of

[TABLE]

where the tensor power is defined as the tensor product

[TABLE]

the efficient influence and the GHB are given by

[TABLE]

where $U$ is a map defined as

[TABLE]

Proof.

Delegated to Appendix B. ∎
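The $1/N$ scaling can be illustrated numerically for a few copies of a qubit. The sketch below is an assumption-laden illustration rather than a reproduction of the corollary: it uses an unnormalized sum-over-copies map, `sum_over_copies`, which may differ from the map $U$ of the corollary by a normalization factor, and it takes $\delta=Y-\beta$ with a hypothetical observable $Y$ as the single-copy influence operator.

```python
import numpy as np
from functools import reduce

def kron_all(ops):
    """Tensor product of a list of matrices."""
    return reduce(np.kron, ops)

def sum_over_copies(h, N):
    """sum_n I x ... x h (n-th slot) x ... x I, without normalization."""
    eye = np.eye(len(h), dtype=complex)
    return sum(kron_all([h if m == n else eye for m in range(N)])
               for n in range(N))

def inner(rho, h, g):
    """Weighted inner product <h, g> = tr rho (hg + gh)/2."""
    return np.trace(rho @ (h @ g + g @ h)).real / 2

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

rho = (I2 + 0.2 * SX + 0.3 * SZ) / 2
Y = SX                                   # hypothetical observable
beta = np.trace(rho @ Y).real
delta = Y - beta * I2                    # single-copy influence operator
S = SZ - np.trace(rho @ SZ).real * I2    # an arbitrary zero-mean operator

N = 3
rho_N = kron_all([rho] * N)
delta_N = sum_over_copies(delta, N) / N  # candidate N-copy influence operator
S_N = sum_over_copies(S, N)

single = inner(rho, delta, delta)
multi = inner(rho_N, delta_N, delta_N)
print(single, multi, single / N)
```

The squared norm of the $N$-copy operator is the single-copy value divided by $N$, and its inner product with the summed zero-mean operator `S_N` equals the single-copy inner product $\langle S,\delta\rangle$, which is the property needed for it to remain an influence operator.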

III.4 Influence operator via a functional gradient

Theorem 1 is useful if an influence operator $\delta\in\mathcal{D}$ can be found and $\Pi(\delta|\mathcal{T})$ is tractable. One way of deriving an influence operator is to assume that the parameter of interest is a functional $\beta[\rho]$ and consider a derivative of $\beta[\rho]$ in the “direction” of an operator $h$ given by

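$$D_{h}\beta\equiv\lim_{\epsilon\to 0}\frac{\beta[\rho+\epsilon\rho\circ h]-\beta[\rho]}{\epsilon}. \tag{40}$$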

Assume that the directional derivative can be expressed as

$$D_{h}\beta=\langle\tilde{\beta},h\rangle \tag{41}$$

in terms of a $\tilde{\beta}\in\mathcal{Y}$ , hereafter called a gradient of $\beta[\rho]$ . Any ordinary partial derivative of $\beta$ becomes

$$\partial_{j}\beta=D_{S_{j}}\beta=\langle\tilde{\beta},S_{j}\rangle. \tag{42}$$

Projecting the gradient into $\mathcal{Z}$ then gives an influence operator, viz.,

$$\delta=\Pi(\tilde{\beta}|\mathcal{Z})=\tilde{\beta}-\langle\tilde{\beta},I\rangle, \tag{43}$$

as it is straightforward to check that $\langle\delta,I\rangle=0$ and $\langle S,\delta\rangle=\partial\beta$ . The top flowchart in Fig. 2 illustrates the steps to obtain $\delta$ from $\beta[\rho]$ . $\tilde{\beta}$ , $\delta$ , and $\delta_{\rm eff}$ are all gradients that satisfy Eq. (41); the difference lies in the set of directions to which each is restricted. $\delta$ , for instance, is restricted to $\mathcal{Z}$ and orthogonal to $\mathcal{Z}^{\perp}$ , while $\delta_{\rm eff}$ is restricted to $\mathcal{T}$ and orthogonal to $\mathcal{T}^{\perp}$ . (Footnote 1: More precisely, $\tilde{\beta}$ is the unique Riesz-Fréchet representation Reed and Simon (1980) of $D_{h}\beta$ as a continuous linear functional of $h\in\mathcal{Y}$ , $\delta$ is that for $h\in\mathcal{Z}\subset\mathcal{Y}$ , and $\delta_{\rm eff}$ is that for $h\in\mathcal{T}\subseteq\mathcal{Z}\subset\mathcal{Y}$ Reed and Simon (1980); Bickel et al. (1993). The existence of each relies on $D_{h}\beta$ being continuous with respect to $h$ in each domain, so the existence of $\tilde{\beta}$ implies that of $\delta$ and $\delta_{\rm eff}$ .)

Now consider some examples. The first is $\beta=\operatorname{tr}\rho Y$ for a given (i.e., $\theta$ -independent) observable $Y$ , which leads to

[TABLE]

The second example is the purity $\beta=\operatorname{tr}\rho^{2}$ , which leads to

[TABLE]

The final example is the relative entropy $\beta=\operatorname{tr}\rho(\ln\rho-\ln\sigma)$ Hayashi (2017); Holevo (2012), where $\ln\rho=\sum_{j}(\ln\lambda_{j})\ket{e_{j}}\bra{e_{j}}$ and $\sigma$ is a given density operator with $\operatorname{supp}(\sigma)\supseteq\operatorname{supp}(\rho)$. The differentiability of $\beta$ is not a trivial question when $d=\infty$ Holevo (2012), but for $d<\infty$ the differentiation can be carried out to give

[TABLE]

where $D_{h}\beta$ uses the fact that $\operatorname{tr}\rho[\ln(\rho+\epsilon\rho\circ h)-\ln\rho]$ is second order in $\epsilon$ for any $h\in\mathcal{Z}$ (Hayashi, 2017, Theorem 6.3). The von Neumann entropy is a simple variation of this example.
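The gradient construction can be checked numerically in finite dimensions. The sketch below is only an illustration under stated assumptions: it takes the inner product to be the symmetrized form $\langle a,b\rangle=\operatorname{tr}\rho(a\circ b)$ with the Jordan product $a\circ b=(ab+ba)/2$, takes the directional derivative along $\rho\to\rho+\epsilon\rho\circ h$ (the direction that appears in the entropy example above), and uses the purity influence operator $\delta=2(\rho-\operatorname{tr}\rho^{2})$ that follows from those assumptions. All function names are hypothetical.

```python
import numpy as np

def jordan(a, b):
    """Jordan product (ab + ba)/2."""
    return (a @ b + b @ a) / 2

def inner(rho, a, b):
    """Assumed inner product <a, b> = tr rho (a o b); real for Hermitian a, b."""
    return np.trace(rho @ jordan(a, b)).real

def random_density(d, rng):
    """Random full-rank density matrix (illustrative choice)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_direction(rho, rng):
    """Random Hermitian h with tr(rho h) = 0, i.e. a zero-mean direction."""
    d = rho.shape[0]
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = (a + a.conj().T) / 2
    return h - np.trace(rho @ h).real * np.eye(d)

def purity(rho):
    return np.trace(rho @ rho).real

def purity_influence(rho):
    """Gradient 2*rho with its mean removed: delta = 2 (rho - tr rho^2)."""
    return 2 * (rho - purity(rho) * np.eye(rho.shape[0]))

def directional_derivative(beta, rho, h, eps=1e-6):
    """Central difference of beta along rho -> rho + eps * (rho o h)."""
    step = jordan(rho, h)
    return (beta(rho + eps * step) - beta(rho - eps * step)) / (2 * eps)

rng = np.random.default_rng(0)
rho = random_density(4, rng)
h = random_direction(rho, rng)
delta = purity_influence(rho)

numeric = directional_derivative(purity, rho, h)   # D_h beta by finite difference
analytic = inner(rho, delta, h)                    # <delta, h>
zero_mean = np.trace(rho @ delta).real             # <delta, I>, should vanish
```

The same check can be repeated for $\beta=\operatorname{tr}\rho Y$ by swapping in the corresponding functional and $\delta=Y-\beta$.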

III.5 Projection into the tangent space

The next step is $\Pi(\delta|\mathcal{T})$ . If the family of density operators is large enough, $\mathcal{T}$ can fill the entire $\mathcal{Z}$ and the projection becomes trivial. We call a family full-dimensional if its tangent space at each $\rho$ satisfies

[TABLE]

For a specific example, consider the orthonormal basis of $\mathcal{H}$ given by Eq. (9) and the most general parametrization of $\rho$ for $d<\infty$ given by Kahn and Guţă (2009)

[TABLE]

where

[TABLE]

and a special entry $\theta_{a0}$ is removed from the parameters and set as $\theta_{a0}=1-\sum_{j\neq 0}\theta_{aj}$ , such that $\operatorname{tr}\rho(\theta)=\sum_{j}\theta_{aj}=1$ and

[TABLE]

$\partial\rho$ is then given by

[TABLE]

The next theorem is a key step in deriving simple analytic results.

Theorem 2.

The $\mathbf{F}_{0}$ family is full-dimensional.

Proof.

Consider the solution to $\langle S,h\rangle=0$ for an $h\in\mathcal{Z}$ . All operators are bounded if $d<\infty$ . We can then use Eqs. (25) and (53) to obtain

[TABLE]

where $\hat{h}$ is any operator in the equivalence class of $h$ . Thus all the diagonal entries of $\hat{h}$ are equal to $\bra{0}\hat{h}\ket{0}$ , and all the off-diagonal entries are zero. In other words, $\hat{h}=\bra{0}h\ket{0}\hat{I}$ , where $\hat{I}$ is the identity operator. But $h\in\mathcal{Z}$ also means that $\operatorname{tr}\rho\hat{h}=\bra{0}\hat{h}\ket{0}=0$ , resulting in $\hat{h}=0$ as the only solution. Hence $\mathcal{T}^{\perp}=\{0\}$ contains only the zero element, and $\mathcal{T}=\mathcal{Z}$ .

∎

$\mathbf{F}_{0}$ implies that the experimenter knows nothing about the density operator, apart from the Hilbert space $\mathcal{H}$ on which it operates. Despite the high dimension of the family, Theorems 1 and 2 turn the problem into a trivial exercise once an influence operator has been found, since a $\delta\in\mathcal{D}\subseteq\mathcal{Z}$ is already in $\mathcal{Z}=\mathcal{T}$ and hence efficient. Corollary 2 can then be used to extend the result for $N$ copies. For $\beta=\operatorname{tr}\rho Y$ , Eq. (44) leads to

[TABLE]

This implies that a von Neumann measurement of $Y$ of each copy and taking the sample mean of the outcomes are already efficient; no other measurement can do better in terms of unbiased estimation. For $\beta=\operatorname{tr}\rho^{2}$ , Eq. (45) leads to

[TABLE]

and for $\beta=\operatorname{tr}\rho(\ln\rho-\ln\sigma)$ , Eq. (46) leads to

[TABLE]

Intriguingly, this expression coincides with the information variance that has found uses in other contexts of quantum information theory, such as quantum hypothesis testing Tomamichel and Hayashi (2013); *li14a; *tomamichel16.
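For a concrete feel of these bounds, the following sketch evaluates $\lVert\delta\rVert^{2}=\operatorname{tr}\rho\delta^{2}$ for the three examples on a small diagonal state. It is a sketch under assumptions: the influence operators are taken as $Y-\beta$, $2(\rho-\operatorname{tr}\rho^{2})$, and $\ln\rho-\ln\sigma-\beta$, the $N$-copy bound is assumed to be the single-copy value divided by $N$, and the states are assumed full-rank so that the matrix logarithm exists. The function names are hypothetical.

```python
import numpy as np

def herm_log(a):
    """Matrix logarithm of a positive-definite Hermitian matrix."""
    lam, vec = np.linalg.eigh(a)
    return (vec * np.log(lam)) @ vec.conj().T

def squared_norm(rho, delta):
    """||delta||^2 = tr rho delta^2."""
    return np.trace(rho @ delta @ delta).real

def bound_expectation(rho, Y, n_copies=1):
    """Variance of Y in the state rho, divided by the number of copies."""
    beta = np.trace(rho @ Y).real
    delta = Y - beta * np.eye(rho.shape[0])
    return squared_norm(rho, delta) / n_copies

def bound_purity(rho, n_copies=1):
    """Equals 4 [tr rho^3 - (tr rho^2)^2] / N under the stated assumptions."""
    beta = np.trace(rho @ rho).real
    delta = 2 * (rho - beta * np.eye(rho.shape[0]))
    return squared_norm(rho, delta) / n_copies

def bound_relative_entropy(rho, sigma, n_copies=1):
    """Variance of ln(rho) - ln(sigma) in the state rho (information variance)."""
    g = herm_log(rho) - herm_log(sigma)
    beta = np.trace(rho @ g).real
    delta = g - beta * np.eye(rho.shape[0])
    return squared_norm(rho, delta) / n_copies

rho = np.diag([0.75, 0.25])
sigma = np.eye(2) / 2
Y = np.diag([1.0, -1.0])

b_mean = bound_expectation(rho, Y, n_copies=10)
b_purity = bound_purity(rho, n_copies=10)
b_relent = bound_relative_entropy(rho, sigma, n_copies=10)
```

For the maximally mixed state the purity bound evaluates to zero in this sketch, since the purity is stationary there.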

Deriving Eqs. (57)–(59) via the conventional brute-force method would entail the following steps:

1. Assume the $\mathbf{F}_{0}$ family of density operators given by Eq. (48), with $p=d^{2}-1$ parameters.

2. Compute the $p$ score operators via Eq. (11).

3. Compute the $p$-by-$p$ Helstrom information matrix $K$ via Eq. (16).

4. Compute the inverse $K^{-1}$.

5. Compute $\beta(\theta)$ via Eq. (48), $\partial\beta(\theta)$, and the Helstrom bound via Eq. (15).

While this method has been used to produce Eq. (57) Watanabe et al. (2010), it is less clear whether it can easily give Eq. (58) or Eq. (59). Contrast the brute-force method with the proposal here:

1. Compute the influence operator $\delta$ via a functional derivative of $\beta[\rho]$ according to Sec. III.4.

2. Find the tangent space $\mathcal{T}$ of the density-operator family or the orthocomplement $\mathcal{T}^{\perp}$. For example, Theorem 2 shows that $\mathcal{T}$ is full-dimensional for the family of arbitrary density operators, while Sec. V later shows that $\mathcal{T}^{\perp}$ may remain tractable for smaller families.

3. Compute $\delta_{\rm eff}=\Pi(\delta|\mathcal{T})=\delta-\Pi(\delta|\mathcal{T}^{\perp})$ and $\tilde{\mathsf{H}}=\lVert\delta_{\rm eff}\rVert^{2}=\operatorname{tr}\rho\delta_{\rm eff}^{2}$.

Each step is tractable for all the examples here, regardless of the dimensions.
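The two routes can be compared numerically for a qubit, where $p=d^{2}-1=3$. The sketch below is an illustration only: it uses a Bloch-vector parametrization $\rho=(I+\sum_{j}r_{j}\sigma_{j})/2$ in place of Eq. (48), assumes the scores are symmetric logarithmic derivatives solving $\partial\rho=\rho\circ S$, assumes the Helstrom bound has the form $(\partial\beta)^{\top}K^{-1}\partial\beta$, and requires a full-rank state. The function names are hypothetical.

```python
import numpy as np

# Pauli matrices, used here as the traceless Hermitian basis of a qubit.
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def qubit_state(r):
    """rho = (I + r . sigma) / 2 for a Bloch vector r with |r| < 1."""
    return 0.5 * (np.eye(2, dtype=complex) + sum(rj * sj for rj, sj in zip(r, PAULI)))

def sld(rho, drho):
    """Solve drho = (rho S + S rho)/2 for S in the eigenbasis of rho."""
    lam, vec = np.linalg.eigh(rho)
    d_eig = vec.conj().T @ drho @ vec
    s_eig = 2 * d_eig / (lam[:, None] + lam[None, :])
    return vec @ s_eig @ vec.conj().T

def brute_force_bound(r, y):
    """Scores, information matrix, inverse, then the quadratic form."""
    rho = qubit_state(r)
    scores = [sld(rho, 0.5 * sj) for sj in PAULI]      # d rho / d r_j = sigma_j / 2
    K = np.array([[np.trace(rho @ (a @ b + b @ a) / 2).real for b in scores]
                  for a in scores])
    dbeta = np.asarray(y, dtype=float)                  # d(tr rho Y)/d r_j = y_j
    return float(dbeta @ np.linalg.solve(K, dbeta))

def semiparametric_bound(r, y):
    """||delta||^2 with delta = Y - beta, for Y = y . sigma."""
    rho = qubit_state(r)
    Y = sum(yj * sj for yj, sj in zip(y, PAULI))
    beta = np.trace(rho @ Y).real
    delta = Y - beta * np.eye(2)
    return np.trace(rho @ delta @ delta).real

r = (0.3, -0.2, 0.5)
y = (1.0, 2.0, -1.0)
brute = brute_force_bound(r, y)
direct = semiparametric_bound(r, y)
```

Both routes give $|y|^{2}-(y\cdot r)^{2}$ for this qubit example; the second route never forms or inverts $K$, which is the point of the comparison.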

Equations (57)–(59) are the quantum bounds promised in Sec. II, although they are merely the simplest examples of what the semiparametric methodology can offer, as Secs. V–VII later show.

IV Parametric submodels

The proof of Theorem 2 works only in the finite-dimensional case ( $p=d^{2}-1<\infty$ ). For infinite-dimensional problems, the beautiful concept of parametric submodels Stein (1956); Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006) offers a more rigorous approach. Let

[TABLE]

be a “mother” density-operator family, where $\mathcal{G}$ may be an infinite-dimensional space. The density operators are still assumed to operate on a common separable Hilbert space $\mathcal{H}$ . Denote the true density operator in the family as $\rho$ . A parametric submodel $\mathbf{F}^{\sigma}$ is defined as any subset of $\mathbf{G}$ that contains the true $\rho$ and has the parametric form of Eq. (8). To wit,

[TABLE]

where $s$ denotes the dimension of the parameter and $\phi$ denotes the parameter value at which $\sigma(\phi)=\rho$ is the truth; both may be specific to the submodel. In the language of geometry Hayashi (2017, 2005); Uhlmann (1993), each $\mathbf{F}^{\sigma}$ is an $s$ -dimensional surface in $\mathbf{G}$ , and all the surfaces are required to intersect at $\rho$ . Figure 3 illustrates the concept.

Each submodel $\mathbf{F}^{\sigma}$ is assumed to be smooth enough for scores to be defined in the same way as before by

[TABLE]

which denotes a system of $s$ equations given by

[TABLE]

As everything is evaluated at the true $\rho$ , the scores of all submodels in fact live in the same Hilbert space $\mathcal{Z}$ with respect to $\rho$ . Let the set of all parametric submodels of $\mathbf{G}$ with respect to the truth be

[TABLE]

where $\mathcal{S}$ denotes the set of indices that label all the submodels. Define the tangent set as the set of the scores from all such parametric submodels of $\mathbf{G}$ , viz.,

[TABLE]

and the tangent space as the span of the set, viz.,

[TABLE]

An influence operator is now defined as any operator that satisfies the unbiasedness condition for all submodels with respect to $\{S\}$ . The condition can be expressed as

[TABLE]

where $(\partial\beta)_{\theta=\phi}$ is specific to each submodel. If $\langle S,\delta\rangle=\partial\beta$ in Eq. (28) is taken to mean Eq. (67), then the influence-operator set $\mathcal{D}$ is still defined by Eq. (28). The error operator given by Eq. (29) for an unbiased measurement still satisfies Eq. (67) by the generic arguments in Ref. (Holevo, 2011, Sec. 6.2), which apply to any submodel, so the error operator remains in $\mathcal{D}$ , and Eq. (31) still holds. Theorem 1 can now be extended for the mother family.

Theorem 3.

The GHB in Eq. (31) for the mother family $\mathbf{G}$ is given by

[TABLE]

where the efficient influence $\delta_{\rm eff}$ is the unique element in the influence-operator set $\mathcal{D}$ given by

[TABLE]

$\delta$ is any influence operator in $\mathcal{D}$, and $\mathcal{T}$ is the tangent space spanned by the scores of all parametric submodels of $\mathbf{G}$.

Proof.

The proof is identical to that of Theorem 1 if one takes $\{S\}$ to be the tangent set containing the scores of all parametric submodels. ∎

Corollary 2 can also be generalized in an almost identical way, although the proof requires more careful thought.

Corollary 3.

For a family of density operators that model $N$ independent and identical quantum objects in the form of

[TABLE]

the efficient influence and the GHB are given by

[TABLE]

where $\delta_{\rm eff}^{(1)}$ and $\tilde{\mathsf{H}}^{(1)}$ are those for the $N=1$ family according to Theorem 3 and $U$ is the map given by Eq. (39).

Proof.

Delegated to Appendix C. ∎

We now generalize Theorem 2 for infinite-dimensional systems. This is also a more precise generalization of a classic result in semiparametric theory (Bickel et al., 1993, Example 1 in Sec. 3.2).

Theorem 4.

$\mathbf{G}_{0}$ , defined as the family of arbitrary density operators, is full-dimensional.

Proof.

We call a Hilbert-space element in $\mathcal{Y}$ bounded and denote it by $\lVert h\rVert_{\rm op}<\infty$ if its equivalence class contains a bounded operator $\hat{h}$ . Denote the set of all bounded elements in $\mathcal{Z}$ as

[TABLE]

Take any $h\in\mathcal{B}$ and its bounded operator $\hat{h}$ . Construct a scalar-parameter exponential family as Hayashi (2017, 2005)

[TABLE]

where $\theta\in\mathbb{R}$ and the truth is at $\sigma(0)=\rho$ . As $\hat{h}$ is bounded, $\exp(\theta\hat{h}/2)$ is bounded and strictly positive. As $\rho$ is nonnegative and unit-trace, $\kappa(\theta)$ is nonnegative and trace-class (Holevo, 2011, Theorem 2.7.2). Moreover, $\operatorname{tr}\kappa(\theta)$ satisfies the properties

[TABLE]

because $\kappa(\theta)$ is trace-class and $\exp(\theta\hat{h})$ is strictly positive. Hence $\sigma(\theta)$ is a valid density operator at any $\theta$ . Since $\mathbf{G}_{0}$ contains arbitrary density operators, $\mathbf{F}^{\sigma}=\{\sigma(\theta):\theta\in\mathbb{R},\sigma(0)=\rho\}$ is a parametric submodel of $\mathbf{G}_{0}$ . It is straightforward to show that

[TABLE]

so the score for this model can be taken as $S^{\sigma}=h$ .

Define a submodel in the same way for every $h\in\mathcal{B}$ , such that all of the $\mathcal{B}$ elements are in the tangent set $\{S\}$ , leading to $\mathcal{B}\subseteq\{S\}\subseteq\mathcal{T}$ . As $\mathcal{T}$ is closed, the limit points of $\mathcal{B}$ must also be in $\mathcal{T}$ , and $\overline{\mathcal{B}}\subseteq\mathcal{T}$ , where $\overline{\mathcal{B}}$ is the closure of $\mathcal{B}$ . Lemma 2 in Appendix D states that $\mathcal{B}$ is a dense subset of $\mathcal{Z}$ , so

[TABLE]

Together with the fact $\mathcal{T}\subseteq\mathcal{Z}$ , this implies $\mathcal{T}=\mathcal{Z}$ , and the theorem is proved.

∎

A comparison of the proofs of Theorems 2 and 4 shows how the parametric-submodel concept works. Instead of dealing with one large family such as Eq. (48), here one exploits the freedom offered by $\mathbf{G}_{0}$ to specify many ad-hoc and elementary submodels. Each submodel in the proof could hardly be simpler: the exponential family is simply a type of geodesic through $\rho$ in density-operator space Hayashi (2017). In fact, the exponential family is not essential, and other families may be used as long as they fit the purpose of the proof. An enormous number of submodels are introduced, one for each $\mathcal{B}$ element in the proof, leading to an extremely overcomplete tangent set. But that presents no trouble for the geometric approach; only the resultant tangent space matters in the end. Figure 4 illustrates the idea.
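The exponential submodel used in the proof of Theorem 4 is easy to examine numerically in finite dimensions. The sketch below assumes the form $\kappa(\theta)=\exp(\theta\hat{h}/2)\rho\exp(\theta\hat{h}/2)$ and $\sigma(\theta)=\kappa(\theta)/\operatorname{tr}\kappa(\theta)$, which is consistent with the properties quoted in the proof but is not copied from the elided equation. It checks that $\sigma(\theta)$ is a valid density matrix and that its derivative at $\theta=0$ equals $\rho\circ h$, so that the score is $h$. The function names are hypothetical.

```python
import numpy as np

def herm_exp(a):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    lam, vec = np.linalg.eigh(a)
    return (vec * np.exp(lam)) @ vec.conj().T

def exponential_submodel(rho, h, theta):
    """sigma(theta) = kappa / tr kappa with kappa = e^{theta h/2} rho e^{theta h/2}."""
    g = herm_exp(theta * h / 2)
    kappa = g @ rho @ g
    return kappa / np.trace(kappa).real

rng = np.random.default_rng(2)
d = 3
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real                       # true state, sigma(0) = rho

b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
h = (b + b.conj().T) / 2
h = h - np.trace(rho @ h).real * np.eye(d)           # enforce tr(rho h) = 0

# Validity of the submodel at a finite parameter value.
sigma = exponential_submodel(rho, h, 0.7)
min_eigenvalue = np.linalg.eigvalsh(sigma).min()
trace_sigma = np.trace(sigma).real

# Derivative at theta = 0 by central difference, compared with rho o h.
eps = 1e-5
dsigma = (exponential_submodel(rho, h, eps) - exponential_submodel(rho, h, -eps)) / (2 * eps)
score_error = np.abs(dsigma - (rho @ h + h @ rho) / 2).max()
```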

By virtue of Theorem 4, an influence operator $\delta\in\mathcal{D}\subseteq\mathcal{Z}=\mathcal{T}$ found for a parameter of interest is the efficient one for $\mathbf{G}_{0}$ . The examples in Secs. III.4 and III.5 work for $\mathbf{G}_{0}$ in the same way they work for $\mathbf{F}_{0}$ . If $\beta$ is given by $\beta[\rho]$ , an influence operator that satisfies Eq. (67) can be found via a gradient of $\beta[\rho]$ , as shown in Sec. III.4 and Fig. 2. In particular, the influence operators given by Eqs. (44)–(46) and the bounds given by Eqs. (57)–(59) for the various examples should still hold for $\mathbf{G}_{0}$ , although the entropy example may require a more rigorous treatment when $d=\infty$ Holevo (2012).

V Constrained bounds

V.1 Antiscore operators

Consider a constrained family of density operators defined as

[TABLE]

where $\gamma[\rho(g)]=0$ denotes a finite set of equality constraints $\{\gamma_{k}[\rho(g)]=0:k=1,\dots,r\}$ . Such constraints appear often in quantum thermodynamics Jaynes (1957); Gogolin and Eisert (2016). If there exist gradient operators $\{\tilde{\gamma}_{k}\in\mathcal{Y}\}$ such that, for any $h\in\mathcal{Y}$ ,

[TABLE]

then each operator given by

[TABLE]

satisfies

[TABLE]

and the constraint $\gamma[\rho(g)]=0$ implies that $\partial\gamma_{k}[\rho]=\left\langle S^{\sigma},R_{k}\right\rangle=0$ for all submodels and $k$ . In short, we write

[TABLE]

Thus $\{R\}$ is orthogonal to the tangent set $\{S\}$ and $\operatorname{span}\{R\}$ must be a subset of $\mathcal{T}^{\perp}$ . We call $R$ the antiscore operators, as the following theorem shows that they span $\mathcal{T}^{\perp}$ in the same way the scores span $\mathcal{T}$ .

Theorem 5.

If $\langle R,R\rangle^{-1}$ exists, $\mathcal{T}^{\perp}=\operatorname{span}\{R\}$ for the $\mathbf{G}_{\gamma}$ family.

Proof.

The proof again follows the classical case (Bickel et al., 1993, Example 3 in Sec. 3.2). Let

[TABLE]

In view of Eq. (81),

[TABLE]

Now construct a parametric submodel $\mathbf{F}^{\sigma}$ in terms of each $h\in\mathcal{R}^{\perp}$ as

[TABLE]

where $\theta\in\mathbb{R}$ , $g=w^{\top}R\in\mathcal{R}$ is an operator to be specified later, and $f(u)$ is defined with respect to the spectral representation of $u=\int\lambda dE_{u}(\lambda)$ as

[TABLE]

$f(u)$ is bounded and positive even if $u$ is unbounded, so $\sigma(\theta)$ is a valid density operator. Since $\rho\in\mathbf{G}_{\gamma}$ , $\gamma[\rho]=0$ . For a $\sigma(\theta)$ away from $\rho$ with $\theta\neq 0$ ,

[TABLE]

where Eq. (87) uses Eq. (80) and the last step uses the fact $h\in\mathcal{R}^{\perp}$ . To make $\sigma(\theta)$ satisfy the constraint $\gamma[\sigma(\theta)]=0$ , $g(\theta)=w(\theta)^{\top}R$ can be set as a function of $\theta$ to cancel the $o(\theta)$ term, with

[TABLE]

Then $\gamma[\sigma(\theta)]=0$ and $\mathbf{F}^{\sigma}$ is a valid parametric submodel of $\mathbf{G}_{\gamma}$ . Equation (89) also implies that $\theta g(\theta)=o(\theta)$ is negligible relative to $\theta h$ for infinitesimal $\theta$ , so the score for $\mathbf{F}^{\sigma}$ is $h$ , which should be put in the tangent set $\{S\}$ . As this procedure can be done for any $h\in\mathcal{R}^{\perp}$ , $\mathcal{R}^{\perp}\subseteq\{S\}\subseteq\mathcal{T}$ . Together with Eq. (83), this leads to $\mathcal{T}=\mathcal{R}^{\perp}$ , giving $\mathcal{T}^{\perp}=\mathcal{R}$ .

∎

The family given by Eqs. (84) and (85) is more convenient to use here than the exponential family used in the proof of Theorem 4. The $f(u)$ defined by Eq. (85) is a generalization of the classical version in Ref. (Bickel et al., 1993, Example 1 in Sec. 3.2) and is plotted in Fig. 5. It is designed to give a valid density operator via Eq. (84), even if the argument is an unbounded operator, yet produce the desired score when linearized at $\theta=0$. An adjustable operator $g(\theta)$ is included in the submodel to make $\sigma(\theta)$ satisfy the constraint away from $\rho$. Figure 6 further illustrates the idea of the proof.

Given an influence operator $\delta$ , such as those derived in Sec. III.4, the efficient influence and the GHB can be computed in terms of $\mathcal{T}^{\perp}$ instead of $\mathcal{T}$ via

[TABLE]

The same projection formula that gives $\delta_{\rm eff}$ in Appendix A can be adapted to give

[TABLE]

Equations (92) and (93) remain tractable if the constraints are few. The gradients of $\gamma[\rho]$ can be derived in the same way as those of $\beta[\rho]$ , as shown in Fig. 2, and $R$ can be computed analytically for linear constraints, the purity constraint, and the entropy constraint by following the same type of calculations shown in Eqs. (44)–(46). Equation (4) is a special example of the constrained GHB when $\beta=\operatorname{tr}\rho Y$ and $\gamma=\operatorname{tr}\rho(Z-\zeta)=0$ .
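The projection onto $\operatorname{span}\{R\}$ can be sketched numerically for a single linear constraint. The code below is an illustration under assumptions: the inner product is taken as $\langle a,b\rangle=\operatorname{tr}\rho(a\circ b)$, the antiscore for $\gamma=\operatorname{tr}\rho(Z-\zeta)=0$ is taken as $R=Z-\zeta$, the projection is the usual Gram-matrix formula, and $C_{YZ}$ and $V_{Z}$ are taken to be the symmetrized covariance and the variance. The function names are hypothetical.

```python
import numpy as np

def jordan(a, b):
    return (a @ b + b @ a) / 2

def inner(rho, a, b):
    """Assumed inner product <a, b> = tr rho (a o b)."""
    return np.trace(rho @ jordan(a, b)).real

def center(rho, a):
    """Subtract the mean so that tr(rho a) = 0."""
    return a - np.trace(rho @ a).real * np.eye(rho.shape[0])

def efficient_influence(rho, delta, antiscores):
    """delta_eff = delta - projection of delta onto span of the antiscores."""
    gram = np.array([[inner(rho, a, b) for b in antiscores] for a in antiscores])
    overlap = np.array([inner(rho, r, delta) for r in antiscores])
    coeffs = np.linalg.solve(gram, overlap)
    return delta - sum(c * r for c, r in zip(coeffs, antiscores))

def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

rng = np.random.default_rng(3)
d = 4
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real
Y, Z = random_hermitian(d, rng), random_hermitian(d, rng)

delta = center(rho, Y)            # influence operator for beta = tr rho Y
R = center(rho, Z)                # antiscore; zeta is the true mean of Z
delta_eff = efficient_influence(rho, delta, [R])

V_Y = inner(rho, delta, delta)
V_Z = inner(rho, R, R)
C_YZ = inner(rho, delta, R)
constrained = inner(rho, delta_eff, delta_eff)
closed_form = V_Y - C_YZ**2 / V_Z
orthogonality = inner(rho, delta_eff, R)
```

With several constraints, the same function takes a longer list of antiscores; only the size of the Gram matrix grows.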

V.2 Entropy estimation in quantum thermodynamics

In quantum thermodynamics, conserved quantities of a dynamical system, such as the energy and the particle number, are expressed as moment constraints on the density operator with respect to a vector of observables $Z$ and a vector of constants $\zeta$ , viz.,

[TABLE]

Given such constraints, the density operator is often assumed to be the one with the maximum entropy Jaynes (1957), known as the generalized Gibbs ensemble Gogolin and Eisert (2016). Such an assumption, however, requires verification and does not hold out of equilibrium. Experiments on Bose gases have been performed to study the quantum states at different times and the validity of the maximum-entropy principle at steady state Kinoshita et al. (2006); Langen et al. (2015a, b).

When the maximum-entropy principle is in question for those experiments, it is prudent to make no prior assumption about the density operator other than the constraints. Thus one should consider a family of density operators given by Eq. (77), where the vectoral constraint is $\gamma[\rho]=\operatorname{tr}\rho(Z-\zeta)=0$ . Suppose that the von Neumann entropy $\beta=-\operatorname{tr}\rho\ln\rho$ is the parameter of interest. The estimation of $\beta$ is then a problem of quantum semiparametric estimation.

As the experiments typically involve high-dimensional systems, quantum state tomography is impractical. More efficient estimation of $\beta$ should exist. The formalism here leads to a quantum limit given by

[TABLE]

This bound is equivalent to the Holevo bound, as shown in Sec. VIII, so it is asymptotically attainable in principle, at least for finite-dimensional systems Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020), although the experimental implementation of efficient measurements remains an open question.
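A small numerical sketch can illustrate how such a constrained entropy bound behaves. It is not the paper's Eq. (95), which is not reproduced here. It assumes $\delta=-\ln\rho-\beta$, antiscores $R=Z-\zeta$, the inner product $\operatorname{tr}\rho(a\circ b)$, and the single-copy bound $\lVert\delta\rVert^{2}-\langle\delta,R\rangle\langle R,R\rangle^{-1}\langle R,\delta\rangle$. Under these assumptions the bound vanishes at a Gibbs state, because $-\ln\rho$ is then an affine function of $Z$ and $\delta$ lies entirely in $\operatorname{span}\{R\}$. The names below are hypothetical.

```python
import numpy as np

def jordan(a, b):
    return (a @ b + b @ a) / 2

def inner(rho, a, b):
    return np.trace(rho @ jordan(a, b)).real

def herm_log(a):
    """Matrix logarithm of a positive-definite Hermitian matrix."""
    lam, vec = np.linalg.eigh(a)
    return (vec * np.log(lam)) @ vec.conj().T

def entropy_bound(rho, observables):
    """Single-copy bound for the von Neumann entropy under moment constraints."""
    eye = np.eye(rho.shape[0])
    neg_log = -herm_log(rho)
    beta = np.trace(rho @ neg_log).real               # entropy -tr rho ln rho
    delta = neg_log - beta * eye
    R = [Z - np.trace(rho @ Z).real * eye for Z in observables]
    gram = np.array([[inner(rho, a, b) for b in R] for a in R])
    overlap = np.array([inner(rho, r, delta) for r in R])
    return inner(rho, delta, delta) - overlap @ np.linalg.solve(gram, overlap)

levels = np.array([0.0, 1.0, 2.0])
energy = np.diag(levels)                              # illustrative three-level "energy"

gibbs = np.diag(np.exp(-0.7 * levels))
gibbs = gibbs / np.trace(gibbs)                       # maximum-entropy state
generic = np.diag([0.5, 0.1, 0.4])                    # a state away from equilibrium

gibbs_bound = entropy_bound(gibbs, [energy])
generic_bound = entropy_bound(generic, [energy])
```

In this toy example the bound is strictly positive for the non-equilibrium state and vanishes at the Gibbs state, where the entropy is stationary under the constraint.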

As entropy is an excellent measure of randomness and a central quantity in information theory, entropy estimation has many applications beyond thermodynamics. In classical statistics, the semiparametric estimation of entropic quantities is a well studied problem with known near-efficient estimators and applications in universal coding, statistical tests, random-number generation, econometrics, spectroscopy, and even neuroscience Beirlant et al. (1997); *paninski03; *cover. In the quantum domain, one application is universal quantum information compression Jozsa et al. (1998): knowing just the von Neumann entropy and nothing else about $\rho$ allows the quantum information to be compressed in accordance with the entropy. Another application is the estimation of an entropic measure of entanglement, which allows one to demonstrate entanglement without full tomography Gühne and Tóth (2009). The quantum limit here quantifies the minimum amount of resources needed to achieve a desired precision. Its asymptotic attainability suggests that it is a lofty but fair yardstick for experimental design.

V.3 Philosophy

The proposed approach to quantum semiparametric bounds is the polar opposite of the usual approach in quantum metrology. In the usual bottom-up approach, one assumes a small family of density operators with a few parameters and computes $\lVert\Pi(\delta|\mathcal{T})\rVert^{2}$ that is determined by the overlap between $\delta$ and the scores $S$ . Here, one starts with a large family with almost full dimension, computes $\lVert\delta\rVert^{2}$ for an amenable $\delta$ , and then reduces it by $\lVert\Pi(\delta|\mathcal{T}^{\perp})\rVert^{2}$ that is determined by the overlap between $\delta$ and the antiscores $R$ , as illustrated by Fig. 7. The complexity of the problem thus depends on the dimension of the family, and the essential insight of this work is that the problem can become simple again when the dimension is close to being full. Of course, if the dimension of $\mathcal{T}^{\perp}$ is high, the top-down approach may also suffer from the curse of dimensionality. The medium families with both $\mathcal{T}$ and $\mathcal{T}^{\perp}$ in high dimensions are the most difficult to deal with, as they may be impregnable from either end.

V.4 Looser bounds

It may often be the case that, despite one’s best efforts, the exact $\delta_{\rm eff}$ for a problem remains intractable. Then a standard strategy in statistics and quantum metrology is to sandwich $\lVert\delta_{\rm eff}\rVert^{2}$ between upper and lower bounds. $\lVert\delta\rVert^{2}$ is an obvious upper bound and can be obtained from the gradient method in Sec. III.4 if $\beta$ can be expressed as a functional $\beta[\rho]$ . Another way is to use Eq. (30) if an unbiased measurement and its error are known. The evaluation of lower bounds, on the other hand, can be facilitated by the following proposition.

Proposition 1.

Let $\mathcal{V}\subseteq\mathcal{T}$ be a closed subspace of $\mathcal{T}$ and $\mathcal{V}^{\perp}$ be the orthocomplement of $\mathcal{V}$ in $\mathcal{Z}$ . Then

[TABLE]

In particular, if

[TABLE]

is taken as the tangent space for a particular parametric submodel $\mathbf{F}^{\sigma}$ , then

[TABLE]

is the GHB for that submodel.

Proof.

Delegated to Appendix E. ∎

A tight lower bound on $\lVert\delta_{\rm eff}\rVert^{2}$ can be sought by devising a submodel that is as unfavorable to the estimation of $\beta$ as possible. Another approach is to devise an overconstrained model with $\mathcal{V}^{\perp}\supseteq\mathcal{T}^{\perp}$ and evaluate a lower bound on $\lVert\delta_{\rm eff}\rVert^{2}$ from the top by overshooting, as illustrated by Fig. 8.
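The sandwich can be illustrated numerically on the single-constraint family with $\beta=\operatorname{tr}\rho Y$ and $\operatorname{tr}\rho(Z-\zeta)=0$. The sketch below assumes the inner product $\operatorname{tr}\rho(a\circ b)$ and takes each one-parameter submodel to have a score $h\in\mathcal{R}^{\perp}$, so that its bound is $\langle\delta,h\rangle^{2}/\langle h,h\rangle$. The random scores and all names are hypothetical choices for illustration.

```python
import numpy as np

def jordan(a, b):
    return (a @ b + b @ a) / 2

def inner(rho, a, b):
    return np.trace(rho @ jordan(a, b)).real

def center(rho, a):
    return a - np.trace(rho @ a).real * np.eye(rho.shape[0])

def project_out(rho, a, r):
    """Remove from a its component along r."""
    return a - inner(rho, r, a) / inner(rho, r, r) * r

def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

rng = np.random.default_rng(4)
d = 3
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real

delta = center(rho, random_hermitian(d, rng))     # influence operator Y - beta
R = center(rho, random_hermitian(d, rng))         # antiscore Z - zeta

upper = inner(rho, delta, delta)                  # ||delta||^2
delta_eff = project_out(rho, delta, R)
exact = inner(rho, delta_eff, delta_eff)          # ||delta_eff||^2

# One-parameter submodels with random scores h orthogonal to R.
lower_bounds = []
for _ in range(200):
    h = project_out(rho, center(rho, random_hermitian(d, rng)), R)
    lower_bounds.append(inner(rho, delta, h) ** 2 / inner(rho, h, h))
best_random_lower = max(lower_bounds)

# The least favorable submodel has its score along delta_eff itself.
least_favorable = inner(rho, delta, delta_eff) ** 2 / inner(rho, delta_eff, delta_eff)
```

Random submodels give valid but typically loose lower bounds, whereas a score proportional to $\delta_{\rm eff}$ saturates the inequality.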

VI Examples in optics

VI.1 Quadrature estimation

Here we further illustrate the theory with examples in optics, where quantum measurement theory has found the most experimental success Wiseman and Milburn (2010). For the first and simplest example, let $\rho$ be a density operator of an optical mode and assume the $\mathbf{G}_{0}$ family of arbitrary density operators. Consider the estimation of the mean of a quadrature operator $Y$ , with $\beta=\operatorname{tr}\rho Y$ . This problem appears often in optical state characterization, communication, and sensing, where $\beta$ is a displacement parameter Paris and Rehacek (2004). The GHB is given by Eq. (57), and homodyne detection of $Y$ is efficient. Note that this example is different from all previous studies of quadrature estimation Helstrom (1976); Holevo (2011), which assume Gaussian states or similarly low-dimensional parametric models. The semiparametric scenario here allows $\rho$ to be arbitrary and possibly non-Gaussian.

Now suppose that side information $\operatorname{tr}\rho Z=\zeta$ concerning another quadrature $Z$ is available. It follows from Sec. V that the efficient influence is now

[TABLE]

where $C_{YZ}$ and $V_{Z}$ are given by Eqs. (5). The GHB is then given by Eq. (4), which is lowered by any correlation between $Y$ and $Z$ . From the efficient influence, one may use Eq. (29) to find an efficient measurement, which obeys

[TABLE]

This can be satisfied if the POVM measures the quadrature $Y-(C_{YZ}/V_{Z})Z$ instead of the obvious $Y$ . Notice, however, that $C_{YZ}/V_{Z}$ depends on the unknown $\rho$ . Whether adaptive measurements Wiseman and Milburn (2010) can implement this POVM approximately and whether asymptotic attainability is possible for this infinite-dimensional problem are interesting open questions. One approach may be to form rough estimates of the covariances $C_{YZ}$ and $V_{Z}$ via heterodyne detection of a portion of the light first, and then measure the desired quadrature via homodyne detection based on the approximate $C_{YZ}/V_{Z}$ .

VI.2 Family of classical states

For a more nontrivial example, consider a density-operator family in the form

[TABLE]

where $\alpha=\alpha^{\prime}+i\alpha^{\prime\prime}\in\mathbb{C}$ , $d^{2}\alpha=d\alpha^{\prime}d\alpha^{\prime\prime}$ , $\ket{\alpha}$ is a coherent state, and $P$ is the Glauber-Sudarshan function Mandel and Wolf (1995). As $P$ is assumed to be positive, $\mathbf{G}_{c}$ is a family of classical states Mandel and Wolf (1995) and a strict subset of $\mathbf{G}_{0}$ . The assumption of $\mathbf{G}_{c}$ instead of $\mathbf{G}_{0}$ is more appropriate for practical applications with significant decoherence, as nonclassical states are unlikely to survive in such an environment.

Consider a moment parameter of the form

[TABLE]

where $f(\alpha,\alpha^{*})$ is a real polynomial of $\alpha$ and $\alpha^{*}$ . For example, one may be interested in the mean of a quadrature, in which case $f=\alpha\exp(-i\theta)+\alpha^{*}\exp(i\theta)$ , or the mean energy, in which case $f=|\alpha|^{2}$ . The optical equivalence theorem Mandel and Wolf (1995) gives

[TABLE]

where $:f(a,a^{\dagger}):$ denotes the normal ordering Mandel and Wolf (1995). It follows from Sec. III.4 that an influence operator is $\delta=Y-\beta$ .
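The optical equivalence theorem can be checked numerically in a truncated Fock space. The sketch below is an illustration under assumptions: it replaces the integral over $P$ with a discrete mixture of three coherent states (a limiting case of a positive $P$), truncates the Fock space at 40 levels (adequate for the small amplitudes chosen here), and uses hypothetical function names.

```python
import numpy as np

def coherent(alpha, dim):
    """Fock-basis amplitudes of a coherent state, truncated at dim levels."""
    amps = np.empty(dim, dtype=complex)
    amps[0] = np.exp(-abs(alpha) ** 2 / 2)
    for n in range(1, dim):
        amps[n] = amps[n - 1] * alpha / np.sqrt(n)
    return amps

def annihilation(dim):
    """Truncated annihilation operator with a|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)

def classical_state(alphas, weights, dim):
    """rho = sum_k w_k |alpha_k><alpha_k|."""
    rho = np.zeros((dim, dim), dtype=complex)
    for alpha, w in zip(alphas, weights):
        psi = coherent(alpha, dim)
        rho += w * np.outer(psi, psi.conj())
    return rho

dim = 40
alphas = np.array([0.5 + 0.2j, -1.0 + 0.7j, 1.2 - 0.4j])
weights = np.array([0.2, 0.5, 0.3])
rho = classical_state(alphas, weights, dim)
a = annihilation(dim)
ad = a.conj().T

# f = |alpha|^2 corresponds to the normally ordered operator a^dagger a.
energy_quantum = np.trace(rho @ ad @ a).real
energy_classical = np.sum(weights * np.abs(alphas) ** 2)

# f = |alpha|^4 corresponds to a^dagger^2 a^2, not to (a^dagger a)^2.
quartic_quantum = np.trace(rho @ ad @ ad @ a @ a).real
quartic_classical = np.sum(weights * np.abs(alphas) ** 4)

# Quadrature with phase phi: f = alpha e^{-i phi} + alpha^* e^{i phi}.
phi = 0.3
Y = a * np.exp(-1j * phi) + ad * np.exp(1j * phi)
quad_quantum = np.trace(rho @ Y).real
quad_classical = np.sum(weights * 2 * (alphas * np.exp(-1j * phi)).real)
```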

The next step is to find the tangent space of $\mathbf{G}_{c}$ . Although $\mathbf{G}_{c}$ is a smaller family than $\mathbf{G}_{0}$ , its dimension turns out to be just as high.

Proposition 2.

$\mathbf{G}_{c}$ is full-dimensional.

Proof.

Delegated to Appendix F. ∎

With the full-dimensional tangent space, the GHB is also given by Eq. (57). This result shows that the obvious von Neumann measurement of $Y$ remains efficient in estimating $\beta$ , and no alternative measurements can do better, despite restricting the family to classical states. For example, if $f(\alpha,\alpha^{*})$ is a quadrature, then the homodyne measurement is efficient, and if $f(\alpha,\alpha^{*})=|\alpha|^{2}$ , then $:f(a,a^{\dagger}):=a^{\dagger}a$ , and the photon-number measurement is efficient.

$\mathcal{G}$ , the space of positive densities, is infinite-dimensional. The estimation of $P$ would be a nonparametric problem Artiles et al. (2005), in contrast with the semiparametric problems studied here. In classical statistics, it is known that a nonparametric estimation of the probability density cannot achieve a parametric convergence rate ( $\mathsf{E}=O(1/N)$ ) Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsybakov (2009), and this difficulty is expected to translate to the quantum domain. Semiparametric estimation, on the other hand, can achieve the parametric rate and is the more feasible task if one is interested in only a few parameters of the system.

A further restriction on the family of $P$ can give very different results, as shown in the next section in the context of incoherent imaging.

VI.3 Incoherent imaging

VI.3.1 The mother model

Here we summarize existing results concerning the problem of incoherent imaging Tsang (2019a) using the language of semiparametrics. Unlike previous sections, this section presents essentially no new results. Rather, the goal is to use this very important but equally difficult problem to illustrate the concepts and current limitations of the quantum semiparametric theory.

The basic setup of an imaging system is depicted in Fig. 9. The object is assumed to emit spatially incoherent light at an optical frequency. For simplicity, the imaging system is assumed to be one-dimensional, paraxial, and diffraction-limited. A model of each photon on the image plane is Tsang (2017, 2019b, 2019a)

[TABLE]

where $F$ is the unknown source density, $\mathcal{G}_{1}$ is a set of probability densities on $\mathbb{R}$ , $X\in\mathbb{R}$ is the object-plane coordinate, $\psi(x)$ is the point-spread function of the imaging system, $x\in\mathbb{R}$ is the image-plane coordinate normalized with respect to the magnification factor Goodman (2004), $\ket{x}$ is the Dirac position ket that satisfies $\braket{x}{x^{\prime}}=\delta(x-x^{\prime})$ , and $k$ is the canonical momentum operator. $X$ and $x$ are further assumed to be normalized with respect to the width of $\psi(x)$ so that they are dimensionless. $\psi(x)$ is assumed here to be

[TABLE]

such that $\ket{\psi_{X}}=\ket{\alpha=X/2}$ is a coherent state. Various generalizations can be found in Refs. Tsang (2017, 2019b, 2019a, 2019c) and references therein. Besides imaging, the model can also be used to describe a quantum particle under random displacements Hall et al. (2009); *vidrighin; *branford19; Ng et al. (2016).

The problem is semiparametric if $\mathcal{G}_{1}$ is infinite-dimensional, such as

[TABLE]

and the parameter of interest is a functional of $F$ , such as the object moment

[TABLE]

where $\mu\in\mathbb{N}_{1}$ denotes the order of the moment of interest. Notice that the family indicated by Eq. (109) is much smaller than the one given by Eq. (103) in the previous example, as the Glauber-Sudarshan function is now separable in terms of $(\alpha^{\prime},\alpha^{\prime\prime})$ and confined to the real axis of $\alpha$ , viz.,

[TABLE]

In fact, the dimension of $\mathcal{T}^{\perp}$ is now infinite, as shown in Appendix G, so this problem is the most difficult type described in Sec. V.3.

The errors and their bounds are all functionals of the true density $F$ , and we will focus on their values for subdiffraction distributions, which are defined as those with a width $\Delta$ around $X=0$ much smaller than the point-spread-function width, or in other words $\Delta\ll 1$ Tsang (2019a).

VI.3.2 Semiparametric measurements and estimators

Two globally unbiased measurements for semiparametric moment estimation are known Tsang (2019c).

Footnote 2: Refs. Tsang et al. (2016); Tsang (2017, 2019c) use the symbol $L$ for the number of detected photons, which is stochastic, and $N$ for the expected number of detected photons. For optics models with Poisson statistics, the conditioning on the detected photon number does not introduce any significant difference to the theory.

Footnote 3: In practice, a histogram of the photon counts at the detectors provides sufficient statistics for the estimators and the photons do not need to be resolved individually Tsang (2019c).

For $N$ detected photons (see Footnote 2), both are separable measurements and sample means (see Footnote 3) in the form of

[TABLE]

The first measurement is direct imaging, which measures the intensity on the image plane and is equivalent to the projection of each photon in the position basis as

[TABLE]

An unbiased semiparametric estimator is given by the sample mean of

[TABLE]

and the error is

[TABLE]

where $O(1)$ denotes a prefactor that does not scale with $\Delta$ in the first order. The second measurement is the so-called spatial-mode demultiplexing or SPADE Tsang et al. (2016); Tsang (2017, 2019b, 2019a, 2019c), which demultiplexes the image-plane light in the Hermite-Gaussian basis given by

[TABLE]

where $\operatorname{He}_{m}(x)$ is a Hermite polynomial Olver et al. (2010). For the estimation of an even moment with $\mu=2j$ , the POVM for each photon is

[TABLE]

an unbiased semiparametric estimator is given by the sample mean of

[TABLE]

and the error is

[TABLE]

which is much lower than that of direct imaging in the subdiffraction regime for the second and higher moments. For the estimation of odd moments with SPADE, only approximate results have been obtained so far Tsang (2017, 2018); Zhou and Jiang (2019); Bonsma-Fisher et al. (2019) and are not elaborated here.
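The contrast between the two measurements can be illustrated with a per-photon calculation for the second moment of a two-point source. The sketch below rests on assumptions that are not taken from the elided equations: the image-plane intensity $|\psi(x-X)|^{2}$ is assumed to be a unit-variance Gaussian, the direct-imaging estimator is assumed to be the Hermite polynomial $\operatorname{He}_{\mu}(x)$, the photon count in mode $q$ is assumed to be Poisson with mean $|\alpha|^{2}=X^{2}/4$ (as follows from $\ket{\psi_{X}}=\ket{\alpha=X/2}$), and the SPADE estimator is assumed to be $4^{j}q!/(q-j)!$. Both estimators are unbiased under these assumptions; the names are hypothetical.

```python
import numpy as np
from numpy.polynomial import hermite_e as herme

def direct_imaging_stats(points, weights, mu, deg=20):
    """Mean and variance of He_mu(x) for x ~ sum_k w_k N(X_k, 1)."""
    nodes, w = herme.hermegauss(deg)          # quadrature for weight exp(-x^2/2)
    w = w / np.sqrt(2 * np.pi)                # normalize to a standard normal
    coeffs = np.zeros(mu + 1)
    coeffs[mu] = 1.0                          # selects He_mu
    mean, second = 0.0, 0.0
    for X, p in zip(points, weights):
        vals = herme.hermeval(X + nodes, coeffs)
        mean += p * np.sum(w * vals)
        second += p * np.sum(w * vals ** 2)
    return mean, second - mean ** 2

def spade_stats(points, weights, j, qmax=60):
    """Mean and variance of 4^j q!/(q-j)! for q ~ sum_k w_k Poisson(X_k^2 / 4)."""
    est = np.array([4.0 ** j * np.prod(np.arange(k - j + 1, k + 1), dtype=float)
                    if k >= j else 0.0 for k in range(qmax + 1)])
    mean, second = 0.0, 0.0
    for X, p in zip(points, weights):
        m = X ** 2 / 4
        pmf = np.empty(qmax + 1)
        pmf[0] = np.exp(-m)
        for k in range(1, qmax + 1):
            pmf[k] = pmf[k - 1] * m / k       # Poisson recursion
        mean += p * np.sum(pmf * est)
        second += p * np.sum(pmf * est ** 2)
    return mean, second - mean ** 2

# Two equal point sources separated by Delta = 0.2 (subdiffraction regime).
points, weights = [-0.1, 0.1], [0.5, 0.5]
direct_mean, direct_var = direct_imaging_stats(points, weights, mu=2)
spade_mean, spade_var = spade_stats(points, weights, j=1)
```

For this source both estimators have mean $\Delta^{2}/4=0.01$, while the per-photon variances are $2+\Delta^{2}$ for direct imaging and $\Delta^{2}$ for SPADE, in line with the $O(1)$ versus $O(\Delta^{\mu})$ scaling described above; dividing by $N$ gives the error for $N$ photons.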

Both estimators are efficient for their respective measurements in the classical sense Tsang (2019c). In the quantum case, the question is whether SPADE is efficient or there exist even better measurements. Computing the GHB, or at least bounding it, would answer the question and establish the fundamental quantum efficiency for incoherent imaging.

VI.3.3 Lower bounds via parametric submodels

Both Eqs. (118) and (123) are upper bounds on the GHB. By virtue of Proposition 1, all earlier quantum lower bounds derived for incoherent imaging via parametric models are in fact lower bounds on the GHB for the mother family given by Eq. (106), with the true $\rho$ being evaluated at certain special cases of $F$ . References Tsang et al. (2016); Bisketzi et al. (2019); *lupo20a, for example, assume discrete point sources, but exact results become difficult to obtain for a large number of sources. Here we highlight two methods that work for any $F$ but can only give looser bounds.

The first method is the culmination of Ref. (Tsang, 2017, Sec. 6) and Ref. (Tsang, 2019b, Appendix C). Assume that

[TABLE]

consists of two sets of parameters $\theta_{g}=(\theta_{g1},\theta_{g2},\dots)^{\top}$ and $\theta_{h}=(\theta_{h0},\theta_{h1},\dots)^{\top}$ . Define a submodel given by

[TABLE]

The truth is at

[TABLE]

$\sigma(\theta)$ can be rewritten as

[TABLE]

In other words, we have introduced parameters to both the mixing density and the displacement in the model by rewriting the mixture. Appendix H shows how the extended convexity of the Helstrom information Alipour and Rezakhani (2015); Ng et al. (2016) can be used on Eq. (130) to give

[TABLE]

A more careful calculation shows that the SPADE error is exactly equal to this bound for $\mu=2$ Tsang (2019c). For higher moments, however, Eq. (131) remains much lower than that achievable by SPADE.

The second method, as reported in Ref. Tsang (2019b), considers the formal expansion $\exp(-ikX)=\sum_{p=0}^{\infty}(-ikX)^{p}/p!$ , which leads to

[TABLE]

Consider this as a parametric submodel with only one scalar parameter $\theta=\beta_{\mu}$ for a given $\mu$ , while all the other moments $\beta_{\nu}$ with $\nu\neq\mu$ are fixed. Then the Helstrom bound for this submodel is simply $\mathsf{H}_{\mu}^{\sigma}=1/K^{\sigma}_{\mu\mu}$ , where $K^{\sigma}_{\mu\mu}$ is the Helstrom information with respect to $\theta=\beta_{\mu}$ . Reference Tsang (2019b) finds via a purification technique that this Helstrom bound is in turn bounded by

[TABLE]

By virtue of Corollary 3 and Proposition 1, we obtain

[TABLE]

This lower bound does match the performance of SPADE in order of magnitude, but it does not have a simple closed-form expression, and the question of whether SPADE is exactly efficient for moments higher than the second remains open.

VII Semiparametric estimation with explicit nuisance parameters

VII.1 The efficient score operator

We now consider problems where there is an explicit partition of the parameters into a scalar $\beta$ and nuisance parameters $\eta$ that may be infinite-dimensional, viz.,

[TABLE]

An example is the displacement model given by Eq. (6), where $\beta$ is the displacement parameter and the initial state $\rho_{0}$ depends on the nuisance parameters. All previous studies of the problem assume that $\rho_{0}$ is known exactly. In practice, however, $\rho_{0}$ may be poorly characterized, and the estimation performance in the presence of unknown nuisance parameters may suffer as a result.

With the explicit partition of the parameters, the scores can be partitioned similarly. Let $S^{\beta}$ be the score with respect to the parameter of interest, as defined by

[TABLE]

where $\eta$ is fixed at the truth. To define the nuisance scores, consider the subfamily

[TABLE]

which holds $\beta$ fixed at the truth instead. Define the nuisance tangent set $\{S^{\eta}\}$ as the set of scores from all parametric submodels of $\mathbf{G}_{\eta}$ and the nuisance tangent space as

[TABLE]

The unbiasedness condition for an influence operator becomes

[TABLE]

The second of Eqs. (139) implies that $\delta\perp\Lambda$ , so if $S^{\beta}\in\Lambda$ , $\langle S^{\beta},\delta\rangle=0$ , and no influence operator that obeys both Eqs. (139) can exist. In that case we assume the GHB to be infinite. Provided that $S^{\beta}\notin\Lambda$ , however, the following theorem provides another method of computing the efficient influence and the GHB.

Theorem 6.

Assuming $S^{\beta}\notin\Lambda$ and the unbiasedness condition given by Eqs. (139), the efficient influence and the GHB are given by

[TABLE]

where $S_{\rm eff}$ , henceforth called the efficient score, is given by

[TABLE]

Proof.

Delegated to Appendix I. ∎

Figure 10 illustrates the Hilbert-space concepts involved in Theorem 6. We note that Ref. (Suzuki et al., 2019, Sec. 5) has also arrived at conclusions similar to Theorem 6 in the parametric case, but the crucial point here is the Hilbert-space approach, which will enable us to derive closed-form solutions to semiparametric problems, as shown in the next section.
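For intuition, Theorem 6 reduces to familiar linear algebra in the finite-dimensional parametric case. The Python sketch below rests on assumptions not stated in this section: the scores $(S^{\beta},S^{\eta}_{1},\dots)$ are taken to have a positive-definite Gram matrix $K$ under the inner product, and the bound in the theorem is taken to be $1/\lVert S_{\rm eff}\rVert^{2}$. Under those assumptions, $\lVert S_{\rm eff}\rVert^{2}$ is the Schur complement of the nuisance block of $K$, and its reciprocal equals the $(\beta,\beta)$ entry of $K^{-1}$. The matrix used here is random and purely illustrative.

```python
import numpy as np

def efficient_information(K):
    """Squared norm of the efficient score for the first parameter.

    K is the Gram (information) matrix of the scores, ordered as
    (beta, eta_1, ..., eta_m). Subtracting from the beta score its
    projection onto the nuisance scores leaves the Schur complement.
    """
    K = np.asarray(K, dtype=float)
    k_bb = K[0, 0]
    k_be = K[0, 1:]
    k_ee = K[1:, 1:]
    # Coefficients of the projection of S^beta onto the nuisance scores.
    coeffs = np.linalg.solve(k_ee, k_be)
    return k_bb - k_be @ coeffs

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
K = A @ A.T + 4 * np.eye(4)   # hypothetical positive-definite information matrix

bound_from_efficient_score = 1.0 / efficient_information(K)
bound_from_full_inverse = np.linalg.inv(K)[0, 0]
bound_without_nuisance = 1.0 / K[0, 0]   # bound if eta were known exactly
```

In this sketch the first two quantities agree, and both are at least as large as the third, reflecting the cost of not knowing the nuisance parameters.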

VII.2 Displacement estimation with a constrained family of initial states

Consider the displacement model given by Eq. (6) and illustrated by Fig. 11. For high-dimensional systems, only a few moments of the initial state $\rho_{0}$ may be known in practice, and it is prudent to assume that $\rho_{0}$ is in the constrained family $\mathbf{G}_{\gamma}$ defined by Eq. (77). The density-operator family for the problem can be expressed as

[TABLE]

where the unitary map $\mathcal{U}_{\beta}$ is defined as

[TABLE]

Generalization for more complicated generators is possible Tsang et al. (2011) but outside the scope of this paper.

Define an inner product and a norm with respect to the true $\rho_{0}$ as

[TABLE]

Define also the operator Hilbert space $\mathcal{Z}_{0}$ with respect to $\rho_{0}$ , the tangent space $\mathcal{T}_{0}$ at $\rho_{0}$ with respect to $\mathbf{G}_{\gamma}$ , and the orthocomplement $\mathcal{T}_{0}^{\perp}$ that gives $\mathcal{Z}_{0}=\mathcal{T}_{0}\oplus\mathcal{T}_{0}^{\perp}$ , in the same way as the spaces $\mathcal{Z}$ , $\mathcal{T}$ , and $\mathcal{T}^{\perp}$ are defined with respect to $\rho$ . Noting the unitarity of $\mathcal{U}_{\beta}$ and following the method in Appendix C, it can be shown that the nuisance tangent space is given by

[TABLE]

Define the map adjoint to $\mathcal{U}_{\beta}$ by $\mathcal{U}_{\beta}^{*}h\equiv\exp(iH\beta)h\exp(-iH\beta)$ . Exploiting the isomorphism between $\Lambda$ and $\mathcal{T}_{0}$ , we can compute the efficient score as follows:

[TABLE]

where $R$ is the vector of antiscores with respect to $\rho_{0}$ , as defined by Eq. (80) but with $\rho_{0}$ and $\langle\cdot,\cdot\rangle_{0}$ instead. Equation (149) can be further simplified, with

[TABLE]

where $[A,B]_{jk}\equiv A_{j}B_{k}-B_{k}A_{j}$ and $[\cdot,\cdot]_{0}$ is shorthand for $-i\operatorname{tr}\rho_{0}[\cdot,\cdot]$ . Equation (151) comes from the fact that $S^{\beta}=\mathfrak{D}H$ for the model given by Eq. (143), where $\mathfrak{D}$ is the so-called commutation superoperator defined by Holevo (2011, 1977)

[TABLE]

The final result is

[TABLE]

In particular, if the constraint is linear and a scalar given by

[TABLE]

then

[TABLE]

which gives Eq. (7). $\lVert Z\rVert_{0}^{2}$ is the variance of $Z$ , while

[TABLE]

is a measure of how sensitive the Heisenberg-picture $Z$ is to the displacement. An intuitive explanation of this result is as follows. A displacement can be estimated only with respect to a known reference. If only the mean of $Z$ is known about the initial state, then it is the only reference in the quantum object that is available to the observer. It is therefore not surprising—in hindsight—that the statistics of $Z$ determine the fundamental limit.
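The roles of the two quantities can be illustrated numerically. The sketch below assumes, following the description above, that the bound for a scalar linear constraint is the variance $\lVert Z\rVert_{0}^{2}$ divided by the square of $[Z,H]_{0}=-i\operatorname{tr}\rho_{0}[Z,H]$; the exact form should be checked against Eq. (7). The qubit state, generator, and observable are hypothetical choices made only for illustration. The code also checks by finite differences that $[Z,H]_{0}$ equals the derivative of the mean of $Z$ with respect to $\beta$ at $\beta=0$.

```python
import numpy as np

# Pauli matrices.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def displaced_state(rho0, H, beta):
    """Return exp(-i H beta) rho0 exp(i H beta) for Hermitian H."""
    w, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * beta * w)) @ V.conj().T
    return U @ rho0 @ U.conj().T

def mean(rho, A):
    """Expectation value tr(rho A) of a Hermitian operator A."""
    return np.trace(rho @ A).real

# Hypothetical example: mixed qubit state, generator H, constrained observable Z.
rho0 = 0.5 * (I2 + 0.3 * SX + 0.4 * SY + 0.2 * SZ)
H = 0.5 * SZ
Z = SX

variance_Z = mean(rho0, Z @ Z) - mean(rho0, Z) ** 2
# [Z, H]_0 = -i tr(rho0 [Z, H]): sensitivity of <Z> to the displacement.
sensitivity = (-1j * np.trace(rho0 @ (Z @ H - H @ Z))).real

# Finite-difference estimate of d<Z>/d(beta) at beta = 0.
eps = 1e-6
finite_diff = (mean(displaced_state(rho0, H, eps), Z)
               - mean(displaced_state(rho0, H, -eps), Z)) / (2 * eps)

bound = variance_Z / sensitivity ** 2   # assumed form of the bound
```

The ratio has the same structure as classical error propagation for a method-of-moments estimator: the noise in $Z$ divided by the squared slope of its mean.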

If $H$ is the momentum operator and $Z$ is the position operator satisfying $[Z,H]=i$ , the Heisenberg picture of $Z$ is

[TABLE]

which is a quantum additive-noise model with no known statistics about the noise operator $Z$ other than its mean. Measurements of $Z$ and the sample mean of the outcomes are efficient. This problem then becomes equivalent to the $\beta=\operatorname{tr}\rho Y$ example, but note that Eqs. (155) and (156) are more general, as they can deal with any generator, a $\beta$ that cannot be easily expressed as a functional of $\rho$ , and more general constraints.
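The additive form can be checked with the Hadamard lemma. The following derivation is a sketch that assumes the Heisenberg picture is given by the adjoint map $\mathcal{U}_{\beta}^{*}$ defined above and uses only $[Z,H]=i$:

```latex
\begin{align}
\mathcal{U}_{\beta}^{*}Z
&= e^{iH\beta} Z e^{-iH\beta}
 = Z + i\beta\,[H,Z] + \frac{(i\beta)^{2}}{2!}\,[H,[H,Z]] + \dots \\
&= Z + i\beta\,(-i) = Z + \beta ,
\end{align}
```

since $[H,Z]=-[Z,H]=-i$ is a c-number and all nested commutators vanish. Taking the expectation with respect to $\rho_{0}$ gives $\operatorname{tr}\rho Z=\zeta+\beta$ when the constraint fixes the mean of $Z$ to $\zeta$, so the sample mean of the outcomes of $Z$ measurements, shifted by $\zeta$, is unbiased for $\beta$.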

Another example is optical phase estimation with

[TABLE]

and constraint $\operatorname{tr}\rho_{0}Z=\zeta$ on the mean of the quadrature operators $Z=(Z_{1},Z_{2})^{\top}$ with $[Z_{1},Z_{2}]=i$ . There is no phase observable Mandel and Wolf (1995), so expressing $\beta$ as a functional of $\rho$ is difficult if not impossible. Equations (155) and (156), on the other hand, are simple expressions in terms of the generator and the antiscores. In Eqs. (154)–(156), $R=Z-\zeta$ ,

[TABLE]

is simply the covariance matrix of the quadratures, while

[TABLE]

are the mean quadrature values. The efficient influence $\delta_{\rm eff}\propto S_{\rm eff}$ is a linear combination of the quadratures according to Eq. (154), indicating the ideal, though parameter-dependent, quadrature to be measured. An adaptive measurement can then aim to measure the ideal quadrature to approach the quantum limit.

When $\rho_{0}$ is exactly known, the Helstrom bound for displacement estimation has been computed exactly only if $\rho_{0}$ is pure or Gaussian. Only looser bounds have been found otherwise Helstrom (1976); Holevo (2011); Demkowicz-Dobrzański et al. (2015). The Mandelstam-Tamm inequality, for example, is looser than the Helstrom bound for mixed states Holevo (2011). $S^{\beta}$ is determined by $\mathfrak{D}H$ , and if $\rho_{0}$ is a high-dimensional non-Gaussian mixed state, $S^{\beta}$ is intractable. With the infinitely many nuisance parameters and infinitely many scores assumed here, the problem is hopeless under the conventional bottom-up approach. The top-down geometric approach, on the other hand, is able to avoid the computation of the scores altogether and give a simple result in terms of the more tractable antiscores.

VIII Vectoral parameter of interest

To complete the formalism, here we generalize the core results in this paper for a vectoral parameter of interest $\beta\in\mathbb{R}^{q}$ with $q\geq 1$ entries. $p$ , the dimension of the parameter space, should be at least as large as $q$ and may be infinite. Define the error matrix as

[TABLE]

where $\check{\beta}:\mathcal{X}\to\mathbb{R}^{q}$ is an estimator. An influence operator should then be a vector of $q$ operators. The inner product between two vectoral operators and the norm are now defined as

[TABLE]

The Hilbert spaces $\mathcal{Y}$ and $\mathcal{Z}$ for the vectoral operators are still expressed as Eqs. (20) and (21), while the tangent space is now defined as the replicating space Tsiatis (2006)

[TABLE]

The set of influence operators is still given by Eq. (28) if $\langle S,\delta\rangle=\partial\beta$ is interpreted as $\langle S^{\sigma},\delta_{k}\rangle=\partial\beta_{k}$ for all submodels and $k=1,\dots,q$ . For an unbiased measurement, the error operator given by Eq. (29) remains an element of $\mathcal{D}$ , and it can be shown (Holevo, 2011, Sec. 6.2) that

[TABLE]

where the matrix inequality $A\geq B$ means that $A-B$ is positive-semidefinite. The GHB can then be expressed as

[TABLE]

where $W\geq 0$ is a real cost matrix Hayashi (2017). Generalizing Theorems 1 and 3, we have

Theorem 7.

The GHB for a vectoral parameter of interest is given by

[TABLE]

where the efficient influence $\delta_{\rm eff}$ is the unique element in $\mathcal{D}$ given by

[TABLE]

Proof.

Delegated to Appendix J. ∎

It is straightforward to generalize the methods introduced in this paper to compute the GHB for the vectoral case.

Holevo proposed another bound, denoted in the following by the sans-serif $\mathsf{X}$ , that can account for the quantum effect of observable incompatibility in multiparameter estimation Holevo (2011); Nagaoka (1989). Before we prove the bound and related results, we need the following lemma.

Lemma 1 (Belavkin and Grishanin Belavkin and Grishanin (1973)).

For any complex positive-semidefinite matrix $A$ ,

[TABLE]

where $\operatorname{Re}A$ and $\operatorname{Im}A$ denote the entry-wise real and imaginary parts of $A$ , respectively, and $\lVert\cdot\rVert_{1}$ denotes the trace norm, defined as the sum of the singular values.

Proof.

Provided in Appendix K for completeness. ∎
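A quick numerical sanity check of the lemma is sketched below in Python. It assumes that the inequality reads $\operatorname{tr}\operatorname{Re}A\geq\lVert\operatorname{Im}A\rVert_{1}$, which is how the proof in Appendix K is structured; the random matrices and the helper name `lemma_gap` are illustrative only.

```python
import numpy as np

def lemma_gap(A):
    """Return tr(Re A) - ||Im A||_1 for a complex square matrix A."""
    trace_re = np.trace(A.real)
    trace_norm_im = np.linalg.norm(A.imag, ord="nuc")  # sum of singular values
    return trace_re - trace_norm_im

rng = np.random.default_rng(1)
gaps = []
for _ in range(100):
    B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    A = B @ B.conj().T   # complex positive-semidefinite by construction
    gaps.append(lemma_gap(A))

min_gap = min(gaps)

# A rank-one example for which the inequality holds with equality.
z = np.array([1.0, 1j]) / np.sqrt(2)
A_tight = np.outer(z, z.conj())
tight_gap = lemma_gap(A_tight)
```

The gap is nonnegative for every sampled matrix and vanishes for the rank-one example, which suggests that the inequality cannot be improved in general.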

We can now present the Holevo bound. It requires little modification to be applied to semiparametric estimation; only the definition of $\mathcal{D}$ needs to be generalized to Eq. (28) here. Otherwise the proof is standard Holevo (2011); Nagaoka (1989); Demkowicz-Dobrzanski et al. (2020); we provide it here simply to demonstrate that it remains valid in the semiparametric setting.

Theorem 8.

[TABLE]

where $\Gamma(\delta)$ is a complex matrix given by

[TABLE]

Proof.

Holevo proved (Holevo, 2011, Eq. (6.6.55)) that the error matrix and the error operator of any unbiased measurement obey

[TABLE]

Thus $A=\sqrt{W}(\Sigma-\Gamma)\sqrt{W}\geq 0$ . Applying Lemma 1 and noting that $\Sigma$ is real, we obtain

[TABLE]

Hence

[TABLE]

∎

The asymptotic attainability of the Holevo bound for $d<\infty$ has been shown in Refs. Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020). The rough idea there is to consider a two-step method: first find an estimate $\check{\theta}$ of $\theta$ using some of the object copies, and then perform a measurement based on the influence operators obtained from the minimization in Eq. (173), assuming $\check{\theta}$ to be the truth. In the limit of $N\to\infty$ , the overhead for finding $\check{\theta}$ is benign, and it can be shown that the error approaches $\mathsf{X}$ by local asymptotic normality.

For all the examples studied in previous sections, $\beta$ was a scalar, and it is straightforward to prove that the Holevo bound is equal to the GHB in that case.

Corollary 4.

If $\beta$ is a scalar ( $q=1$ ),

[TABLE]

Proof.

For $q=1$ , $\Gamma(\delta)=\operatorname{tr}\rho\delta^{2}$ and $\operatorname{Im}\Gamma(\delta)=0$ , leading to

[TABLE]

∎

The scalar GHB hence inherits all the properties of the Holevo bound, including its asymptotic attainability. In fact, for any $q$ , the Holevo bound turns out to be only a marginal improvement over the GHB.

Theorem 9.

[TABLE]

Proof.

For all $\delta\in\mathcal{D}$ ,

[TABLE]

As $\mathsf{X}$ is the infimum of Eq. (181), we obtain $\mathsf{X}\geq\tilde{\mathsf{H}}$ , the first inequality of the theorem. The second inequality is proved as follows:

[TABLE]

where Eq. (184) is obtained by applying Lemma 1 to $A=\sqrt{W}\Gamma(\delta_{\rm eff})\sqrt{W}$ . ∎
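The factor of two can be seen numerically for any fixed set of operators. The sketch below assumes that $\Gamma_{jk}(\delta)=\operatorname{tr}\rho\delta_{j}\delta_{k}$ and that the quantity minimized in the Holevo bound is $\operatorname{tr}W\operatorname{Re}\Gamma+\lVert\sqrt{W}\operatorname{Im}\Gamma\sqrt{W}\rVert_{1}$; both are assumptions about equations not reproduced here. The operators are random Hermitian matrices rather than genuine influence operators, since the inequality relies only on $\Gamma\geq 0$.

```python
import numpy as np

def psd_sqrt(W):
    """Symmetric square root of a real positive-semidefinite matrix."""
    w, V = np.linalg.eigh(W)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gamma_matrix(rho, deltas):
    """Gamma_jk = tr(rho delta_j delta_k) for a list of Hermitian operators."""
    q = len(deltas)
    G = np.empty((q, q), dtype=complex)
    for j in range(q):
        for k in range(q):
            G[j, k] = np.trace(rho @ deltas[j] @ deltas[k])
    return G

def random_hermitian(rng, d):
    B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (B + B.conj().T) / 2

rng = np.random.default_rng(2)
d, q = 5, 3
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T
rho /= np.trace(rho).real          # hypothetical full-rank density matrix
deltas = [random_hermitian(rng, d) for _ in range(q)]
C = rng.normal(size=(q, q))
W = C @ C.T                        # hypothetical real cost matrix, W >= 0

G = gamma_matrix(rho, deltas)
sqrtW = psd_sqrt(W)
helstrom_term = np.trace(W @ G.real)
incompat_term = np.linalg.norm(sqrtW @ G.imag @ sqrtW, ord="nuc")
holevo_objective = helstrom_term + incompat_term
```

For every such choice, `holevo_objective` lies between `helstrom_term` and twice `helstrom_term`, which is the content of Lemma 1 applied to $\sqrt{W}\Gamma\sqrt{W}$.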

The first inequality $\tilde{\mathsf{H}}\leq\mathsf{X}$ is well known Holevo (2011); Nagaoka (1989); Ragy et al. (2016). A special case $\mathsf{X}\leq 2\mathsf{H}$ of the second inequality—when $p<\infty$ , $K^{-1}$ exists, and $\tilde{\mathsf{H}}=\mathsf{H}$ is the original Helstrom bound—was proved recently in Ref. Carollo et al. (2019); *carollo20. $\mathsf{X}=2\mathsf{H}$ can be attained in special cases Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020).

Theorem 9 implies that the effect of incompatibility is surprisingly benign in the context of asymptotic statistics, the GHB can be approached to within a factor of two if the Holevo bound is attainable, and the GHB is a serviceable alternative to the Holevo bound, especially when the latter is more difficult to compute. See Ref. Demkowicz-Dobrzanski et al. (2020) for further interesting discussions regarding this result.

As an aside, we remark that the $\mathsf{D}$ in Eq. (183) is called the $\mathfrak{D}$ -invariant bound and coincides with $\mathsf{X}$ if $\mathcal{T}=\mathfrak{D}\mathcal{T}$ , where $\mathfrak{D}$ is given by Eq. (153) Holevo (2011); Suzuki (2016); *suzuki19a. In general, $\mathsf{D}$ offers a tighter upper bound on $\mathsf{X}$ than $2\tilde{\mathsf{H}}$ but may not be much more difficult to compute, as it also depends on $\delta_{\rm eff}$ , which can be found via the methods introduced in this work.

We present a few other interesting results concerning multiparameter estimation with $p<\infty$ in Appendix L.

Finally, we generalize the concept of efficient score in Theorem 6 for a vectoral $\beta$ .

Theorem 10.

Assume a density-operator family given by

[TABLE]

Let $S^{\beta}=(S^{\beta}_{1},\dots,S^{\beta}_{q})^{\top}$ be the scores with respect to $\beta$ and $\{S^{\eta}\}$ be the nuisance tangent set. Assume the unbiasedness condition for influence operators $\delta\in\mathcal{D}$ given by

[TABLE]

where $I$ is the identity matrix. The efficient influence and the GHB are given by

[TABLE]

where the efficient score $S_{\rm eff}$ is given by

[TABLE]

and $\langle S_{\rm eff},S_{\rm eff}\rangle^{-1}$ is assumed to exist.

Proof.

Almost identical to that of Theorem 6 in Appendix I and omitted here for brevity. ∎

IX Conclusion

We have founded a theory of quantum semiparametric estimation and showcased its power by producing simple quantum bounds for a large class of problems with high dimensions and few assumptions about the density operator. The theory establishes the notion of quantum semiparametric efficiency, which should inform and inspire the design of more efficient measurements in many areas of quantum physics.

While the experimental design of efficient semiparametric measurements is only touched upon here and awaits further research, the importance of the quantum limits set forth should not be underestimated. As more experiments are now being performed on complex quantum systems and advantages of such systems for metrology and information processing in general are being claimed, the precision limits serve as ultimate yardsticks as well as “no-go” theorems that guard against spurious proposals and fruitless endeavors, in the same way the laws of thermodynamics impose limits to engines and rule out perpetual-motion machines. Deriving precision limits for highly complex or poorly modeled quantum systems was a daunting task under the curse of dimensionality; the semiparametric theory offers a new way forward.

Many open problems still remain. More extensions and applications of the theory remain to be worked out. The asymptotic attainability of efficiency Hayashi (2017, 2005); Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020) is a thorny issue for infinite-dimensional problems. The assumption of unbiased estimation is a drawback; generalizations to the Bayesian or minimax paradigm Van Trees and Bell (2007); *schutzenberger57; *vantrees; *gill95; *personick71; *hayashi11; *liu16; *chabuda16; *rubio19; *rubio20 should help but await further research. These problems should benefit from studies of alternative quantum bounds beyond the Cramér-Rao type Tsuda and Matsumoto (2005); *glm2012; *qzzb; *qbzzb; *qwwb; *hall_prx; *nair18. In view of Eq. (59) and Figs. 3 and 4, the connections of quantum semiparametrics to other domains of quantum information Tomamichel and Hayashi (2013); *li14a; *tomamichel16 and quantum state geometry Hayashi (2017, 2005); Amari and Nagaoka (2000) are also interesting future directions.

In light of the richness and wide applications of the classical semiparametric theory Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006); Newey (1990); Feigelson and Babu (2012); Tsang (2019c), this work has only scratched the surface of the full potential of quantum semiparametrics. It should open doors to further useful results.

Acknowledgements.

We thank M. G. Genoni both for several fruitful discussions and for making us aware of Refs. Carollo et al. (2019); *carollo20. We are grateful to R. Nair, M. Guţă, R. Gill, D. Branford, R. Demkowicz-Dobrzański, J. F. Friel, W. Górecki, and J. Suzuki for useful discussions. This research is partly supported by the National Research Foundation (NRF) Singapore, under its Quantum Engineering Programme (Award QEP-P7). AD and FA have been supported by the UK EPSRC (EP/K04057X/2) and the UK National Quantum Technologies Programme (EP/M01326X/1, EP/M013243/1). FA also acknowledges financial support from the National Science Center (Poland) grant No. 2016/22/E/ST2/00559.

Appendix A Proof of Corollary 1

If $p<\infty$ and $K^{-1}$ exists, the solution to $\Pi(\delta|\mathcal{T})$ can be found, for example, in Ref. (Bickel et al., 1993, Eq. (15) in Appendix A.2). Here we give a simple proof for completeness. By definition of the projection Debnath and Mikusiński (2005),

[TABLE]

Any $h\in\mathcal{T}$ can be expressed as the linear combination $w^{\top}S$ with respect to a certain vector $w\in\mathbb{R}^{p}$ . Then

[TABLE]

The solution to the least-squares problem is

[TABLE]

Hence

[TABLE]

which is equal to Eq. (15), since $\langle S,\delta\rangle=\partial\beta$ for an influence operator.
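For reference, the least-squares step can be written out as follows. This is a sketch that assumes the Helstrom information matrix is the Gram matrix of the scores, $K_{jk}=\langle S_{j},S_{k}\rangle$, and that $\langle S,\delta\rangle$ denotes the column vector with entries $\langle S_{j},\delta\rangle$:

```latex
\begin{align}
\lVert\delta-w^{\top}S\rVert^{2}
&= \lVert\delta\rVert^{2}-2w^{\top}\langle S,\delta\rangle+w^{\top}Kw, \\
\nabla_{w}\lVert\delta-w^{\top}S\rVert^{2}=0
&\ \Rightarrow\ Kw=\langle S,\delta\rangle
 \ \Rightarrow\ w=K^{-1}\langle S,\delta\rangle, \\
\lVert\Pi(\delta|\mathcal{T})\rVert^{2}
&= w^{\top}Kw
 = \langle S,\delta\rangle^{\top}K^{-1}\langle S,\delta\rangle
 = (\partial\beta)^{\top}K^{-1}\partial\beta .
\end{align}
```

The last equality uses $\langle S,\delta\rangle=\partial\beta$, so the squared norm of the projection does not depend on which influence operator is projected.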

Appendix B Proof of Corollary 2

Denote any concept discussed so far with the superscript $(N)$ if it is associated with $\mathbf{F}^{(N)}$ , but omit the superscript $(1)$ for brevity if $N=1$ . From $\mathcal{Z}$ , we generate a subspace $U\mathcal{Z}\subset\mathcal{Z}^{(N)}$ such that

[TABLE]

$U$ is a surjective map to $U\mathcal{Z}$ by definition of the space. It can be shown that

[TABLE]

so $U\mathcal{Z}$ is isomorphic to $\mathcal{Z}$ , and $U$ is a unitary map from $\mathcal{Z}$ to $U\mathcal{Z}$ Reed and Simon (1980). It can also be shown that

[TABLE]

so $\mathcal{T}^{(N)}=\operatorname{\overline{span}}\{S^{(N)}\}\subseteq U\mathcal{Z}$ , and $\mathcal{T}^{(N)}$ is isomorphic to $\mathcal{T}$ . For any $Uh\in U\mathcal{Z}$ , it is not difficult to prove that

[TABLE]

given the isomorphisms. Now let

[TABLE]

where $\delta$ is an influence operator. $\delta^{(N)}$ is also an influence operator, since

[TABLE]

The efficient influence for $\mathbf{F}^{(N)}$ becomes

[TABLE]

the norm becomes

[TABLE]

and the corollary ensues.

Appendix C Proof of Corollary 3

Let $\{S^{(N)}\}$ be the tangent set for $\mathbf{G}^{(N)}$ . For each parametric submodel $\{\sigma(\theta)\}$ of $\mathbf{G}$ , let

[TABLE]

be a parametric submodel of $\mathbf{G}^{(N)}$ . The score of the submodel is given by

[TABLE]

In other words, each $S^{\sigma}\in\{S\}$ can be used to generate a score in $\{S^{(N)}\}$ via Eq. (204). The set of scores generated this way is therefore a subset of $\{S^{(N)}\}$ , viz.,

[TABLE]

Conversely, any parametric submodel of $\mathbf{G}^{(N)}$ must be in the form of Eq. (203), with $\{\sigma(\theta)\}$ being a certain parametric submodel of $\mathbf{G}$ . The score of the former is then related to the score of the latter via Eq. (204). Since $\{S\}$ includes the scores of all parametric submodels of $\mathbf{G}$ , any $S^{\tau}\in\{S^{(N)}\}$ must be in $\{\sqrt{N}US\}$ . Thus $\{S^{(N)}\}\subseteq\{\sqrt{N}US\}$ , and equality holds, viz.,

[TABLE]

It follows that

[TABLE]

is isomorphic to $\mathcal{T}=\operatorname{\overline{span}}\{S\}$ . Hence, projecting an influence operator of the form $\delta^{(N)}=U\delta/\sqrt{N}$ into $\mathcal{T}^{(N)}$ gives the efficient influence $\delta_{\rm eff}^{(N)}=U\delta_{\rm eff}/\sqrt{N}$ , by the same argument as Appendix B.

Appendix D The set of bounded operators is dense in $\mathcal{Z}$

To generalize Theorem 2 for the infinite-dimensional case and prove Theorem 4, we need to be mindful of the unbounded operators in $\mathcal{Z}$ . The good news is that they are well defined as limits of bounded-operator sequences in $\mathcal{Y}$ , thanks to Holevo Holevo (2011, 1977); just a minor modification is needed to make his result work for $\mathcal{Z}$ .

Consider the set $\mathcal{B}$ of bounded elements defined by Eq. (72). If $d<\infty$ , $\mathcal{B}=\overline{\mathcal{B}}=\mathcal{Z}$ , since all operators are bounded in the finite-dimensional case, but if $d=\infty$ , $\mathcal{B}\subset\mathcal{Z}$ is a strict subset. A useful lemma is as follows.

Lemma 2.

$\overline{\mathcal{B}}=\mathcal{Z}$ .

Proof.

Reference (Holevo, 2011, Theorem 2.8.1) implies that, for any $h\in\mathcal{Z}\subset\mathcal{Y}$ , there exists a Cauchy sequence $\{h_{n}\}$ with each $h_{n}\in\mathcal{Y}$ satisfying $\lVert h_{n}\rVert_{\rm op}<\infty$ such that

[TABLE]

To derive a similar convergent sequence in $\mathcal{Z}$ , consider the projection of each $h_{n}$ into $\mathcal{Z}$ , written as

[TABLE]

Denote a bounded operator in the equivalence class of $h_{n}$ as $\hat{h}_{n}$ . An operator for $h_{n}^{\prime}$ can be expressed as

[TABLE]

Since $\lVert\hat{h}_{n}\rVert_{\rm op}<\infty$ and $\lVert\langle h_{n},I\rangle\hat{I}\rVert_{\rm op}=|\langle h_{n},I\rangle|<\infty$ ,

[TABLE]

by the triangle inequality, leading to $h_{n}^{\prime}\in\mathcal{B}$ . The Pythagorean theorem leads to

[TABLE]

which can be combined with Eq. (208) to give

[TABLE]

In other words, $\{h_{n}^{\prime}\}$ , with each $h_{n}^{\prime}\in\mathcal{B}$ , is also Cauchy and converges to $h$ . As the argument applies to any $h\in\mathcal{Z}$ , $\mathcal{B}$ is dense in $\mathcal{Z}$ , and the closure of $\mathcal{B}$ gives $\mathcal{Z}$ . ∎

Appendix E Proof of Proposition 1

Let the orthocomplement of $\mathcal{V}$ in $\mathcal{T}$ be $\mathcal{V}_{\mathcal{T}}^{\perp}$ . Then the Pythagorean theorem yields

[TABLE]

where the last step uses Ref. (Bickel et al., 1993, Proposition 3B in Appendix A.2). $\lVert\Pi(\delta|\mathcal{V})\rVert^{2}=\lVert\delta\rVert^{2}-\lVert\Pi(\delta|\mathcal{V}^{\perp})\rVert^{2}$ follows again from the Pythagorean theorem for a $\delta\in\mathcal{Z}=\mathcal{V}\oplus\mathcal{V}^{\perp}$ . Equation (99) comes from Theorem 1.

Appendix F Proof of Proposition 2

Let $P$ be the true density. For real functions on $\mathbb{C}$ , define an inner product and a norm with respect to $P$ as

[TABLE]

Define the Hilbert space of zero-mean functions as

[TABLE]

For each $f\in\mathcal{Z}_{P}$ , construct the parametric submodel

[TABLE]

with the truth at $\sigma(0)=\rho$ and $P(\alpha|0)=P(\alpha)$ . $f(\alpha)$ is the score function with respect to $P(\alpha|\theta)$ . The score with respect to $\sigma$ is then given by

[TABLE]

where the map $\mathcal{E}:\mathcal{Z}_{P}\to\mathcal{Z}$ is a quantum version of the conditional expectation Hayashi (2017). Hence

[TABLE]

Consider the inner product between $\mathcal{E}f$ and an $h\in\mathcal{B}\subset\mathcal{Z}$ given by

[TABLE]

where Eq. (25) is used and $\mathcal{E}^{*}$ is the adjoint map given by the Husimi representation

[TABLE]

Since $h\in\mathcal{Z}$ ,

[TABLE]

and $\mathcal{E}^{*}h\in\mathcal{Z}_{P}$ . The map $\mathcal{E}^{*}:\mathcal{B}\to\mathcal{Z}_{P}$ is obviously linear. It is also bounded because

[TABLE]

Thus $\mathcal{E}^{*}$ is a continuous linear map (Debnath and Mikusiński, 2005, Theorem 1.5.7). As $\mathcal{B}$ is a dense subset of $\mathcal{Z}$ by virtue of Lemma 2, $\mathcal{E}^{*}$ can be uniquely extended to a continuous linear map on the whole $\mathcal{Z}$ (Debnath and Mikusiński, 2005, Theorem 1.5.10).

Any $h\in\mathcal{T}^{\perp}$ must obey

[TABLE]

The only solution is $\mathcal{E}^{*}h=0$ . In other words, $\mathcal{T}^{\perp}$ is in the null space of $\mathcal{E}^{*}$ . As the Husimi representation is injective Jordan (1964); *mehta65, the only solution to $\mathcal{E}^{*}h=0$ is $h=0$ . Hence $\mathcal{T}^{\perp}=\{0\}$ , and $\mathcal{T}=\mathcal{Z}$ .

Appendix G $\mathcal{T}^{\perp}$ for diffraction-limited incoherent imaging is infinite-dimensional

Following Appendix F, it can be shown that $h\in\mathcal{T}^{\perp}$ if

[TABLE]

for the incoherent-imaging problem in Sec. VI.3. Consider, for example, $h=\int dk\tilde{h}(k)\ket{k}\bra{k}$ , where $\ket{k}$ is a momentum eigenket. Then Eq. (230) is satisfied if

[TABLE]

Let $\{\tilde{a}_{j}(k):j\in\mathbb{N}_{0}\}$ be the set of Hermite polynomials that are orthogonal with respect to the weight function $\exp(-2k^{2})$ . Then any $\tilde{a}_{j}(k)$ with $j>0$ satisfies Eq. (231). Define the set

[TABLE]

Each $a_{j}$ obeys Eq. (231) and

[TABLE]

so $\{a\}$ is an orthogonal set with respect to the inner product given by Eq. (17). As $\operatorname{\overline{span}}\{a\}\subseteq\mathcal{T}^{\perp}$ ,

[TABLE]

which means that the dimension of $\mathcal{T}^{\perp}$ must be infinite.

Appendix H Derivation of Eq. (131)

For the density-operator family given by Eq. (130), the extended convexity of the Helstrom information Alipour and Rezakhani (2015); Ng et al. (2016) implies that

[TABLE]

where $\langle\Delta k^{2}\rangle=\bra{\psi_{0}}k^{2}\ket{\psi_{0}}-(\bra{\psi_{0}}k\ket{\psi_{0}})^{2}=1/4$ is the variance of $k$ . With the explicit partition of $\theta$ into $\theta_{g}$ and $\theta_{h}$ , $\tilde{K}$ can be expressed as

[TABLE]

where $\partial_{g}=(\partial_{g1},\partial_{g2},\dots)^{\top}$ and $\partial_{h}=(\partial_{h0},\partial_{h1},\dots)^{\top}$ . Let

[TABLE]

where $\{a_{j}(X):j\in\mathbb{N}_{0}\}$ is a set of orthogonal polynomials with respect to the true $F$ that satisfy $\int dXF(X)a_{j}(X)a_{k}(X)=\delta_{jk}$ . $a_{0}(X)=1$ is omitted from $g(X|\theta_{g})$ because $g$ is a score function with respect to $F$ and $\int dXF(X)g(X|\theta_{g})=0$ implies that $g(X|\theta_{g})$ cannot contain $a_{0}(X)$ in its expansion. The orthonormality of $\{a\}$ leads to

[TABLE]

Now consider

[TABLE]

Then

[TABLE]

where the completeness property

[TABLE]

is assumed. With

[TABLE]

and using Corollary 3 and Proposition 1, Eq. (131) is obtained.

Appendix I Proof of Theorem 6

The proof follows the classical case Tsiatis (2006). As $S^{\beta}\notin\Lambda$ , the $S_{\rm eff}$ given by Eq. (141) is not zero. Let

[TABLE]

Notice that Eq. (141) is a projection of $S^{\beta}$ into a space orthogonal to $\Lambda$ , so $S_{\rm eff}\perp\Lambda$ and $\delta\perp\Lambda$ . Then

[TABLE]

because $\Pi(S^{\beta}|\Lambda)\in\Lambda$ and each $S_{j}^{\eta}\in\Lambda$ . Thus $\delta$ satisfies Eqs. (139) and is an influence operator. Notice also that $S_{\rm eff}$ and $\delta$ are in $\mathcal{T}$ , because $S^{\beta}\in\mathcal{T}$ and $\Pi(S^{\beta}|\Lambda)\in\Lambda\subseteq\mathcal{T}$ . Hence, by Theorem 3,

[TABLE]

and Eq. (251) is the efficient influence.

Appendix J Proof of Theorem 7

We again follow Ref. Tsiatis (2006). Decompose any $\delta\in\mathcal{D}\subseteq\mathcal{Z}=\mathcal{T}\oplus\mathcal{T}^{\perp}$ into

[TABLE]

It is straightforward to prove that $\delta_{\rm eff}\in\mathcal{D}$ . As $h$ is orthogonal to any element in $\mathcal{T}\equiv(\operatorname{\overline{span}}\{S\})^{\oplus q}$ , it must be orthogonal to $g=(0,\dots,0,e,0,\dots,0)^{\top}$ with any $e\in\operatorname{\overline{span}}\{S\}$ in any entry of $g$ , say, the $j$ th entry. Then

[TABLE]

meaning that each entry of $h$ is orthogonal to $\operatorname{\overline{span}}\{S\}$ . This leads to a stronger matrix form of the orthogonality between $\delta_{\rm eff}\in\mathcal{T}$ and $h\in\mathcal{T}^{\perp}$ given by

[TABLE]

and a matrix form of the Pythagorean theorem given by

[TABLE]

resulting in Eq. (170). To prove the uniqueness of $\delta_{\rm eff}$ in $\mathcal{D}$ , suppose that there exists another $\delta^{\prime}\in\mathcal{D}$ that gives $\langle\delta^{\prime},\delta^{\prime}\rangle=\langle\delta_{\rm eff},\delta_{\rm eff}\rangle$ . Define $g=\delta^{\prime}-\delta_{\rm eff}$ . As $\langle S,g\rangle=\langle S,\delta^{\prime}\rangle-\langle S,\delta_{\rm eff}\rangle=\partial\beta-\partial\beta=0$ , $g\in\mathcal{T}^{\perp}$ , and the matrix Pythagorean theorem gives $\langle\delta^{\prime},\delta^{\prime}\rangle=\langle\delta_{\rm eff},\delta_{\rm eff}\rangle+\langle g,g\rangle$ . This implies that $\langle g,g\rangle=0$ , $\lVert g\rVert^{2}=\operatorname{tr}\langle g,g\rangle=0$ , and $g=0$ , contradicting the assumption that $\delta^{\prime}\neq\delta_{\rm eff}$ . Hence $\delta_{\rm eff}$ must be unique.

Appendix K Proof of Lemma 1

Let the superscript $*$ denote the entry-wise conjugation of a matrix and the superscript $\dagger=*\top$ denote the conjugate transpose. $A\geq 0$ means that $z^{\dagger}Az\geq 0$ for any $z\in\mathbb{C}^{q}$ . We also have $A^{*}\geq 0$ , since $z^{\dagger}A^{*}z=(z^{*\dagger}Az^{*})^{*}=z^{*\dagger}Az^{*}\geq 0$ for any $z\in\mathbb{C}^{q}$ . Thus, for any $z\in\mathbb{C}^{q}$ ,

$$z^{\dagger}(\operatorname{Re}A)z\geq\left|z^{\dagger}(i\operatorname{Im}A)z\right|.$$

Let $\{\lambda_{s},z_{s}:s=1,\dots,q\}$ be the eigenvalues and orthonormal eigenvectors of the Hermitian matrix $i\operatorname{Im}A$. As the singular values of $i\operatorname{Im}A$ are $\{|\lambda_{s}|\}$, we obtain

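$$\operatorname{tr}\operatorname{Re}A=\sum_{s}z_{s}^{\dagger}(\operatorname{Re}A)z_{s}\geq\sum_{s}\left|z_{s}^{\dagger}(i\operatorname{Im}A)z_{s}\right|=\sum_{s}|\lambda_{s}|=\lVert\operatorname{Im}A\rVert_{1}.$$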
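As a numerical sanity check of this argument, the following sketch draws a random positive-semidefinite complex matrix and compares the trace of its real part with the sum of the singular values of its imaginary part. It assumes that the conclusion of Lemma 1 is the inequality $\operatorname{tr}\operatorname{Re}A\geq\lVert\operatorname{Im}A\rVert_{1}$ with $\lVert\cdot\rVert_{1}$ the trace norm. The random test matrix and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 4

# A = B B^dagger is positive-semidefinite by construction.
B = rng.normal(size=(q, q)) + 1j * rng.normal(size=(q, q))
A = B @ B.conj().T

re_A = A.real  # entry-wise real part (real symmetric)
im_A = A.imag  # entry-wise imaginary part (real antisymmetric)

# i Im A is Hermitian, so its eigenvalues are real.
lam = np.linalg.eigvalsh(1j * im_A)
# Singular values of Im A; they should coincide with |lambda_s|.
sing = np.linalg.svd(im_A, compute_uv=False)

trace_re = np.trace(re_A)
trace_norm_im = sing.sum()  # trace norm as the sum of singular values

print(np.allclose(np.sort(np.abs(lam)), np.sort(sing)), trace_re >= trace_norm_im)
```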

Appendix L Some results concerning quantum multiparameter estimation

This appendix presents some results concerning quantum multiparameter estimation, following Sec. VIII and assuming $1\leq q\leq p<\infty$.

A crucial assumption in this paper is that $\mathcal{D}$, the set of influence operators, is not empty. While this is not a problem for all the examples studied in this paper, the following theorem, generalizing a classical result by Stoica and Marzetta (2001), can be used to verify the assumption.

Theorem 11.

$\mathcal{D}$ is not empty if and only if all the columns of $\partial\beta$ are in the range of the Helstrom information matrix $K$, viz.,

$$KK^{+}\partial\beta=\partial\beta,\tag{261}$$

where the superscript $+$ denotes the Moore-Penrose pseudoinverse Golub and Van Loan (2013).

Proof.

We prove the “only if” part first. Assume that a $\delta\in\mathcal{D}$ exists. It satisfies $\langle S,\delta\rangle=\partial\beta$, and therefore

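$$u^{\top}(\partial\beta)v=u^{\top}\langle S,\delta\rangle v=\langle u^{\top}S,v^{\top}\delta\rangle\tag{262}$$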

for any $u\in\mathbb{R}^{p}$ and $v\in\mathbb{R}^{q}$. The Cauchy-Schwarz inequality gives

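$$\left[u^{\top}(\partial\beta)v\right]^{2}\leq\langle u^{\top}S,u^{\top}S\rangle\langle v^{\top}\delta,v^{\top}\delta\rangle=\left(u^{\top}Ku\right)\left(v^{\top}\langle\delta,\delta\rangle v\right).\tag{263}$$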

Now suppose that $u$ is in the null space of $K$, such that $Ku=0$, and pick $v=(\partial\beta)^{\top}u$. We obtain

$$\left[u^{\top}(\partial\beta)(\partial\beta)^{\top}u\right]^{2}\leq 0,\tag{264}$$

which implies $(\partial\beta)^{\top}u=0$. As this must hold for any $u$ in the null space of $K$, each column of $\partial\beta$ must be orthogonal to the null space and therefore, since $K$ is symmetric, in the range of $K$. $KK^{+}$ is the projection matrix onto the range of $K$ Golub and Van Loan (2013), so Eq. (261) holds.

The “if” part comes from the fact that, as long as Eq. (261) holds,

$$\delta=(\partial\beta)^{\top}K^{+}S\tag{265}$$

satisfies $\langle\delta,I\rangle=0$ and $\langle S,\delta\rangle=KK^{+}\partial\beta=\partial\beta$ and is therefore an influence operator. ∎

For an illustrative example, consider

[TABLE]

with the geometry depicted in Figure 12. $S_{1}=0$ and $K_{11}=\langle S_{1},S_{1}\rangle=0$ at the singular point $\theta=\varphi$, meaning that

[TABLE]

The tangent space there becomes a line in the $S_{2}$ direction, and it is impossible for a $\delta$ to satisfy

[TABLE]

if $a\neq 0$.

If Eq. (261) does not hold at certain values of $\theta$, Theorem 11 implies that an unbiased estimator of $\beta$ cannot exist there, and the GHB can be assumed to be infinite. Note, however, that a biased estimator may still be able to achieve a finite error.
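The range condition of Theorem 11 is simple to test numerically once $K$ and $\partial\beta$ are available as matrices. The sketch below is a minimal illustration under stated assumptions: it takes the condition in the form $KK^{+}\partial\beta=\partial\beta$ implied by the proof, and it uses a hypothetical $2\times 2$ information matrix with $K_{11}=0$ to mimic a singular point. The helper name `in_range_of_K` and all numerical values are made up for illustration.

```python
import numpy as np


def in_range_of_K(K, dbeta, tol=1e-10):
    """Return True if K K^+ dbeta equals dbeta, i.e. all columns of dbeta lie in range(K)."""
    K_pinv = np.linalg.pinv(K)  # Moore-Penrose pseudoinverse
    return bool(np.allclose(K @ K_pinv @ dbeta, dbeta, atol=tol))


# Hypothetical singular information matrix: no information about theta_1.
K = np.array([[0.0, 0.0],
              [0.0, 2.0]])

dbeta_ok = np.array([[0.0], [1.0]])   # beta depends on theta_2 only
dbeta_bad = np.array([[1.0], [1.0]])  # beta also depends on theta_1

ok = in_range_of_K(K, dbeta_ok)
bad = in_range_of_K(K, dbeta_bad)
print(ok, bad)
```

Here the first choice of $\partial\beta$ passes the test, while the second has a nonzero component along the null space of $K$ and fails it, which is the situation in which no influence operator exists.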

Provided that Eq. (261) holds, a pseudoinverse form of the Helstrom bound can be obtained.

Corollary 5.

If Eq. (261) holds,

[TABLE]

Proof.

Equation (265) is an influence operator and also a linear combination of $S$, so it is in the tangent space $\mathcal{T}$. By Theorem 1, it must be efficient. The other results follow from the identity $K^{+}KK^{+}=K^{+}$ Golub and Van Loan (2013) and the definition of $\tilde{\mathsf{H}}$. ∎

The original Helstrom bound is a simple consequence, generalizing the scalar version in Corollary 1.

Corollary 6.

If $K>0$,

[TABLE]

Proof.

If $K>0$, $K^{-1}$ exists, $K^{+}=K^{-1}$, Eq. (261) always holds, and the results follow from Corollary 5. ∎
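The matrix facts used in the last two proofs can be verified on a random example. The sketch below assumes NumPy and a randomly generated positive-definite $K$. It also assumes, purely for illustration, that the bound matrix has the familiar form $(\partial\beta)^{\top}K^{+}\partial\beta$, which is not quoted from the corollaries. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2

M = rng.normal(size=(p, p))
K = M @ M.T + np.eye(p)          # strictly positive-definite, so K > 0
dbeta = rng.normal(size=(p, q))  # stand-in for the p x q matrix of derivatives

K_pinv = np.linalg.pinv(K)
K_inv = np.linalg.inv(K)

same_inverse = np.allclose(K_pinv, K_inv)           # K^+ = K^{-1} when K > 0
range_ok = np.allclose(K @ K_pinv @ dbeta, dbeta)   # range condition holds automatically
penrose = np.allclose(K_pinv @ K @ K_pinv, K_pinv)  # K^+ K K^+ = K^+

# Assumed form of the bound matrix, evaluated with either inverse.
bound_pinv = dbeta.T @ K_pinv @ dbeta
bound_inv = dbeta.T @ K_inv @ dbeta

print(same_inverse, range_ok, penrose, np.allclose(bound_pinv, bound_inv))
```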

Finally, we mention that the semidefinite program presented in Ref. Albarelli et al. (2019) to evaluate the Holevo bound for $\beta=\theta$ and a nonsingular $K$ can be straightforwardly extended to the more general setup considered in this appendix.

Appendix M Post-publication notes

After the completion of this work and its acceptance for publication, Masahito Hayashi informed us that Ref. (Yang et al., 2019, (c)) and Refs. Suzuki et al. (2019); Suzuki (2019b); Suzuki et al. (2020) also study quantum estimation theory with nuisance parameters. In particular, Ref. (Suzuki et al., 2020, Sec. 4.3) independently arrives at results similar to our Theorem 8 and Corollary 6. Reference Suzuki et al. (2020) focuses on the parametric case ($p<\infty$), whereas our Theorems 7 and 8 are proven to work in both parametric and semiparametric settings.

We note that our Theorems 7, 8, 9, 11 and Corollaries 4–6 first appeared in an arXiv preprint of ours on February 5th, 2020 (Albarelli et al., 2020b, v2). We then decided to merge our two preprints Tsang et al. (2020); Albarelli et al. (2020b) into one manuscript (Tsang et al., 2020, v6), which was accepted by PRX on June 1st, 2020. On the other hand, the first appearance of Sec. 4.3 in Ref. Suzuki et al. (2020) seems to be in the Accepted Manuscript on the JPA website on April 21st, 2020; the section is absent in v1 and v2 of their arXiv preprint Suzuki et al. (2019).

On another note, Ref. (Suzuki et al., 2019, Remark 4.5 in v3) proves that, if $\dim\mathcal{T}^{\perp}<\infty$ and $W>0$, then there exists a minimizing solution in $\mathcal{D}$ for the Holevo bound given by Eq. (173).

Bibliography

[1] Helstrom (1976). Carl W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, New York, 1976).
[2] Demkowicz-Dobrzański et al. (2015). Rafał Demkowicz-Dobrzański, Marcin Jarzyna, and Jan Kołodyński, “Quantum Limits in Optical Interferometry,” in Progress in Optics, Vol. 60, edited by E. Wolf (Elsevier, Amsterdam, 2015) Chap. 4, pp. 345–435.
[3] Paris (2009). Matteo G. A. Paris, “Quantum estimation for quantum technology,” International Journal of Quantum Information 07, 125–137 (2009).
[4] Giovannetti et al. (2011). Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone, “Advances in quantum metrology,” Nature Photonics 5, 222–229 (2011).
[5] Szczykulska et al. (2016). Magdalena Szczykulska, Tillmann Baumgratz, and Animesh Datta, “Multi-parameter quantum metrology,” Advances in Physics: X 1, 621–639 (2016).
[6] Pirandola et al. (2018). S. Pirandola, B. R. Bardhan, T. Gehring, C. Weedbrook, and S. Lloyd, “Advances in photonic quantum sensing,” Nature Photonics 12, 724 (2018).
[7] Braun et al. (2018). Daniel Braun, Gerardo Adesso, Fabio Benatti, Roberto Floreanini, Ugo Marzolino, Morgan W. Mitchell, and Stefano Pirandola, “Quantum-enhanced measurements without entanglement,” Reviews of Modern Physics 90, 035006 (2018).
[8] Pezzé et al. (2018). Luca Pezzé, Augusto Smerzi, Markus K. Oberthaler, Roman Schmied, and Philipp Treutlein, “Quantum metrology with nonclassical states of atomic ensembles,” Reviews of Modern Physics 90, 035005 (2018).
