Quantum Semiparametric Estimation
Mankei Tsang, Francesco Albarelli, Animesh Datta

TL;DR
This paper develops a quantum semiparametric estimation theory that provides simple bounds for high-dimensional quantum systems with limited prior assumptions, applicable to practical quantum measurement scenarios.
Contribution
It introduces a new framework for quantum semiparametric estimation that overcomes high dimensionality and limited prior knowledge, linking bounds to Holevo's quantum Cramér-Rao bound.
Findings
Provides analytic bounds for high-dimensional quantum estimation problems.
Relates bounds to Holevo's quantum Cramér-Rao bound for asymptotic attainability.
Applicable to quantum state properties like fidelity, purity, and entropy.
Taxonomy
Topics: Quantum Information and Cryptography · Spectroscopy and Quantum Chemical Studies · Quantum Computing Algorithms and Architecture
Mankei Tsang
https://blog.nus.edu.sg/mankei/ Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583
Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117551
Francesco Albarelli
Faculty of Physics, University of Warsaw, 02-093 Warszawa, Poland
Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom
Animesh Datta
Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom
Abstract
In the study of quantum limits to parameter estimation, the high dimensionality of the density operator and that of the unknown parameters have long been two of the most difficult challenges. Here we propose a theory of quantum semiparametric estimation that can circumvent both challenges and produce simple analytic bounds for a class of problems in which the dimensions are arbitrarily high, few prior assumptions about the density operator are made, but only a finite number of the unknown parameters are of interest. We also relate our bounds to Holevo’s version of the quantum Cramér-Rao bound, so that they can inherit the asymptotic attainability of the latter in many cases of interest. The theory is especially relevant to the estimation of a parameter that can be expressed as a function of the density operator, such as the expectation value of an observable, the fidelity to a pure state, the purity, or the von Neumann entropy. Potential applications include quantum state characterization for many-body systems, optical imaging, and interferometry, where full tomography of the quantum state is often infeasible and only a few select properties of the system are of interest.
I Introduction
The random nature of quantum mechanics has practical implications for the noise in sensing, imaging, and quantum-information applications Helstrom (1976); Demkowicz-Dobrzański et al. (2015); Paris (2009); *glm2011; *szczykulska16; *pirandola18; *braun18; *pezze18; *albarelli20a; Tsang et al. (2016); Tsang (2019a); Kolobov (1999); *kolobov07; *kolobov_fabre; *jezek; *hradil; *taylor16; *genovese16; *berchera19; *moreau19. To derive their fundamental quantum limits, one standard approach is to compute quantum versions of the Cramér-Rao bound Helstrom (1976); Demkowicz-Dobrzański et al. (2015); Paris (2009); Tsang et al. (2016); Tsang (2019a); Holevo (2011); Hayashi (2017, 2005). In addition to serving as rigorous limits to parameter estimation, the quantum bounds have inspired new sensing and imaging paradigms that go beyond conventional methods Paris (2009); Tsang et al. (2016); Tsang (2019a).
The study of quantum limits has grown into an active research field called quantum metrology in recent years, building on the pioneering work of Helstrom Helstrom (1976) and Holevo Holevo (2011). A major current challenge is the computation of quantum bounds for high-dimensional density operators and high-dimensional parameters, as the brute-force method quickly becomes intractable for increasing dimensions; see Refs. Yuan and Fung (2017); *genoni19; *chabuda20; *fanizza19; Albarelli et al. (2019) for a sample of recent efforts to combat the so-called curse of dimensionality. Most of the existing methods, however, ultimately have to resort to numerics for high dimensions. While numerical methods are no doubt valuable, analytic solutions should be prized higher—as with any study in physics—for their simplicity and offer of insights. Unfortunately, except for a few cases where one can exploit the special structures of the density-operator family Helstrom (1976); Holevo (2011); Tsang et al. (2011); *guta11; Ng et al. (2016); Zhou and Jiang (2019); Tsang (2019b), analytic results for high-dimensional problems remain rare in quantum metrology.
Here we propose a theory of quantum semiparametric estimation that can turn the problem on its head and deal with density operators with arbitrarily high dimensions and little assumed structure. The theory is especially relevant to the estimation of a parameter that can be expressed as a function of the density operator, such as the expectation value of an observable, the fidelity to a given pure state, the purity, or the von Neumann entropy. The density operator is assumed to come from an enormous family, its dimension can be arbitrarily high and possibly infinite, and the unknown “nuisance” parameters have a similar dimension to that of the density operator. Despite the seemingly bleak situation, our theory can yield surprisingly simple analytic results, precisely because of the absence of structure. Our results are ideally suited to scientific applications, such as quantum state characterization Gühne and Tóth (2009); Paris and Rehacek (2004); *lvovsky09; *horodecki; *filip02; *brun04; *flammia11; *enk12; Horodecki (2003), optical imaging Helstrom (1976); Tsang (2019a); Kolobov (1999); Zhou and Jiang (2019); Tsang (2019b), and interferometry Helstrom (1976); Holevo (2011); Demkowicz-Dobrzański et al. (2015); Paris (2009), where the dimensions can be high, the density operator is difficult to specify fully, and it is prudent to assume little prior information.
The theory set forth generalizes the deep and exquisite theory of semiparametric estimation in classical statistics Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), which has seen wide applications in fields such as biostatistics Tsiatis (2006), econometrics Newey (1990), astrostatistics Feigelson and Babu (2012), and, most recently, optical superresolution Tsang (2019c). By necessity, the classical theory involves infinite-dimensional spaces for random variables and makes extensive use of geometric and Hilbert-space concepts. As will be seen later, the operator Hilbert space introduced by Holevo Holevo (2011, 1977) turns out to be the right arena for the quantum case, and the geometric picture of quantum states Hayashi (2017, 2005); Amari and Nagaoka (2000); Uhlmann (1993); *braunstein; *bengtsson; *sidhu20 can provide illuminating insights.
Our formalism is primarily based on Helstrom’s version of the quantum Cramér-Rao bound Helstrom (1976). While this allows us to adapt the classical methods more easily, it is unable to account for the increased errors due to the incompatibility of quantum observables when multiple parameters are involved Holevo (2011); Demkowicz-Dobrzanski et al. (2020). We address this issue by studying also Holevo’s version of the quantum Cramér-Rao bound Holevo (2011) in the semiparametric setting and proving that the two versions turn out to be close. This result enables our bounds to inherit the asymptotic attainability of Holevo’s bound Kahn and Guţă (2009); Gill and Guţă (2013); *yamagata13; *yang19; Demkowicz-Dobrzanski et al. (2020) in many cases of interest.
II Preview of typical results
Before going into the formalism, we present some typical results of the theory to offer motivation.
Suppose that an experimenter has received quantum objects, such as atoms, electrons, photons, or optical pulses, each with the same quantum state . The experimenter would like to estimate a parameter as a function of . Without any knowledge or assumption about , what is the best measurement to perform for the estimation of , and what is the fundamental limit to the precision for any measurement?
The quantum semiparametric theory can provide simple answers to the above questions. For the simplest example, let , where is a given observable, and assume that the estimator is required to be unbiased. For example, one may wish to estimate
1. the mean position of photons or electrons in optical or electron microscopy,
2. the mean photon number in an optical mode in optical sensing, imaging, and communication Helstrom (1976),
3. the mean energy, momentum, or field of quantum particles in particle-physics, condensed-matter, or quantum-chemistry experiments,
4. a density-matrix element, the fidelity to a target pure state , or an entanglement witness in quantum-information experiments Gühne and Tóth (2009); Paris and Rehacek (2004).
This problem appears in all areas of quantum mechanics Schwartz (2014); *chaikin; *haken; *bravyi19, as most quantum calculations offer predictions in terms of expectation values only, and experiments that aim to estimate the expectation values and verify the predictions with few assumptions about the density operator are in essence semiparametric estimation. The theory here shows that the optimal measurement is simply a von Neumann measurement of the observable of each copy of the objects, followed by an average of the outcomes. For any measurement, the mean-square error of the estimation, denoted by the sans-serif , has a quantum limit given by
[TABLE]
Absent any information about , the separate measurements and the sample mean seem to be the most obvious procedure, but it is not at all obvious that it is optimal, given the infinite possibilities allowed by quantum mechanics.
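As a minimal numerical sketch of this claim, assume that the bound in Eq. (1) is the variance of the observable in the state divided by the number of copies, written here as (tr ρY² − (tr ρY)²)/N. The names `rho`, `Y`, `N`, the random test state, and the helper functions are illustrative assumptions of ours. The sketch checks that a von Neumann measurement of the observable on each copy, followed by the sample mean, is unbiased and has exactly this mean-square error.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(d, rng):
    """Random full-rank density matrix (illustrative test state only)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_observable(d, rng):
    """Random Hermitian matrix standing in for the observable Y."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

d, N = 4, 50                      # Hilbert-space dimension and number of copies (assumed)
rho = random_density_matrix(d, rng)
Y = random_observable(d, rng)

# Parameter of interest and the assumed form of the bound.
beta = np.trace(rho @ Y).real
var_Y = np.trace(rho @ Y @ Y).real - beta**2
assumed_bound = var_Y / N

# von Neumann measurement of Y: outcomes are eigenvalues, with Born-rule probabilities.
evals, evecs = np.linalg.eigh(Y)
probs = np.einsum("ji,jk,ki->i", evecs.conj(), rho, evecs).real

# The sample mean of N independent outcomes has the single-shot mean
# and the single-shot variance divided by N.
outcome_mean = probs @ evals
outcome_var = probs @ evals**2 - outcome_mean**2
sample_mean_mse = outcome_var / N
```

Under these assumptions `sample_mean_mse` coincides with `assumed_bound`, so the simple procedure saturates the variance expression. The sketch says nothing about whether other measurements could do better, which is the content of the bound itself.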
While Eq. (1) has been derived before via a more conventional method for a finite-dimensional Watanabe et al. (2010); *watanabe11, our theory can also deal with infinite dimensions as well as more advanced examples in quantum information and quantum thermodynamics. For example, if the parameter of interest is the purity , the bound is
[TABLE]
and if the parameter is the relative entropy with respect to a target state , the bound is
[TABLE]
For these two examples, the bounds are asymptotically attainable in principle, at least when is finite-dimensional Kahn and Guţă (2009); Gill and Guţă (2013); *yamagata13; *yang19; Demkowicz-Dobrzanski et al. (2020).
The semiparametric theory is relevant to experiments on many-body quantum systems and quantum simulation Bloch et al. (2008); *georgescu14, because often there is no simple model for , full tomography of is infeasible, and only a few select properties of the system may be of interest. Although a significant literature in quantum information has been devoted to such semiparametric problems Gühne and Tóth (2009); Paris and Rehacek (2004); *lvovsky09; *horodecki; *filip02; *brun04; *flammia11; *enk12; Horodecki (2003), their connections to the classical theory have not yet been recognized. By generalizing the classical theory, this work establishes fundamental limits to the task, indicating the minimum amount of resources needed to achieve a desired precision and also offering a rigorous yardstick for experimental design. This work thus addresses a foundational question by Horodecki Horodecki (2003): “What kind of information (whatever it means) can be extracted from an unknown quantum state at a small measurement cost?” Our work shows that quantum metrology—and quantum semiparametric estimation in particular—offers a viable attack on the question via a statistical notion of efficiency.
An extension of the above scenario is the estimation of given a constraint on . For example, suppose that the quantum state is known to possess a mean energy , where is the Hamiltonian, or attain a fidelity of with respect to another pure state . How may this new information affect the estimation? Write the constraint as , where is an observable and is a given constant. The quantum bound for the example turns out to be
[TABLE]
where denotes the Jordan product. The bound is reduced by the correlation between and .
Another paradigmatic problem in quantum metrology is displacement estimation Helstrom (1976); Holevo (2011); Demkowicz-Dobrzański et al. (2015); Paris (2009), which can be modeled by
[TABLE]
where is the initial state, is a generator, such as the photon-number operator in optical interferometry, and is the displacement parameter to be estimated. Applications range from optical and atomic interferometry to atomic clocks, magnetometry, laser ranging, and localization microscopy Demkowicz-Dobrzański et al. (2015); Paris (2009); Kolobov (1999). If nothing is known about other than a constraint , the quantum bound turns out to be
[TABLE]
where . Our theory can in fact give similarly simple results for a class of such semiparametric problems.
It must be stressed that, apart from the underlying Hilbert space and the constraints discussed above, the experimenter is assumed to know nothing about the density operator, and the bounds here are valid regardless of its dimension. The existing method of deriving such quantum limits is to model with many parameters Hayashi (2017, 2005); Kahn and Guţă (2009); Watanabe et al. (2010), compute a quantum version of the Fisher information matrix, and then invert it. This brute-force method is rarely feasible for problems with high or infinite dimensions. A new philosophy is needed.
In the next sections, we present the theory of quantum semiparametric estimation in increasing sophistication. Sections III and IV generalize the quantum Cramér-Rao bound proposed by Helstrom Helstrom (1976) in a geometric picture. While the picture is not new Hayashi (2005); Amari and Nagaoka (2000), it has so far remained an intellectual curiosity only. Sections III and IV show that it can in fact give simple solutions, such as Eqs. (1)–(3), to a class of semiparametric problems with arbitrary dimensions. Section III establishes the general formalism and also proves results that are valid for finite dimensions, while Sec. IV deals with the infinite-dimensional case via an elegant concept called parametric submodels. In the classical theory, the concept was first adumbrated by Charles Stein Stein (1956) and developed by Levit and many others Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006). Section V further develops the formalism to account for constraints on the density-operator family, in order to produce results such as Eq. (4). An example of entropy estimation in quantum thermodynamics is also discussed there. Section VI discusses some practical problems in optics and summarizes existing results on incoherent optical imaging Tsang (2019a) in the language of quantum semiparametrics, in order to provide a more concrete context for the formalism. Section VII considers semiparametric estimation in the presence of explicit nuisance parameters and studies in particular the problem of displacement estimation with a poorly characterized initial state, in order to produce results such as Eq. (7). To complete the formalism, Section VIII considers a vectoral parameter of interest and Holevo’s version of the quantum Cramér-Rao bound Holevo (2011). There we prove that the Helstrom and Holevo bounds are equal if the parameter of interest is a scalar, and they remain within a factor of two of each other in the vectoral case. 
The latter fact generalizes a recent result in the parametric setting Carollo et al. (2019); *carollo20. Thus the Helstrom version can inherit the asymptotic attainability of the latter Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020) to within a factor of two.
III Geometric picture of quantum estimation theory
This section is organized as follows. Section III.1 introduces the Helstrom bound in the conventional formulation. Section III.2 introduces some important Hilbert-space concepts, including the tangent space and the influence operators. Section III.3 generalizes the Helstrom bound in terms of a projection of an influence operator into the tangent space. Section III.4 shows how an influence operator can be derived for a given parameter of interest, while Sec. III.5 proves that the tangent space is simple if the density operator is assumed to be finite-dimensional but otherwise arbitrary. The projection is then straightforward, and Sec. III.5 demonstrates the derivation of Eqs. (1)–(3) as examples.
III.1 Helstrom bound
Let
[TABLE]
be a family of density operators parametrized by , where the superscript denotes the matrix transpose and denotes the dimension of the parameter space . The operators are assumed to operate on a common Hilbert space , with an orthonormal basis
[TABLE]
that does not depend on . Let
[TABLE]
be the dimension of , which may be infinite. The family is assumed to be smooth enough so that any can be interchanged with the operator trace in any operation on . Define , and define a vector of operators as solutions to
[TABLE]
which is shorthand for the system of equations
[TABLE]
is the true parameter value, and all functions of in this section are assumed to be evaluated implicitly at the same . Each is called a symmetric logarithmic derivative in the quantum metrology literature, but here we call it a score, in accordance with the statistics terminology Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006). All vectors are assumed to be column vectors in this paper.
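A minimal sketch of how a score can be computed numerically, assuming the standard symmetric-logarithmic-derivative equation ∂ρ = (ρS + Sρ)/2. The function name `score`, the qubit test state, and the choice of derivative are illustrative assumptions. In the eigenbasis of ρ with eigenvalues λ_j, the equation decouples into S_jk = 2(∂ρ)_jk/(λ_j + λ_k).

```python
import numpy as np

def jordan(a, b):
    """Jordan product (ab + ba)/2."""
    return (a @ b + b @ a) / 2

def score(rho, drho, tol=1e-12):
    """Solve drho = jordan(rho, S) for S in the eigenbasis of rho.

    Entries with lam_j + lam_k <= tol (outside the support of rho) are set
    to zero, which picks one representative of the solution set.
    """
    lam, U = np.linalg.eigh(rho)
    d_eig = (U.conj().T @ drho @ U).astype(complex)
    denom = lam[:, None] + lam[None, :]
    s_eig = np.zeros_like(d_eig)
    mask = denom > tol
    s_eig[mask] = 2 * d_eig[mask] / denom[mask]
    return U @ s_eig @ U.conj().T

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Illustrative qubit state with Bloch vector (0.3, 0, 0.4);
# the parameter is taken to be the x component, so drho = sx / 2.
rho = (I2 + 0.3 * sx + 0.4 * sz) / 2
drho = sx / 2
S = score(rho, drho)
```

The resulting `S` is Hermitian and has zero mean with respect to `rho`, because the trace of `drho` vanishes.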
To model a measurement, define a positive operator-valued measure (POVM) on a measurable space , where is the sigma algebra on the set . Let the parameter of interest be a scalar ; generalization for a vectoral will be done in Sec. VIII. Assume an estimator that satisfies
[TABLE]
is called a locally unbiased measurement, as we only require Eqs. (13) to hold at the true . Only local unbiasedness conditions are needed in this paper, and for brevity we will no longer explicitly describe them as local. Define the mean-square estimation error as
[TABLE]
If , a quantum version of the Cramér-Rao bound due to Helstrom Helstrom (1976), denoted by the sans-serif , applies to any unbiased measurement and can be expressed as
[TABLE]
where the Helstrom information matrix is defined as
[TABLE]
The Helstrom bound sets a lower bound on the estimation error for any quantum measurement and any unbiased estimator Helstrom (1976); Holevo (2011); Hayashi (2017, 2005). The estimation of with an infinite-dimensional () is called semiparametric estimation in statistics Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), although the methodology applies to arbitrary dimensions. If is partitioned into , then is called nuisance parameters Tsiatis (2006); Suzuki et al. (2019).
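The conventional recipe can be sketched for a qubit, assuming that the Helstrom information matrix has entries K_jk = tr ρ(S_j∘S_k) and that the bound is the quadratic form (∂β)ᵀK⁻¹(∂β). The Bloch-vector parametrization, the choice β = tr ρσ_z, and all names below are illustrative assumptions. For this parametrization the inverse information matrix has the known closed form I − rrᵀ, which serves here only as a sanity check.

```python
import numpy as np

def jordan(a, b):
    """Jordan product (ab + ba)/2."""
    return (a @ b + b @ a) / 2

def score(rho, drho):
    """Solve drho = jordan(rho, S) for a full-rank rho."""
    lam, U = np.linalg.eigh(rho)
    d_eig = U.conj().T @ drho @ U
    s_eig = 2 * d_eig / (lam[:, None] + lam[None, :])
    return U @ s_eig @ U.conj().T

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

# Three-parameter qubit family: theta is the Bloch vector r (assumed example).
r = np.array([0.3, -0.2, 0.4])
rho = (np.eye(2) + sum(rj * sj for rj, sj in zip(r, paulis))) / 2

# Scores: the derivative of rho with respect to r_j is sigma_j / 2.
scores = [score(rho, s / 2) for s in paulis]

# Helstrom information matrix as the Gram matrix of the scores.
K = np.array([[np.trace(rho @ jordan(a, b)).real for b in scores] for a in scores])

# Parameter of interest beta = tr(rho sz) and its partial derivatives.
Y = sz
dbeta = np.array([np.trace((s / 2) @ Y).real for s in paulis])

helstrom_bound = dbeta @ np.linalg.solve(K, dbeta)
```

For this example `helstrom_bound` equals 1 − r_z², the variance of σ_z in the state.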
III.2 Hilbert spaces for operators
We now follow Holevo Holevo (2011, 1977) and introduce operator Hilbert spaces in order to generalize the Helstrom bound for semiparametric estimation. The formalism may seem daunting at first sight, but the payoff is substantial, as it simplifies proofs, treats the infinite-dimensional case rigorously, and also enables one to avoid the explicit computation of and for a large class of problems. In the following, we assume familiarity with the basic theory of Hilbert spaces and the mathematical treatment of quantum mechanics; see, for example, Refs. Holevo (2011); Debnath and Mikusiński (2005); Reed and Simon (1980).
All operators considered in this paper are self-adjoint. Consider in the diagonal form with . The support of is , where denotes the closed linear span. is called full rank if . Define the weighted inner product between two operators and as
[TABLE]
and a norm as
[TABLE]
not to be confused with the operator norm . An operator is called bounded if and square summable with respect to if , although all operators are bounded by definition if . For two vectors of operators and , it is convenient to use to denote a matrix with entries
[TABLE]
such as as a Gram matrix.
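A short sketch of these definitions in matrix form, assuming the weighted inner product is ⟨h, g⟩ = tr ρ(h∘g) with h∘g = (hg + gh)/2. The function names `inner` and `gram` and the test operators are illustrative assumptions.

```python
import numpy as np

def inner(rho, h, g):
    """Weighted inner product tr[rho (hg + gh)/2] of Hermitian matrices."""
    return np.trace(rho @ (h @ g + g @ h)).real / 2

def gram(rho, hs, gs):
    """Matrix of inner products between two lists of operators."""
    return np.array([[inner(rho, h, g) for g in gs] for h in hs])

rho = np.diag([0.5, 0.3, 0.2]).astype(complex)
h = np.array([[1, 1j, 0], [-1j, 0, 2], [0, 2, -1]], dtype=complex)
g = np.diag([1.0, -1.0, 0.0]).astype(complex)
I3 = np.eye(3, dtype=complex)

# Gram matrix of the pair (h, g) with itself: symmetric and positive semidefinite.
G = gram(rho, [h, g], [h, g])

# Squared norm of g and the mean of h expressed through the same inner product.
norm_g_sq = inner(rho, g, g)
mean_h = inner(rho, I3, h)
```

The inner product with the identity reproduces the expectation value, which is why the zero-mean condition can be phrased as orthogonality to the identity element.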
Define the real Hilbert space for square-summable operators with respect to the true as Holevo (2011, 1977)
[TABLE]
To be precise, each Hilbert-space element is an equivalence class of operators with zero distance between them, viz., . The distinction between an element and its operators is important only if is not full rank; we put a hat on an operator if the distinction is called for. Two important Hilbert-space elements are the identity element and the zero element [math]; sometimes we will substitute for brevity.
Define a subspace of zero-mean operators as
[TABLE]
and the orthocomplement of in as
[TABLE]
In particular, the projection of any into is simply , where denotes the projection map, and
[TABLE]
The most important Hilbert space in estimation theory is the tangent space spanned by the set of scores Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), generalized here as
[TABLE]
is also known as the tangent set. The condition requires the assumption for all ; the zero-mean requirement is satisfied because . A useful relation for any bounded operator is
[TABLE]
via Ref. (Holevo, 2011, Eq. (2.8.88)). Denote also the orthocomplement of in as
[TABLE]
which is useful if a projection of into is desired and is easier to compute, since
[TABLE]
Another important concept in the classical theory is the influence functions Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006), which we generalize by defining the set of influence operators as
[TABLE]
These operators play a major role in Holevo’s formulation of quantum Cramér-Rao bounds Holevo (2011); Ragy et al. (2016), although their connection to the classical concept did not seem to be appreciated before.
III.3 Generalized Helstrom bound
Let the error operator with respect to an unbiased measurement be
[TABLE]
It can be shown (Holevo, 2011, Sec. 6.2) that (as long as ), and also that bounds the estimation error as
[TABLE]
A generalized Helstrom bound (GHB) for any unbiased measurement, denoted by , can then be expressed as
[TABLE]
We call an unbiased measurement efficient if it has an error that achieves the GHB, following the common statistics terminology Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006).
Proofs that Eq. (31) is equal to Eq. (15) if and exists can be found in Refs. Nagaoka (1989); Amari and Nagaoka (2000); Ragy et al. (2016). The following theorem gives a more general expression that is the cornerstone of quantum semiparametric estimation.
Theorem 1.
[TABLE]
where , henceforth called the efficient influence, is the unique element in the influence-operator set given by
[TABLE]
and denotes the projection of any influence operator into the tangent space .
Proof.
The proof is similar to the classical one Bickel et al. (1993); Tsiatis (2006). First note that, since , any can always be decomposed into
[TABLE]
This implies , and therefore . Now the Pythagorean theorem gives
[TABLE]
which results in Eq. (32).
To prove the uniqueness of in , suppose that there exists another that gives . Define . Since , , and the Pythagorean theorem yields . This implies that and , contradicting the assumption that . Hence must be unique, and for any results in the same . ∎
Figure 1 illustrates all the Hilbert-space concepts involved in Theorem 1.
Before we apply the theorem to examples, we list a couple of important corollaries. The first corollary reproduces the original Helstrom bound given by Eq. (15) and is expected from earlier derivations; see, for example, Ref. (Hayashi, 2005, Eq. (20) in Chap. 18) and Ref. (Amari and Nagaoka, 2000, Eq. (7.93)). Here we simply clarify that it is a special case of Theorem 1.
Corollary 1.
If and exists, the GHB is equal to the original Helstrom bound given by Eq. (15).
Proof.
Delegated to Appendix A. ∎
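The relation between the projection picture and the original Helstrom bound can be checked numerically. The sketch below assumes a two-parameter qubit family, the parameter β = tr ρY, and the candidate influence operator Y − β; all of these, and the names used, are illustrative assumptions. It projects the candidate into the span of the scores by solving the normal equations and compares the squared norm of the projection with (∂β)ᵀK⁻¹(∂β).

```python
import numpy as np

def jordan(a, b):
    return (a @ b + b @ a) / 2

def inner(rho, h, g):
    """Weighted inner product tr[rho (h o g)]."""
    return np.trace(rho @ jordan(h, g)).real

def score(rho, drho):
    """Solve drho = jordan(rho, S) for a full-rank rho."""
    lam, U = np.linalg.eigh(rho)
    d_eig = U.conj().T @ drho @ U
    return U @ (2 * d_eig / (lam[:, None] + lam[None, :])) @ U.conj().T

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Submodel: Bloch vector (theta_1, theta_2, 0.4), evaluated at theta = (0.3, -0.2).
rho = (I2 + 0.3 * sx - 0.2 * sy + 0.4 * sz) / 2
scores = [score(rho, sx / 2), score(rho, sy / 2)]

Y = sx + sz
beta = np.trace(rho @ Y).real
delta = Y - beta * I2                                  # candidate influence operator
dbeta = np.array([np.trace((sx / 2) @ Y).real,
                  np.trace((sy / 2) @ Y).real])        # partial derivatives of beta

K = np.array([[inner(rho, a, b) for b in scores] for a in scores])
rhs = np.array([inner(rho, s, delta) for s in scores])  # should equal dbeta

# Orthogonal projection of delta into span{S_1, S_2}.
coeffs = np.linalg.solve(K, rhs)
delta_eff = sum(c * s for c, s in zip(coeffs, scores))

ghb = inner(rho, delta_eff, delta_eff)
helstrom = dbeta @ np.linalg.solve(K, dbeta)
```

In this example `ghb` agrees with `helstrom`, and the unprojected norm `inner(rho, delta, delta)` is larger because the family is not full-dimensional.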
Note that, unlike Eq. (15), which assumes that consists of linearly independent operators and is invertible, Theorem 1 works with no regard for any linear dependence in . This generalization is in fact indispensable to the semiparametric theory, especially when the concept of parametric submodels is introduced in Sec. IV.
The second corollary, which gives a scaling of the bound with the number of object copies and is easy to prove via , requires more effort to prove if is to be avoided.
Corollary 2.
For a family of density operators that model independent and identical quantum objects in the form of
[TABLE]
where the tensor power is defined as the tensor product
[TABLE]
the efficient influence and the GHB are given by
[TABLE]
where is a map defined as
[TABLE]
Proof.
Delegated to Appendix B. ∎
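The scaling with the number of copies can be illustrated with a small sketch. It assumes that the multi-copy operator is the single-copy zero-mean operator averaged over the N tensor factors, a normalization chosen here for illustration and possibly different from the convention of the map in the corollary. The names `embed` and `copy_average` and the test state are likewise assumptions.

```python
import numpy as np
from functools import reduce

def inner(rho, h, g):
    """Weighted inner product tr[rho (hg + gh)/2]."""
    return np.trace(rho @ (h @ g + g @ h)).real / 2

def embed(h, n, N):
    """Operator acting as h on copy n (0-based) and as identity elsewhere."""
    d = h.shape[0]
    factors = [np.eye(d, dtype=complex)] * N
    factors[n] = h
    return reduce(np.kron, factors)

def copy_average(h, N):
    """Average of the single-copy operator h over the N copies."""
    return sum(embed(h, n, N) for n in range(N)) / N

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

rho = (I2 + 0.3 * sx + 0.4 * sz) / 2
Y = sx + 2 * sz
delta = Y - np.trace(rho @ Y).real * I2    # zero-mean single-copy operator

N = 3
rho_N = reduce(np.kron, [rho] * N)         # N independent, identical copies
delta_N = copy_average(delta, N)

single_copy_norm_sq = inner(rho, delta, delta)
multi_copy_norm_sq = inner(rho_N, delta_N, delta_N)
```

The cross terms between different copies vanish because `delta` has zero mean, which leaves `multi_copy_norm_sq` equal to `single_copy_norm_sq / N`.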
III.4 Influence operator via a functional gradient
Theorem 1 is useful if an influence operator can be found and is tractable. One way of deriving an influence operator is to assume that the parameter of interest is a functional and consider a derivative of in the “direction” of an operator given by
[TABLE]
Assume that the directional derivative can be expressed as
[TABLE]
in terms of a , hereafter called a gradient of . Any ordinary partial derivative of becomes
[TABLE]
Projecting the gradient into then gives an influence operator, viz.,
[TABLE]
as it is straightforward to check that and . The top flowchart in Fig. 2 illustrates the steps to obtain from . , , and are all gradients that satisfy Eq. (41); the difference lies in the set of directions to which each is restricted. , for instance, is restricted to and orthogonal to , while is restricted to and orthogonal to . (Footnote 1: More precisely, is the unique Riesz-Fréchet representation Reed and Simon (1980) of as a continuous linear functional of , is that for , and is that for Reed and Simon (1980); Bickel et al. (1993). The existence of each relies on being continuous with respect to in each domain, so the existence of implies that of and .)
Now consider some examples. The first is for a given (i.e., -independent) observable , which leads to
[TABLE]
The second example is the purity , which leads to
[TABLE]
The final example is the relative entropy Hayashi (2017); Holevo (2012), where and is a given density operator with . The differentiability of is not a trivial question when Holevo (2012), but for it can be done to give
[TABLE]
where uses the fact that is second order in for any (Hayashi, 2017, Theorem 6.3). The von Neumann entropy is a simple variation of this example.
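The three gradients can be checked by finite differences. The sketch below assumes that the directional derivative perturbs the state along ρ∘h for a zero-mean direction h, that a gradient g satisfies D_h β = tr ρ(g∘h), and that the gradients are Y for the expectation value, 2ρ for the purity, and ln ρ − ln σ for the relative entropy (each up to a multiple of the identity, which is invisible to zero-mean directions). These forms, the random test operators, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def jordan(a, b):
    return (a @ b + b @ a) / 2

def inner(rho, h, g):
    return np.trace(rho @ jordan(h, g)).real

def herm_log(a):
    """Matrix logarithm of a positive-definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def random_hermitian(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def random_state(d):
    """Well-conditioned full-rank state: a random state mixed with I/d."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return 0.5 * rho / np.trace(rho).real + 0.5 * np.eye(d) / d

d = 3
rho, sigma = random_state(d), random_state(d)
Y = random_hermitian(d)
h = random_hermitian(d)
h = h - np.trace(rho @ h).real * np.eye(d)    # make the direction zero-mean

functionals = {
    "expectation": lambda r: np.trace(r @ Y).real,
    "purity": lambda r: np.trace(r @ r).real,
    "relative_entropy": lambda r: np.trace(r @ (herm_log(r) - herm_log(sigma))).real,
}
gradients = {
    "expectation": Y,
    "purity": 2 * rho,
    "relative_entropy": herm_log(rho) - herm_log(sigma),
}

eps = 1e-6
direction = jordan(rho, h)                     # perturbation of the state
results = {}
for name, f in functionals.items():
    numeric = (f(rho + eps * direction) - f(rho - eps * direction)) / (2 * eps)
    analytic = inner(rho, gradients[name], h)
    results[name] = (numeric, analytic)
```

Each entry of `results` pairs a central-difference estimate with the inner-product expression, and the two agree to within the finite-difference error for all three functionals.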
III.5 Projection into the tangent space
The next step is . If the family of density operators is large enough, can fill the entire and the projection becomes trivial. We call a family full-dimensional if its tangent space at each satisfies
[TABLE]
For a specific example, consider the orthonormal basis of given by Eq. (9) and the most general parametrization of for given by Kahn and Guţă (2009)
[TABLE]
where
[TABLE]
and a special entry is removed from the parameters and set as , such that and
[TABLE]
is then given by
[TABLE]
The next theorem is a key step in deriving simple analytic results.
Theorem 2.
The family is full-dimensional.
Proof.
Consider the solution to for an . All operators are bounded if . We can then use Eqs. (25) and (53) to obtain
[TABLE]
where is any operator in the equivalence class of . Thus all the diagonal entries of are equal to , and all the off-diagonal entries are zero. In other words, , where is the identity operator. But also means that , resulting in as the only solution. Hence contains only the zero element, and . ∎
implies that the experimenter knows nothing about the density operator, apart from the Hilbert space on which it operates. Despite the high dimension of the family, Theorems 1 and 2 turn the problem into a trivial exercise once an influence operator has been found, since a is already in and hence efficient. Corollary 2 can then be used to extend the result for copies. For , Eq. (44) leads to
[TABLE]
This implies that a von Neumann measurement of of each copy and taking the sample mean of the outcomes are already efficient; no other measurement can do better in terms of unbiased estimation. For , Eq. (45) leads to
[TABLE]
and for , Eq. (46) leads to
[TABLE]
Intriguingly, this expression coincides with the information variance that has found uses in other contexts of quantum information theory, such as quantum hypothesis testing Tomamichel and Hayashi (2013); *li14a; *tomamichel16.
Deriving Eqs. (57)–(59) via the conventional brute-force method would entail the following steps:
1. Assume the family of density operators given by Eq. (48), with parameters.
2. Compute the score operators via Eq. (11).
3. Compute the -by- Helstrom information matrix via Eq. (16).
4. Compute the inverse .
5. Compute via Eq. (48), , and the Helstrom bound via Eq. (15).
While this method has been used to produce Eq. (57) Watanabe et al. (2010), it is less clear whether it can easily give Eq. (58) or Eq. (59). Contrast the brute-force method with the proposal here:
1. Compute the influence operator via a functional derivative of according to Sec. III.4.
2. Find the tangent space of the density-operator family or the orthocomplement . For example, Theorem 2 shows that is full-dimensional for the family of arbitrary density operators, while Sec. V later shows that may remain tractable for smaller families.
3. Compute and .
Each step is tractable for all the examples here, regardless of the dimensions.
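The two routes can be compared numerically in a small dimension. The sketch below follows the brute-force steps with an affine parametrization of the state along a basis of traceless Hermitian matrices, a stand-in chosen for illustration and not the parametrization of Eq. (48). It then compares the result with closed forms that we assume for the single-copy versions of Eqs. (57) and (58), namely tr ρY² − (tr ρY)² and 4[tr ρ³ − (tr ρ²)²]. All names and the random test state are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def jordan(a, b):
    return (a @ b + b @ a) / 2

def score(rho, drho):
    """Solve drho = jordan(rho, S) for a full-rank rho."""
    lam, U = np.linalg.eigh(rho)
    d_eig = U.conj().T @ drho @ U
    return U @ (2 * d_eig / (lam[:, None] + lam[None, :])) @ U.conj().T

def traceless_hermitian_basis(d):
    """d*d - 1 linearly independent traceless Hermitian matrices."""
    basis = []
    for j in range(d):
        for k in range(j + 1, d):
            e = np.zeros((d, d), dtype=complex)
            e[j, k] = 1
            basis.append(e + e.T)                 # real symmetric part
            basis.append(-1j * e + 1j * e.T)      # imaginary antisymmetric part
    for j in range(d - 1):
        g = np.zeros((d, d), dtype=complex)
        g[j, j], g[j + 1, j + 1] = 1, -1
        basis.append(g)
    return basis

d = 3
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = 0.5 * rho / np.trace(rho).real + 0.5 * np.eye(d) / d   # full-rank test state
b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Y = (b + b.conj().T) / 2

# Brute force: scores, information matrix, and a linear solve in place of the inverse.
basis = traceless_hermitian_basis(d)              # derivative of rho along each parameter
scores = [score(rho, g) for g in basis]
K = np.array([[np.trace(rho @ jordan(s, t)).real for t in scores] for s in scores])

def helstrom_bound(dbeta):
    return dbeta @ np.linalg.solve(K, dbeta)

beta = np.trace(rho @ Y).real
purity = np.trace(rho @ rho).real
brute_expectation = helstrom_bound(np.array([np.trace(g @ Y).real for g in basis]))
brute_purity = helstrom_bound(np.array([2 * np.trace(rho @ g).real for g in basis]))

# Semiparametric route: squared norm of the influence operator, no matrix inversion.
semi_expectation = np.trace(rho @ Y @ Y).real - beta**2
semi_purity = 4 * (np.trace(rho @ rho @ rho).real - purity**2)
```

The brute-force route solves a linear system whose size grows as d² − 1, whereas the closed forms need only a few traces. In this example the two routes agree for both parameters.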
Equations (57)–(59) are the quantum bounds promised in Sec. II, although they are merely the simplest examples of what the semiparametric methodology can offer, as Secs. V–VII later show.
IV Parametric submodels
The proof of Theorem 2 works only in the finite-dimensional case (). For infinite-dimensional problems, the beautiful concept of parametric submodels Stein (1956); Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006) offers a more rigorous approach. Let
[TABLE]
be a “mother” density-operator family, where may be an infinite-dimensional space. The density operators are still assumed to operate on a common separable Hilbert space . Denote the true density operator in the family as . A parametric submodel is defined as any subset of that contains the true and has the parametric form of Eq. (8). To wit,
[TABLE]
where denotes the dimension of the parameter and denotes the parameter value at which is the truth; both may be specific to the submodel. In the language of geometry Hayashi (2017, 2005); Uhlmann (1993), each is an -dimensional surface in , and all the surfaces are required to intersect at . Figure 3 illustrates the concept.
Each submodel is assumed to be smooth enough for scores to be defined in the same way as before by
[TABLE]
which denotes a system of equations given by
[TABLE]
As everything is evaluated at the true , the scores of all submodels in fact live in the same Hilbert space with respect to . Let the set of all parametric submodels of with respect to the truth be
[TABLE]
where denotes the set of indices that label all the submodels. Define the tangent set as the set of the scores from all such parametric submodels of , viz.,
[TABLE]
and the tangent space as the span of the set, viz.,
[TABLE]
An influence operator is now defined as any operator that satisfies the unbiasedness condition for all submodels with respect to . The condition can be expressed as
[TABLE]
where is specific to each submodel. If in Eq. (28) is taken to mean Eq. (67), then the influence-operator set is still defined by Eq. (28). The error operator given by Eq. (29) for an unbiased measurement still satisfies Eq. (67) by the generic arguments in Ref. (Holevo, 2011, Sec. 6.2), which apply to any submodel, so the error operator remains in , and Eq. (31) still holds. Theorem 1 can now be extended for the mother family.
Theorem 3.
The GHB in Eq. (31) for the mother family is given by
[TABLE]
where the efficient influence is the unique element in the influence-operator set given by
[TABLE]
is any influence operator in , and is the tangent space spanned by the scores of all parametric submodels of .
Proof.
The proof is identical to that of Theorem 1 if one takes to be the tangent set containing the scores of all parametric submodels. ∎
Corollary 2 can also be generalized in an almost identical way, although the proof requires more careful thought.
Corollary 3.
For a family of density operators that model independent and identical quantum objects in the form of
[TABLE]
the efficient influence and the GHB are given by
[TABLE]
where and are those for the family according to Theorem 3 and is the map given by Eq. (39).
Proof.
Delegated to Appendix C. ∎
We now generalize Theorem 2 for infinite-dimensional systems. This is also a more precise generalization of a classic result in semiparametric theory (Bickel et al., 1993, Example 1 in Sec. 3.2).
Theorem 4.
, defined as the family of arbitrary density operators, is full-dimensional.
Proof.
We call a Hilbert-space element in bounded and denote it by if its equivalence class contains a bounded operator . Denote the set of all bounded elements in as
[TABLE]
Take any and its bounded operator . Construct a scalar-parameter exponential family as Hayashi (2017, 2005)
[TABLE]
where and the truth is at . As is bounded, is bounded and strictly positive. As is nonnegative and unit-trace, is nonnegative and trace-class (Holevo, 2011, Theorem 2.7.2). Moreover, satisfies the properties
[TABLE]
because is trace-class and is strictly positive. Hence is a valid density operator at any . Since contains arbitrary density operators, is a parametric submodel of . It is straightforward to show that
[TABLE]
so the score for this model can be taken as .
Define a submodel in the same way for every , such that all of the elements are in the tangent set , leading to . As is closed, the limit points of must also be in , and , where is the closure of . Lemma 2 in Appendix D states that is a dense subset of , so
[TABLE]
Together with the fact , this implies , and the theorem is proved.
∎
A comparison of the proofs of Theorems 2 and 4 shows how the parametric-submodel concept works. Instead of dealing with one large family such as Eq. (48), here one exploits the freedom offered by to specify many ad hoc and elementary submodels. Each submodel in the proof could not be simpler: the exponential family is simply a type of geodesic through in density-operator space Hayashi (2017). In fact, we do not have to use the exponential family; other families may be used as long as they serve the purpose of the proof. An enormous number of submodels are introduced, one for each element in the proof, leading to an extremely overcomplete tangent set. That presents no trouble for the geometric approach, however; only the resultant tangent space matters in the end. Figure 4 illustrates the idea.
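The key step of the proof, namely that a bounded operator generates a valid one-parameter submodel whose score is that operator with its mean removed, can be checked numerically in finite dimensions. The sketch below assumes the symmetric exponential form exp(θh/2) ρ exp(θh/2) / tr[ρ exp(θh)] for the submodel and takes the score to be defined through the Jordan product with ρ; both are assumptions about the explicit equations, and the function names are hypothetical.

```python
import numpy as np

def exp_herm(h):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T

def exp_family(rho, h, theta):
    """Assumed submodel: exp(theta h/2) rho exp(theta h/2), normalized."""
    e = exp_herm(0.5 * theta * h)
    unnorm = e @ rho @ e
    return unnorm / np.trace(unnorm).real

def jordan(rho, s):
    """Jordan product (rho s + s rho)/2."""
    return 0.5 * (rho @ s + s @ rho)

rng = np.random.default_rng(1)
d = 4
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real           # the "true" density operator
b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
h = 0.5 * (b + b.conj().T)               # an arbitrary bounded Hermitian operator

# Candidate score: h with its mean under rho removed.
score = h - np.trace(rho @ h).real * np.eye(d)

# Central finite difference of the submodel at the truth (theta = 0).
eps = 1e-5
drho = (exp_family(rho, h, eps) - exp_family(rho, h, -eps)) / (2 * eps)
residual = np.max(np.abs(drho - jordan(rho, score)))
print("max deviation from rho∘score:", residual)
```

Repeating the construction for different choices of `h` generates the overcomplete tangent set described above; only the span of the resulting scores matters.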
By virtue of Theorem 4, an influence operator found for a parameter of interest is the efficient one for . The examples in Secs. III.4 and III.5 work for in the same way they work for . If is given by , an influence operator that satisfies Eq. (67) can be found via a gradient of , as shown in Sec. III.4 and Fig. 2. In particular, the influence operators given by Eqs. (44)–(46) and the bounds given by Eqs. (57)–(59) for the various examples should still hold for , although the entropy example may require a more rigorous treatment when Holevo (2012).
V Constrained bounds
V.1 Antiscore operators
Consider a constrained family of density operators defined as
[TABLE]
where denotes a finite set of equality constraints . Such constraints appear often in quantum thermodynamics Jaynes (1957); Gogolin and Eisert (2016). If there exist gradient operators such that, for any ,
[TABLE]
then each operator given by
[TABLE]
satisfies
[TABLE]
and the constraint implies that for all submodels and . In short, we write
[TABLE]
Thus is orthogonal to the tangent set and must be a subset of . We call the antiscore operators, as the following theorem shows that they span in the same way the scores span .
Theorem 5.
If exists, for the family.
Proof.
The proof again follows the classical case (Bickel et al., 1993, Example 3 in Sec. 3.2). Let
[TABLE]
In view of Eq. (81),
[TABLE]
Now construct a parametric submodel in terms of each as
[TABLE]
where , is an operator to be specified later, and is defined with respect to the spectral representation of as
[TABLE]
is bounded and positive even if is unbounded, so is a valid density operator. Since , . For a away from with ,
[TABLE]
where Eq. (87) uses Eq. (80) and the last step uses the fact . To make satisfy the constraint , can be set as a function of to cancel the term, with
[TABLE]
Then and is a valid parametric submodel of . Equation (89) also implies that is negligible relative to for infinitesimal , so the score for is , which should be put in the tangent set . As this procedure can be done for any , . Together with Eq. (83), this leads to , giving .
∎
The family given by Eqs. (84) and (85) is more convenient to use here than the exponential family used in the proof of Theorem 4. The defined by Eq. (85) is a generalization of the classical version in Ref. (Bickel et al., 1993, Example 1 in Sec. 3.2) and is plotted in Fig. 5. It is designed to give a valid density operator via Eq. (84), even if the argument is an unbounded operator, yet produce the desired score when linearized at . An adjustable operator is included in the submodel to make satisfy the constraint away from . Figure 6 further illustrates the idea of the proof.
Given an influence operator , such as those derived in Sec. III.4, the efficient influence and the GHB can be computed in terms of instead of via
[TABLE]
The same projection formula that gives in Appendix A can be adapted to give
[TABLE]
Equations (92) and (93) remain tractable if the constraints are few. The gradients of can be derived in the same way as those of , as shown in Fig. 2, and can be computed analytically for linear constraints, the purity constraint, and the entropy constraint by following the same type of calculations shown in Eqs. (44)–(46). Equation (4) is a special example of the constrained GHB when and .
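The projection onto the span of the antiscores can be sketched numerically as follows. The example assumes a single linear constraint, for which the antiscore is taken to be the constrained observable minus its known mean, and it assumes the inner product tr[ρ(ab + ba)/2] between Hermitian operators; the function `constrained_bound` and the variable names are hypothetical. Under these assumptions the constrained bound should reduce to the variance of the observable of interest minus the squared covariance divided by the variance of the constrained observable.

```python
import numpy as np

def random_herm(d, rng):
    """Random Hermitian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return 0.5 * (a + a.conj().T)

def random_density(d, rng):
    """Random density matrix: Hermitian, positive, unit trace."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def inner(rho, a, b):
    """Assumed inner product tr[rho (ab + ba)/2] for Hermitian a, b."""
    return 0.5 * np.trace(rho @ (a @ b + b @ a)).real

def constrained_bound(rho, delta, antiscores):
    """Remove from delta its projection onto span(antiscores)."""
    gram = np.array([[inner(rho, r, s) for s in antiscores] for r in antiscores])
    overlap = np.array([inner(rho, delta, r) for r in antiscores])
    coef = np.linalg.solve(gram, overlap)
    projection = sum(c * r for c, r in zip(coef, antiscores))
    delta_eff = delta - projection
    return delta_eff, inner(rho, delta, delta) - overlap @ coef

rng = np.random.default_rng(2)
d = 5
rho = random_density(d, rng)
Y, Z = random_herm(d, rng), random_herm(d, rng)
eye = np.eye(d)
delta = Y - np.trace(rho @ Y).real * eye      # influence operator for the mean of Y
antiscore = Z - np.trace(rho @ Z).real * eye  # antiscore for a known mean of Z
delta_eff, bound = constrained_bound(rho, delta, [antiscore])
print("unconstrained bound:", inner(rho, delta, delta))
print("constrained bound:  ", bound)
```

Only a Gram matrix whose size equals the number of constraints has to be inverted, which is why the computation stays tractable when the constraints are few.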
V.2 Entropy estimation in quantum thermodynamics
In quantum thermodynamics, conserved quantities of a dynamical system, such as the energy and the particle number, are expressed as moment constraints on the density operator with respect to a vector of observables and a vector of constants , viz.,
[TABLE]
Given such constraints, the density operator is often assumed to be the one with the maximum entropy Jaynes (1957), known as the generalized Gibbs ensemble Gogolin and Eisert (2016). Such an assumption, however, requires verification and does not hold out of equilibrium. Experiments on Bose gases have been performed to study the quantum states at different times and the validity of the maximum-entropy principle at steady state Kinoshita et al. (2006); Langen et al. (2015a, b).
When the maximum-entropy principle is in question for those experiments, it is prudent to make no prior assumption about the density operator other than the constraints. Thus one should consider a family of density operators given by Eq. (77), where the vectoral constraint is . Suppose that the von Neumann entropy is the parameter of interest. The estimation of is then a problem of quantum semiparametric estimation.
As the experiments typically involve high-dimensional systems, quantum state tomography is impractical. More efficient estimation of should exist. The formalism here leads to a quantum limit given by
[TABLE]
This bound is equivalent to the Holevo bound, as shown in Sec. VIII, so it is asymptotically attainable in principle, at least for finite-dimensional systems Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020), although the experimental implementation of efficient measurements remains an open question.
As entropy is an excellent measure of randomness and a central quantity in information theory, entropy estimation has many applications beyond thermodynamics. In classical statistics, the semiparametric estimation of entropic quantities is a well studied problem with known near-efficient estimators and applications in universal coding, statistical tests, random-number generation, econometrics, spectroscopy, and even neuroscience Beirlant et al. (1997); *paninski03; *cover. In the quantum domain, one application is universal quantum information compression Jozsa et al. (1998): knowing just the von Neumann entropy and nothing else about allows the quantum information to be compressed in accordance with the entropy. Another application is the estimation of an entropic measure of entanglement, which allows one to demonstrate entanglement without full tomography Gühne and Tóth (2009). The quantum limit here quantifies the minimum amount of resources needed to achieve a desired precision. Its asymptotic attainability suggests that it is a lofty but fair yardstick for experimental design.
V.3 Philosophy
The proposed approach to quantum semiparametric bounds is the polar opposite of the usual approach in quantum metrology. In the usual bottom-up approach, one assumes a small family of density operators with a few parameters and computes that is determined by the overlap between and the scores . Here, one starts with a large family with almost full dimension, computes for an amenable , and then reduces it by that is determined by the overlap between and the antiscores , as illustrated by Fig. 7. The complexity of the problem thus depends on the dimension of the family, and the essential insight of this work is that the problem can become simple again when the dimension is close to being full. Of course, if the dimension of is high, the top-down approach may also suffer from the curse of dimensionality. The medium families with both and in high dimensions are the most difficult to deal with, as they may be impregnable from either end.
V.4 Looser bounds
It may often be the case that, despite one’s best efforts, the exact for a problem remains intractable. Then a standard strategy in statistics and quantum metrology is to sandwich between upper and lower bounds. is an obvious upper bound and can be obtained from the gradient method in Sec. III.4 if can be expressed as a functional . Another way is to use Eq. (30) if an unbiased measurement and its error are known. The evaluation of lower bounds, on the other hand, can be facilitated by the following proposition.
Proposition 1.
Let be a closed subspace of and be the orthocomplement of in . Then
[TABLE]
In particular, if
[TABLE]
is taken as the tangent space for a particular parametric submodel , then
[TABLE]
is the GHB for that submodel.
Proof.
Delegated to Appendix E. ∎
A tight lower bound on can be sought by devising a submodel that is as unfavorable to the estimation of as possible. Another approach is to devise an overconstrained model with and evaluate a lower bound on from the top by overshooting, as illustrated by Fig. 8.
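The monotonicity behind Proposition 1 can be illustrated with nested parametric submodels in a small system. The sketch below assumes that the Helstrom bound of a submodel equals the squared norm of the projection of an influence operator onto the span of the submodel scores, computed from the Gram (Helstrom information) matrix of the scores; the scores are generated from random Hermitian operators with their means removed, and all names are hypothetical.

```python
import numpy as np

def random_herm(d, rng):
    """Random Hermitian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return 0.5 * (a + a.conj().T)

def inner(rho, a, b):
    """Assumed inner product tr[rho (ab + ba)/2] for Hermitian a, b."""
    return 0.5 * np.trace(rho @ (a @ b + b @ a)).real

def submodel_bound(rho, delta, scores):
    """Squared norm of the projection of delta onto span(scores)."""
    info = np.array([[inner(rho, s, t) for t in scores] for s in scores])
    grad = np.array([inner(rho, delta, s) for s in scores])
    return grad @ np.linalg.solve(info, grad)

rng = np.random.default_rng(3)
d = 3
eye = np.eye(d)
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = 0.5 * rho / np.trace(rho).real + 0.5 * eye / d  # full-rank truth

Y = random_herm(d, rng)
delta = Y - np.trace(rho @ Y).real * eye

# d*d - 1 = 8 zero-mean scores generically span the whole tangent space.
scores = []
for _ in range(d * d - 1):
    h = random_herm(d, rng)
    scores.append(h - np.trace(rho @ h).real * eye)

bounds = [submodel_bound(rho, delta, scores[:k]) for k in range(1, len(scores) + 1)]
full = inner(rho, delta, delta)
for k, value in enumerate(bounds, start=1):
    print(f"{k} parameters: bound = {value:.6f}  (full-dimensional: {full:.6f})")
```

Each added parameter can only raise the submodel bound, and the sequence saturates at the full-dimensional value once the scores span the whole space.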
VI Examples in optics
VI.1 Quadrature estimation
Here we further illustrate the theory with examples in optics, where quantum measurement theory has found the most experimental success Wiseman and Milburn (2010). For the first and simplest example, let be a density operator of an optical mode and assume the family of arbitrary density operators. Consider the estimation of the mean of a quadrature operator , with . This problem appears often in optical state characterization, communication, and sensing, where is a displacement parameter Paris and Rehacek (2004). The GHB is given by Eq. (57), and homodyne detection of is efficient. Note that this example is different from all previous studies of quadrature estimation Helstrom (1976); Holevo (2011), which assume Gaussian states or similarly low-dimensional parametric models. The semiparametric scenario here allows to be arbitrary and possibly non-Gaussian.
Now suppose that side information concerning another quadrature is available. It follows from Sec. V that the efficient influence is now
[TABLE]
where and are given by Eqs. (5). The GHB is then given by Eq. (4), which is lowered by any correlation between and . From the efficient influence, one may use Eq. (29) to find an efficient measurement, which obeys
[TABLE]
This can be satisfied if the POVM measures the quadrature instead of the obvious . Notice, however, that depends on the unknown . Whether adaptive measurements Wiseman and Milburn (2010) can implement this POVM approximately and whether asymptotic attainability is possible for this infinite-dimensional problem are interesting open questions. One approach may be to form rough estimates of the covariances and via heterodyne detection of a portion of the light first, and then measure the desired quadrature via homodyne detection based on the approximate .
VI.2 Family of classical states
For a more nontrivial example, consider a density-operator family in the form
[TABLE]
where , , is a coherent state, and is the Glauber-Sudarshan function Mandel and Wolf (1995). As is assumed to be positive, is a family of classical states Mandel and Wolf (1995) and a strict subset of . The assumption of instead of is more appropriate for practical applications with significant decoherence, as nonclassical states are unlikely to survive in such an environment.
Consider a moment parameter of the form
[TABLE]
where is a real polynomial of and . For example, one may be interested in the mean of a quadrature, in which case , or the mean energy, in which case . The optical equivalence theorem Mandel and Wolf (1995) gives
[TABLE]
where denotes the normal ordering Mandel and Wolf (1995). It follows from Sec. III.4 that an influence operator is .
The next step is to find the tangent space of . Although is a smaller family than , its dimension turns out to be just as high.
Proposition 2.
is full-dimensional.
Proof.
Delegated to Appendix F. ∎
With the full-dimensional tangent space, the GHB is also given by Eq. (57). This result shows that the obvious von Neumann measurement of remains efficient in estimating , and no alternative measurements can do better, despite restricting the family to classical states. For example, if is a quadrature, then the homodyne measurement is efficient, and if , then , and the photon-number measurement is efficient.
, the space of positive densities, is infinite-dimensional. The estimation of would be a nonparametric problem Artiles et al. (2005), in contrast with the semiparametric problems studied here. In classical statistics, it is known that a nonparametric estimation of the probability density cannot achieve a parametric convergence rate () Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsybakov (2009), and this difficulty is expected to translate to the quantum domain. Semiparametric estimation, on the other hand, can achieve the parametric rate and is the more feasible task if one is interested in only a few parameters of the system.
A further restriction on the family of can give very different results, as shown in the next section in the context of incoherent imaging.
VI.3 Incoherent imaging
VI.3.1 The mother model
Here we summarize existing results concerning the problem of incoherent imaging Tsang (2019a) using the language of semiparametrics. Unlike previous sections, this section presents essentially no new results. Rather, the goal is to use this very important but equally difficult problem to illustrate the concepts and current limitations of the quantum semiparametric theory.
The basic setup of an imaging system is depicted in Fig. 9. The object is assumed to emit spatially incoherent light at an optical frequency. For simplicity, the imaging system is assumed to be one-dimensional, paraxial, and diffraction-limited. A model of each photon on the image plane is Tsang (2017, 2019b, 2019a)
[TABLE]
where is the unknown source density, is a set of probability densities on , is the object-plane coordinate, is the point-spread function of the imaging system, is the image-plane coordinate normalized with respect to the magnification factor Goodman (2004), is the Dirac position ket that satisfies , and is the canonical momentum operator. and are further assumed to be normalized with respect to the width of so that they are dimensionless. is assumed here to be
[TABLE]
such that is a coherent state. Various generalizations can be found in Refs. Tsang (2017, 2019b, 2019a, 2019c) and references therein. Besides imaging, the model can also be used to describe a quantum particle under random displacements Hall et al. (2009); *vidrighin; *branford19; Ng et al. (2016).
The problem is semiparametric if is infinite-dimensional, such as
[TABLE]
and the parameter of interest is a functional of , such as the object moment
[TABLE]
where denotes the order of the moment of interest. Notice that the family indicated by Eq. (109) is much smaller than the one given by Eq. (103) in the previous example, as the Glauber-Sudarshan function is now separable in terms of and confined to the real axis of , viz.,
[TABLE]
In fact, the dimension of is now infinite, as shown in Appendix G, so this problem is the most difficult type described in Sec. V.3.
The errors and their bounds are all functionals of the true density , and we will focus on their values for subdiffraction distributions, which are defined as those with a width around much smaller than the point-spread-function width, or in other words Tsang (2019a).
VI.3.2 Semiparametric measurements and estimators
Two globally unbiased measurements for semiparametric moment estimation are known Tsang (2019c). (Footnote 2: Refs. Tsang et al. (2016); Tsang (2017, 2019c) use the symbol for the number of detected photons, which is stochastic, and for the expected number of detected photons. For optics models with Poisson statistics, the conditioning on the detected photon number does not introduce any significant difference to the theory. Footnote 3: In practice, a histogram of the photon counts at the detectors provides sufficient statistics for the estimators, and the photons do not need to be resolved individually Tsang (2019c).) For detected photons, both are separable measurements and sample means in the form of
[TABLE]
The first measurement is direct imaging, which measures the intensity on the image plane and is equivalent to the projection of each photon in the position basis as
[TABLE]
An unbiased semiparametric estimator is given by the sample mean of
[TABLE]
and the error is
[TABLE]
where denotes a prefactor that does not scale with in the first order. The second measurement is the so-called spatial-mode demultiplexing or SPADE Tsang et al. (2016); Tsang (2017, 2019b, 2019a, 2019c), which demultiplexes the image-plane light in the Hermite-Gaussian basis given by
[TABLE]
where is a Hermite polynomial Olver et al. (2010). For the estimation of an even moment with , the POVM for each photon is
[TABLE]
an unbiased semiparametric estimator is given by the sample mean of
[TABLE]
and the error is
[TABLE]
which is much lower than that of direct imaging in the subdiffraction regime for the second and higher moments. For the estimation of odd moments with SPADE, only approximate results have been obtained so far Tsang (2017, 2018); Zhou and Jiang (2019); Bonsma-Fisher et al. (2019) and are not elaborated here.
Both estimators are efficient for their respective measurements in the classical sense Tsang (2019c). In the quantum case, the question is whether SPADE is efficient or whether even better measurements exist. Computing the GHB, or at least bounding it, would answer the question and establish the fundamental quantum efficiency for incoherent imaging.
VI.3.3 Lower bounds via parametric submodels
Both Eqs. (118) and (123) are upper bounds on the GHB. By virtue of Proposition 1, all earlier quantum lower bounds derived for incoherent imaging via parametric models are in fact lower bounds on the GHB for the mother family given by Eq. (106), with the true being evaluated at certain special cases of . References Tsang et al. (2016); Bisketzi et al. (2019); *lupo20a, for example, assume discrete point sources, but exact results become difficult to obtain for a large number of sources. Here we highlight two methods that work for any but can only give looser bounds.
The first method is the culmination of Ref. (Tsang, 2017, Sec. 6) and Ref. (Tsang, 2019b, Appendix C). Assume that
[TABLE]
consists of two sets of parameters and . Define a submodel given by
[TABLE]
The truth is at
[TABLE]
can be rewritten as
[TABLE]
In other words, we have introduced parameters to both the mixing density and the displacement in the model by rewriting the mixture. Appendix H shows how the extended convexity of the Helstrom information Alipour and Rezakhani (2015); Ng et al. (2016) can be used on Eq. (130) to give
[TABLE]
A more careful calculation shows that the SPADE error is exactly equal to this bound for the second moment Tsang (2019c). For higher moments, however, Eq. (131) remains much lower than that achievable by SPADE.
The second method, as reported in Ref. Tsang (2019b), considers the formal expansion , which leads to
[TABLE]
Consider this as a parametric submodel with only one scalar parameter for a given , while all the other moments with are fixed. Then the Helstrom bound for this submodel is simply , where is the Helstrom information with respect to . Reference Tsang (2019b) finds via a purification technique that this Helstrom bound is in turn bounded by
[TABLE]
By virtue of Corollary 3 and Proposition 1, we obtain
[TABLE]
This lower bound does match the performance of SPADE in order of magnitude, but it does not have a simple closed-form expression, and the question of whether SPADE is exactly efficient for moments higher than the second remains open.
VII Semiparametric estimation with explicit nuisance parameters
VII.1 The efficient score operator
We now consider problems where there is an explicit partition of the parameters into a scalar and nuisance parameters that may be infinite-dimensional, viz.,
[TABLE]
An example is the displacement model given by Eq. (6), where is the displacement parameter and the initial state depends on the nuisance parameters. All previous studies of the problem assume that is known exactly. In practice, however, may be poorly characterized, and the estimation performance in the presence of unknown nuisance parameters may suffer as a result.
With the explicit partition of the parameters, the scores can be partitioned similarly. Let be the score with respect to the parameter of interest, as defined by
[TABLE]
where is fixed at the truth. To define the nuisance scores, consider the subfamily
[TABLE]
which holds fixed at the truth instead. Define the nuisance tangent set as the set of scores from all parametric submodels of and the nuisance tangent space as
[TABLE]
The unbiasedness condition for an influence operator becomes
[TABLE]
The second of Eqs. (139) implies that , so if , , and no influence operator that obeys both Eqs. (139) can exist. In that case we assume the GHB to be infinite. Provided that , however, the following theorem provides another method of computing the efficient influence and the GHB.
Theorem 6.
Assuming and the unbiasedness condition given by Eqs. (139), the efficient influence and the GHB are given by
[TABLE]
where , henceforth called the efficient score, is given by
[TABLE]
Proof.
Delegated to Appendix I. ∎
Figure 10 illustrates the Hilbert-space concepts involved in Theorem 6. We note that Ref. (Suzuki et al., 2019, Sec. 5) has also arrived at conclusions similar to Theorem 6 in the parametric case, but the crucial point here is the Hilbert-space approach, which will enable us to derive closed-form solutions to semiparametric problems, as shown in the next section.
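In the parametric case, the efficient-score construction can be compared directly with the brute-force inversion of the Helstrom information matrix. The sketch below assumes that the efficient score is the score of interest minus its projection onto the span of the nuisance scores, that the efficient influence is the efficient score divided by its squared norm, and that the bound is the reciprocal of that squared norm; these forms follow the standard semiparametric definitions and are assumptions about the explicit equations. The scores are generated from random Hermitian operators, and all names are hypothetical.

```python
import numpy as np

def random_herm(d, rng):
    """Random Hermitian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return 0.5 * (a + a.conj().T)

def inner(rho, a, b):
    """Assumed inner product tr[rho (ab + ba)/2] for Hermitian a, b."""
    return 0.5 * np.trace(rho @ (a @ b + b @ a)).real

def efficient_score(rho, s_beta, nuisance):
    """Score of interest minus its projection onto span(nuisance scores)."""
    gram = np.array([[inner(rho, s, t) for t in nuisance] for s in nuisance])
    overlap = np.array([inner(rho, s_beta, s) for s in nuisance])
    coef = np.linalg.solve(gram, overlap)
    return s_beta - sum(c * s for c, s in zip(coef, nuisance))

rng = np.random.default_rng(4)
d = 4
eye = np.eye(d)
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real

def zero_mean(h):
    return h - np.trace(rho @ h).real * eye

s_beta = zero_mean(random_herm(d, rng))                      # score of interest
nuisance = [zero_mean(random_herm(d, rng)) for _ in range(3)]  # nuisance scores

s_eff = efficient_score(rho, s_beta, nuisance)
bound_eff = 1.0 / inner(rho, s_eff, s_eff)
delta_eff = s_eff * bound_eff                                # efficient influence

# Brute-force comparison: first diagonal entry of the inverse information matrix.
all_scores = [s_beta] + nuisance
info = np.array([[inner(rho, s, t) for t in all_scores] for s in all_scores])
bound_brute = np.linalg.inv(info)[0, 0]
print("efficient-score bound:", bound_eff)
print("brute-force bound:    ", bound_brute)
```

The agreement is the Schur-complement identity for a partitioned information matrix; the Hilbert-space form remains meaningful when the nuisance tangent space is infinite-dimensional and no information matrix can be written down.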
VII.2 Displacement estimation with a constrained family of initial states
Consider the displacement model given by Eq. (6) and illustrated by Fig. 11. For high-dimensional systems, only a few moments of the initial state may be known in practice, and it is prudent to assume that is in the constrained family defined by Eq. (77). The density-operator family for the problem can be expressed as
[TABLE]
where the unitary map is defined as
[TABLE]
Generalization for more complicated generators is possible Tsang et al. (2011) but outside the scope of this paper.
Define an inner product and a norm with respect to the true as
[TABLE]
Define also the operator Hilbert space with respect to , the tangent space at with respect to , and the orthocomplement that gives , in the same way that the spaces , , and are defined with respect to . Noting the unitarity of and following the method in Appendix C, it can be shown that the nuisance tangent space is given by
[TABLE]
Define the map adjoint to by . Exploiting the isomorphism between and , we can compute the efficient score as follows:
[TABLE]
where is the vector of antiscores with respect to , as defined by Eq. (80) but with and instead. Equation (149) can be further simplified, with
[TABLE]
where and is shorthand for . Equation (151) comes from the fact that for the model given by Eq. (143), where is the so-called commutation superoperator defined by Holevo (2011, 1977)
[TABLE]
The final result is
[TABLE]
In particular, if the constraint is linear and a scalar given by
[TABLE]
then
[TABLE]
which gives Eq. (7). is the variance of , while
[TABLE]
is a measure of how sensitive the Heisenberg-picture is to the displacement. An intuitive explanation of this result is as follows. A displacement can be estimated only with respect to a known reference. If only the mean of is known about the initial state, then it is the only reference in the quantum object that is available to the observer. It is therefore not surprising—in hindsight—that the statistics of determine the fundamental limit.
If is the momentum operator and is the position operator satisfying , the Heisenberg picture of is
[TABLE]
which is a quantum additive-noise model with no known statistics about the noise operator other than its mean. Measurements of and the sample mean of the outcomes are efficient. This problem then becomes equivalent to the example, but note that Eqs. (155) and (156) are more general, as they can deal with any generator, a that cannot be easily expressed as a functional of , and more general constraints.
Another example is optical phase estimation with
[TABLE]
and constraint on the mean of the quadrature operators with . There is no phase observable Mandel and Wolf (1995), so expressing as a functional of is difficult if not impossible. Equations (155) and (156), on the other hand, are simple expressions in terms of the generator and the antiscores. In Eqs. (154)–(156), ,
[TABLE]
is simply the covariance matrix of the quadratures, while
[TABLE]
are the mean quadrature values. The efficient influence is a linear combination of the quadratures according to Eq. (154), indicating the ideal, though parameter-dependent, quadrature to be measured. An adaptive measurement can then aim to measure the ideal quadrature to approach the quantum limit.
When is exactly known, the Helstrom bound for displacement estimation has been computed exactly only if is pure or Gaussian. Only looser bounds have been found otherwise Helstrom (1976); Holevo (2011); Demkowicz-Dobrzański et al. (2015). The Mandelstam-Tamm inequality, for example, is looser than the Helstrom bound for mixed states Holevo (2011). is determined by , and if is a high-dimensional non-Gaussian mixed state, is intractable. With the infinitely many nuisance parameters and infinitely many scores assumed here, the problem is hopeless under the conventional bottom-up approach. The top-down geometric approach, on the other hand, is able to avoid the computation of the scores altogether and give a simple result in terms of the more tractable antiscores.
VIII Vectoral parameter of interest
To complete the formalism, here we generalize the core results in this paper for a vectoral parameter of interest with entries. , the dimension of the parameter space, should be at least as large as and may be infinite. Define the error matrix as
[TABLE]
where is an estimator. An influence operator should then be a vector of operators. The inner product between two vectoral operators and the norm are now defined as
[TABLE]
The Hilbert spaces and for the vectoral operators are still expressed as Eqs. (20) and (21), while the tangent space is now defined as the replicating space Tsiatis (2006)
[TABLE]
The set of influence operators is still given by Eq. (28) if is interpreted as for all submodels and . For an unbiased measurement, the error operator given by Eq. (29) remains an element of , and it can be shown (Holevo, 2011, Sec. 6.2) that
[TABLE]
where the matrix inequality means that is positive-semidefinite. The GHB can then be expressed as
[TABLE]
where is a real cost matrix Hayashi (2017). Generalizing Theorems 1 and 3, we have
Theorem 7.
The GHB for a vectoral parameter of interest is given by
[TABLE]
where the efficient influence is the unique element in given by
[TABLE]
Proof.
Delegated to Appendix J. ∎
It is straightforward to generalize the methods introduced in this paper to compute the GHB for the vectoral case.
Holevo proposed another bound, denoted in the following by the sans-serif , that can account for the quantum effect of observable incompatibility in multiparameter estimation Holevo (2011); Nagaoka (1989). Before we prove the bound and related results, we need the following lemma.
Lemma 1 (Belavkin and Grishanin (1973)).
For any complex positive-semidefinite matrix ,
[TABLE]
where and denote the entry-wise real and imaginary parts of , respectively, and denotes the trace norm, defined as the sum of the singular values.
Proof.
Provided in Appendix K for completeness. ∎
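A quick numerical check of the lemma is sketched below. It assumes the statement that, for a complex positive-semidefinite matrix, the trace of the entry-wise real part is at least the trace norm of the entry-wise imaginary part; the function names are hypothetical. The last lines construct a rank-one example for which the two sides coincide, indicating that the inequality cannot be improved in general.

```python
import numpy as np

def trace_norm(m):
    """Trace norm: sum of the singular values."""
    return np.linalg.svd(m, compute_uv=False).sum()

def random_psd(n, rng):
    """Random complex positive-semidefinite matrix."""
    b = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return b @ b.conj().T

rng = np.random.default_rng(5)
gaps = []
for _ in range(1000):
    a = random_psd(3, rng)
    # tr(Re A) minus the trace norm of Im A should be nonnegative.
    gaps.append(np.trace(a.real) - trace_norm(a.imag))
print("smallest gap over random trials:", min(gaps))

# Rank-one example that saturates the inequality.
v = np.array([1.0, 1.0j]) / np.sqrt(2.0)
a_sat = np.outer(v, v.conj())
lhs_sat = np.trace(a_sat.real)
rhs_sat = trace_norm(a_sat.imag)
print("saturating example:", lhs_sat, rhs_sat)
```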
We can now present the Holevo bound. It requires little modification to be applied to semiparametric estimation; only the definition of needs to be generalized to Eq. (28) here. Otherwise the proof is standard Holevo (2011); Nagaoka (1989); Demkowicz-Dobrzanski et al. (2020); we provide it here simply to demonstrate that it remains valid in the semiparametric setting.
Theorem 8.
[TABLE]
where is a complex matrix given by
[TABLE]
Proof.
Holevo proved (Holevo, 2011, Eq. (6.6.55)) that the error matrix and the error operator of any unbiased measurement obey
[TABLE]
Thus . Applying Lemma 1 and noting that is real, we obtain
[TABLE]
Hence
[TABLE]
∎
The asymptotic attainability of the Holevo bound for has been shown in Refs. Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020). The rough idea there is to consider a two-step method: first find an estimate of using some of the object copies, and then perform a measurement based on the influence operators obtained from the minimization in Eq. (173), assuming to be the truth. In the limit of , the overhead for finding is benign, and it can be shown that the error approaches by local asymptotic normality.
For all the examples studied in previous sections, was a scalar, and it is straightforward to prove that the Holevo bound is equal to the GHB in that case.
Corollary 4.
If is a scalar (),
[TABLE]
Proof.
For , and , leading to
[TABLE]
∎
The scalar GHB hence inherits all the properties of the Holevo bound, including its asymptotic attainability. In fact, for any , the Holevo bound turns out to be only a marginal improvement over the GHB.
Theorem 9.
[TABLE]
Proof.
For all ,
[TABLE]
As is the infimum of Eq. (181), we obtain , the first inequality of the theorem. The second inequality is proved as follows:
[TABLE]
where Eq. (184) is obtained by applying Lemma 1 to . ∎
The first inequality is well known Holevo (2011); Nagaoka (1989); Ragy et al. (2016). A special case of the second inequality—when , exists, and is the original Helstrom bound—was proved recently in Ref. Carollo et al. (2019); *carollo20. can be attained in special cases Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020).
Theorem 9 has three implications: the effect of incompatibility is surprisingly benign in the context of asymptotic statistics; the GHB can be approached to within a factor of two if the Holevo bound is attainable; and the GHB is a serviceable alternative to the Holevo bound, especially when the latter is more difficult to compute. See Ref. Demkowicz-Dobrzanski et al. (2020) for further interesting discussions regarding this result.
As an aside, we remark that the in Eq. (183) is called the -invariant bound and coincides with if , where is given by Eq. (153) Holevo (2011); Suzuki (2016); *suzuki19a. In general, offers a tighter upper bound on than but may not be much more difficult to compute, as it also depends on , which can be found via the methods introduced in this work.
We present a few other interesting results concerning multiparameter estimation with in Appendix L.
Finally, we generalize the concept of efficient score in Theorem 6 for a vectoral .
Theorem 10.
Assume a density-operator family given by
[TABLE]
Let be the scores with respect to and be the nuisance tangent set. Assume the unbiasedness condition for influence operators given by
[TABLE]
where is the identity matrix. The efficient influence and the GHB are given by
[TABLE]
where the efficient score is given by
[TABLE]
and is assumed to exist.
Proof.
Almost identical to that of Theorem 6 in Appendix I and omitted here for brevity. ∎
IX Conclusion
We have founded a theory of quantum semiparametric estimation and showcased its power by producing simple quantum bounds for a large class of problems with high dimensions and few assumptions about the density operator. The theory establishes the notion of quantum semiparametric efficiency, which should inform and inspire the design of more efficient measurements in many areas of quantum physics.
While the experimental design of efficient semiparametric measurements is only touched upon here and awaits further research, the importance of the quantum limits set forth should not be underestimated. As more experiments are now being performed on complex quantum systems and advantages of such systems for metrology and information processing in general are being claimed, the precision limits serve as ultimate yardsticks as well as “no-go” theorems that guard against spurious proposals and fruitless endeavors, in the same way that the laws of thermodynamics impose limits on engines and rule out perpetual-motion machines. Deriving precision limits for highly complex or poorly modeled quantum systems was a daunting task under the curse of dimensionality; the semiparametric theory offers a new way forward.
Many open problems remain, and further extensions and applications of the theory are yet to be worked out. The asymptotic attainability of efficiency Hayashi (2017, 2005); Kahn and Guţă (2009); Gill and Guţă (2013); Demkowicz-Dobrzanski et al. (2020) is a thorny issue for infinite-dimensional problems. The assumption of unbiased estimation is a drawback; generalizations to the Bayesian or minimax paradigm Van Trees and Bell (2007); *schutzenberger57; *vantrees; *gill95; *personick71; *hayashi11; *liu16; *chabuda16; *rubio19; *rubio20 should help but await further research. These problems should benefit from studies of alternative quantum bounds beyond the Cramér-Rao type Tsuda and Matsumoto (2005); *glm2012; *qzzb; *qbzzb; *qwwb; *hall_prx; *nair18. In view of Eq. (59) and Figs. 3 and 4, the connections of quantum semiparametrics to other domains of quantum information Tomamichel and Hayashi (2013); *li14a; *tomamichel16 and quantum state geometry Hayashi (2017, 2005); Amari and Nagaoka (2000) are also interesting future directions.
In light of the richness and wide applications of the classical semiparametric theory Ibragimov and Has’minskii (1981); Bickel et al. (1993); Tsiatis (2006); Newey (1990); Feigelson and Babu (2012); Tsang (2019c), this work has only scratched the surface of the full potential of quantum semiparametrics. It should open doors to further useful results.
Acknowledgements.
We thank M. G. Genoni both for several fruitful discussions and for making us aware of Refs. Carollo et al. (2019); *carollo20. We are grateful to R. Nair, M. Guţă, R. Gill, D. Branford, R. Demkowicz-Dobrzański, J. F. Friel, W. Górecki, and J. Suzuki for useful discussions. This research is partly supported by the National Research Foundation (NRF) Singapore, under its Quantum Engineering Programme (Award QEP-P7). AD and FA have been supported by the UK EPSRC (EP/K04057X/2) and the UK National Quantum Technologies Programme (EP/M01326X/1, EP/M013243/1). FA also acknowledges financial support from the National Science Center (Poland) grant No. 2016/22/E/ST2/00559.
Appendix A Proof of Corollary 1
If and exists, the solution to can be found, for example, in Ref. (Bickel et al., 1993, Eq. (15) in Appendix A.2). Here we give a simple proof for completeness. By definition of the projection Debnath and Mikusiński (2005),
[TABLE]
Any can be expressed as the linear combination with respect to a certain vector . Then
[TABLE]
The solution to the least-squares problem is
[TABLE]
Hence
[TABLE]
which is equal to Eq. (15), since for an influence operator.
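The least-squares projection above can be illustrated in the classical (commuting) special case with a single score, where operators reduce to functions on a finite sample space and the inner product is a probability-weighted sum. The sketch below is only an analogue under those assumptions: the binomial family, the function names, and the perturbation `h` are chosen here for illustration and are not taken from the paper.

```python
def inner(p, f, g):
    """Weighted inner product <f, g> = sum_x p(x) f(x) g(x)."""
    return sum(px * fx * gx for px, fx, gx in zip(p, f, g))

def project_onto_score(p, score, delta):
    """Least-squares projection of delta onto span{score}.

    Returns the projected function and its squared norm.
    """
    k = inner(p, score, score)          # scalar (Fisher) information
    c = inner(p, score, delta) / k      # least-squares coefficient
    proj = [c * s for s in score]
    return proj, inner(p, proj, proj)

theta = 0.3
q = 1.0 - theta
# Binomial family with two trials: outcomes x = 0, 1, 2.
p = [q * q, 2 * theta * q, theta * theta]
score = [(x - 2 * theta) / (theta * q) for x in range(3)]

# h has zero mean and is orthogonal to the score under p.
h = [theta * theta, -theta * q, q * q]
# An inefficient influence function for the parameter theta:
# the efficient part plus a component orthogonal to the score.
delta = [(x - 2 * theta) / 2 + 0.7 * hx for x, hx in zip(range(3), h)]

efficient, bound = project_onto_score(p, score, delta)
print("squared norm of delta:     ", inner(p, delta, delta))
print("squared norm of projection:", bound)
```

In this toy case the projection removes the component along `h`, and its squared norm equals the inverse of the scalar information, θ(1−θ)/2, which is the classical Cramér-Rao bound for the binomial family.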
Appendix B Proof of Corollary 2
Denote any concept discussed so far with the superscript if it is associated with , but omit the superscript for brevity if . From , we generate a subspace such that
[TABLE]
is a surjective map to by definition of the space. It can be shown that
[TABLE]
so is isomorphic to , and is a unitary map from to Reed and Simon (1980). It can also be shown that
[TABLE]
so , and is isomorphic to . For any , it is not difficult to prove that
[TABLE]
given the isomorphisms. Now let
[TABLE]
where is an influence operator. is also an influence operator, since
[TABLE]
The efficient influence for becomes
[TABLE]
the norm becomes
[TABLE]
and the corollary ensues.
Appendix C Proof of Corollary 3
Let be the tangent set for . For each parametric submodel of , let
[TABLE]
be a parametric submodel of . The score of the submodel is given by
[TABLE]
In other words, each can be used to generate a score in via Eq. (204). The set of scores generated this way is therefore a subset of , viz.,
[TABLE]
Conversely, any parametric submodel of must be in the form of Eq. (203), with being a certain parametric submodel of . The score of the former is then related to the score of the latter via Eq. (204). Since includes the scores of all parametric submodels of , any must be in . Thus , and equality holds, viz.,
[TABLE]
It follows that
[TABLE]
is isomorphic to . Hence, projecting an influence operator of the form into gives the efficient influence , by the same argument as Appendix B.
Appendix D The set of bounded operators is dense in
To generalize Theorem 2 for the infinite-dimensional case and prove Theorem 4, we need to be mindful of the unbounded operators in . The good news is that they are well defined as limits of bounded-operator sequences in , thanks to Holevo Holevo (2011, 1977); just a minor modification is needed to make his result work for .
Consider the set of bounded elements defined by Eq. (72). If , , since all operators are bounded in the finite-dimensional case, but if , is a strict subset. A useful lemma is as follows.
Lemma 2.
.
Proof.
Reference (Holevo, 2011, Theorem 2.8.1) implies that, for any , there exists a Cauchy sequence with each satisfying such that
[TABLE]
To derive a similar convergent sequence in , consider the projection of each into , written as
[TABLE]
Denote a bounded operator in the equivalence class of as . An operator for can be expressed as
[TABLE]
Since and ,
[TABLE]
by the triangle inequality, leading to . The Pythagorean theorem leads to
[TABLE]
which can be combined with Eq. (208) to give
[TABLE]
In other words, , with each , is also Cauchy and converges to . As the argument applies to any , is dense in , and the closure of gives . ∎
Appendix E Proof of Proposition 1
Let the orthocomplement of in be . Then the Pythagorean theorem yields
[TABLE]
where the last step uses Ref. (Bickel et al., 1993, Proposition 3B in Appendix A.2). follows again from the Pythagorean theorem for a . Equation (99) comes from Theorem 1.
Appendix F Proof of Proposition 2
Let be the true density. For real functions on , define an inner product and a norm with respect to as
[TABLE]
Define the Hilbert space of zero-mean functions as
[TABLE]
For each , construct the parametric submodel
[TABLE]
with the truth at and . is the score function with respect to . The score with respect to is then given by
[TABLE]
where the map is a quantum version of the conditional expectation Hayashi (2017). Hence
[TABLE]
Consider the inner product between and an given by
[TABLE]
where Eq. (25) is used and is the adjoint map given by the Husimi representation
[TABLE]
Since ,
[TABLE]
and . The map is obviously linear. It is also bounded because
[TABLE]
Thus is a continuous linear map (Debnath and Mikusiński, 2005, Theorem 1.5.7). As is a dense subset of by virtue of Lemma 2, can be uniquely extended to a continuous linear map on the whole (Debnath and Mikusiński, 2005, Theorem 1.5.10).
Any must obey
[TABLE]
The only solution is . In other words, is in the null space of . As the Husimi representation is injective Jordan (1964); *mehta65, the only solution to is . Hence , and .
Appendix G for diffraction-limited incoherent imaging is infinite-dimensional
Following Appendix F, it can be shown that if
[TABLE]
for the incoherent-imaging problem in Sec. VI.3. Consider, for example, , where is a momentum eigenket. Then Eq. (230) is satisfied if
[TABLE]
Let be the set of Hermite polynomials that are orthogonal with respect to the weight function . Then any with satisfies Eq. (231). Define the set
[TABLE]
Each obeys Eq. (231) and
[TABLE]
so is an orthogonal set with respect to the inner product given by Eq. (17). As ,
[TABLE]
which means that the dimension of must be infinite.
Appendix H Derivation of Eq. (131)
For the density-operator family given by Eq. (130), the extended convexity of the Helstrom information Alipour and Rezakhani (2015); Ng et al. (2016) implies that
[TABLE]
where is the variance of . With the explicit partition of into and , can be expressed as
[TABLE]
where and . Let
[TABLE]
where is a set of orthogonal polynomials with respect to the true that satisfy . is omitted from because is a score function with respect to and implies that cannot contain in its expansion. The orthonormality of leads to
[TABLE]
Now consider
[TABLE]
Then
[TABLE]
where the completeness property
[TABLE]
is assumed. With
[TABLE]
and using Corollary 3 and Proposition 1, Eq. (131) is obtained.
Appendix I Proof of Theorem 6
The proof follows the classical case Tsiatis (2006). As , the given by Eq. (141) is not zero. Let
[TABLE]
Notice that Eq. (141) is a projection of into a space orthogonal to , so and . Then
[TABLE]
because and each . Thus satisfies Eqs. (139) and is an influence operator. Notice also that and are in , because and . Hence, by Theorem 3,
[TABLE]
and Eq. (251) is the efficient influence.
Appendix J Proof of Theorem 7
We again follow Ref. Tsiatis (2006). Decompose any into
[TABLE]
It is straightforward to prove that . As is orthogonal to any element in , it must be orthogonal to with any in any entry of , say, the th entry. Then
[TABLE]
meaning that each entry of is orthogonal to . This leads to a stronger matrix form of the orthogonality between and given by
[TABLE]
and a matrix form of the Pythagorean theorem given by
[TABLE]
resulting in Eq. (170). To prove the uniqueness of in , suppose that there exists another that gives . Define . As , , and the matrix Pythagorean theorem gives . This implies that , , and , contradicting the assumption that . Hence must be unique.
Appendix K Proof of Lemma 1
Let the superscript denote the entry-wise conjugation of a matrix and the superscript denote the conjugate transpose. means that for any . We also have , since for any . Thus, for any ,
[TABLE]
Let be the eigenvalues and eigenvectors of the Hermitian . As the singular values of are , we obtain
[TABLE]
Appendix L Some results concerning quantum multiparameter estimation
This appendix presents some interesting results concerning quantum multiparameter estimation, following Sec. VIII and assuming .
A crucial assumption in this paper is that , the set of influence operators, is not empty. While this is not a problem for all the examples studied in this paper, the following theorem, generalizing a classical result by Stoica and Marzetta Stoica and Marzetta (2001), can be used to verify the assumption.
Theorem 11.
The set of influence operators is not empty if and only if all the columns of are in the range of the Helstrom information matrix , viz.,
[TABLE]
where the superscript denotes the Moore-Penrose pseudoinverse Golub and Van Loan (2013).
Proof.
We prove the “only if” part first. Assume that a exists. It satisfies , and therefore
[TABLE]
for any and . The Cauchy-Schwarz inequality gives
[TABLE]
Now suppose that is in the null space of , such that , and pick . We obtain
[TABLE]
which implies . As this must hold for any in the null space of , each column of must be orthogonal to the null space and therefore in the range of . is the projection matrix into the range of Golub and Van Loan (2013), so Eq. (261) holds.
The “if” part comes from the fact that, as long as Eq. (261) holds,
[TABLE]
satisfies and and is therefore an influence operator. ∎
For an illustrative example, consider
[TABLE]
with the geometry depicted in Figure 12. and at the singular point , meaning that
[TABLE]
The tangent space there becomes a line in the direction, and it is impossible for a to satisfy
[TABLE]
if .
If Eq. (261) does not hold at certain values of , Theorem 11 implies that an unbiased estimator of cannot exist there, and the GHB can be assumed to be infinite. Note, however, that a biased estimator may still be able to achieve a finite error.
Provided that Eq. (261) holds, a pseudoinverse form of the Helstrom bound can be obtained.
Corollary 5.
If Eq. (261) holds,
[TABLE]
Proof.
Equation (265) is an influence operator and also a linear combination of , so it is in the tangent space . By Theorem 1, it must be efficient. The other results follow from the fact Golub and Van Loan (2013) and the definition of . ∎
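The range condition of Theorem 11 and the pseudoinverse form of the bound can be sketched numerically for a real symmetric 2×2 information matrix and a single column of derivatives. The sketch assumes that the condition reads K K⁺ β = β and that the bound is β^T K⁺ β, with K and β being our own labels for the Helstrom information matrix and the column in question. All function names are hypothetical, and the closed-form pseudoinverse is valid only for the 2×2 positive-semidefinite case treated here.

```python
def pinv_sym2(k, tol=1e-12):
    """Moore-Penrose pseudoinverse of a real symmetric PSD 2x2 matrix."""
    (a, b), (_, d) = k
    det, tr = a * d - b * b, a + d
    if abs(det) > tol * max(1.0, tr * tr):
        # Full rank: ordinary inverse.
        return [[d / det, -b / det], [-b / det, a / det]]
    if tr > tol:
        # Rank one: K = tr * u u^T with unit u, so K^+ = K / tr^2.
        return [[a / tr**2, b / tr**2], [b / tr**2, d / tr**2]]
    return [[0.0, 0.0], [0.0, 0.0]]     # zero matrix

def matvec(m, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def in_range(k, beta, tol=1e-9):
    """Check the assumed range condition K K^+ beta = beta."""
    proj = matvec(k, matvec(pinv_sym2(k), beta))
    return all(abs(x - y) <= tol for x, y in zip(proj, beta))

def pinv_bound(k, beta):
    """beta^T K^+ beta; meaningful as a bound only if in_range(k, beta)."""
    kp_beta = matvec(pinv_sym2(k), beta)
    return beta[0] * kp_beta[0] + beta[1] * kp_beta[1]

k_singular = [[4.0, 0.0], [0.0, 0.0]]    # information vanishes in one direction
for beta in ([1.0, 0.0], [0.0, 1.0]):
    if in_range(k_singular, beta):
        print(beta, "bound:", pinv_bound(k_singular, beta))
    else:
        print(beta, "no influence operator exists; bound taken as infinite")
```

When the information matrix is nonsingular, the same functions reduce to the ordinary inverse, in line with the original Helstrom bound.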
The original Helstrom bound is a simple consequence, generalizing the scalar version in Corollary 1.
Corollary 6.
If ,
[TABLE]
Proof.
If , exists, , Eq. (261) always holds, and the results follow from Corollary 5. ∎
Finally, we mention that the semidefinite program presented in Ref. Albarelli et al. (2019) to evaluate the Holevo bound for and a nonsingular can be straightforwardly extended to the more general setup considered in this appendix.
Appendix M Post-publication notes
After the completion of this work and its acceptance for publication, Masahito Hayashi informed us that Ref. (Yang et al., 2019, (c)) and Refs. Suzuki et al. (2019); Suzuki (2019b); Suzuki et al. (2020) also study quantum estimation theory with nuisance parameters. In particular, Ref. (Suzuki et al., 2020, Sec. 4.3) independently arrives at results similar to our Theorem 8 and Corollary 6. Reference Suzuki et al. (2020) focuses on the parametric case (), whereas our Theorems 7 and 8 are proven to work in both parametric and semiparametric settings.
We note that our Theorems 7, 8, 9, 11 and Corollaries 4–6 first appeared in an arXiv preprint of ours on February 5th, 2020 (Albarelli et al., 2020b, v2). We then decided to merge our two preprints Tsang et al. (2020); Albarelli et al. (2020b) into one manuscript (Tsang et al., 2020, v6), which was accepted by PRX on June 1st, 2020. On the other hand, the first appearance of Sec. 4.3 in Ref. Suzuki et al. (2020) seems to be in the Accepted Manuscript on the JPA website on April 21st, 2020; the section is absent in v1 and v2 of their arXiv preprint Suzuki et al. (2019).
On another note, Ref. (Suzuki et al., 2019, Remark 4.5 in v3) proves that, if and , then there exists a minimizing solution in for the Holevo bound given by Eq. (173).
References
- [1] Helstrom (1976). Carl W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, New York, 1976).
- [2] Demkowicz-Dobrzański et al. (2015). Rafał Demkowicz-Dobrzański, Marcin Jarzyna, and Jan Kołodyński, “Quantum Limits in Optical Interferometry,” in Progress in Optics, Vol. 60, edited by E. Wolf (Elsevier, Amsterdam, 2015) Chap. 4, pp. 345–435.
- [3] Paris (2009). Matteo G. A. Paris, “Quantum estimation for quantum technology,” International Journal of Quantum Information 07, 125–137 (2009).
- [4] Giovannetti et al. (2011). Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone, “Advances in quantum metrology,” Nature Photonics 5, 222–229 (2011).
- [5] Szczykulska et al. (2016). Magdalena Szczykulska, Tillmann Baumgratz, and Animesh Datta, “Multi-parameter quantum metrology,” Advances in Physics: X 1, 621–639 (2016).
- [6] Pirandola et al. (2018). S. Pirandola, B. R. Bardhan, T. Gehring, C. Weedbrook, and S. Lloyd, “Advances in photonic quantum sensing,” Nature Photonics 12, 724 (2018).
- [7] Braun et al. (2018). Daniel Braun, Gerardo Adesso, Fabio Benatti, Roberto Floreanini, Ugo Marzolino, Morgan W. Mitchell, and Stefano Pirandola, “Quantum-enhanced measurements without entanglement,” Reviews of Modern Physics 90, 035006 (2018).
- [8] Pezzé et al. (2018). Luca Pezzé, Augusto Smerzi, Markus K. Oberthaler, Roman Schmied, and Philipp Treutlein, “Quantum metrology with nonclassical states of atomic ensembles,” Reviews of Modern Physics 90, 035005 (2018).
