Subdiffraction incoherent optical imaging via spatial-mode   demultiplexing: semiclassical treatment

Mankei Tsang

arXiv:1703.08833·physics.optics·February 28, 2018

Subdiffraction incoherent optical imaging via spatial-mode demultiplexing: semiclassical treatment

Mankei Tsang

PDF

Open Access

TL;DR

This paper provides a semiclassical analysis of a spatial-mode demultiplexing (SPADE) scheme for far-field incoherent optical imaging, demonstrating its superior accuracy over direct imaging in estimating moments of subdiffraction objects.

Contribution

It generalizes SPADE for various point-spread functions and evaluates its error performance in estimating object moments under realistic noise conditions.

Findings

01

SPADE outperforms direct imaging in estimating higher-order moments.

02

The analysis applies to a broad class of point-spread functions.

03

SPADE achieves lower estimation errors near the diffraction limit.

Abstract

I present a semiclassical analysis of a spatial-mode demultiplexing (SPADE) measurement scheme for far-field incoherent optical imaging under the effects of diffraction and photon shot noise. Building on previous results that assume two point sources or the Gaussian point-spread function, I generalize SPADE for a larger class of point-spread functions and evaluate its errors in estimating the moments of an arbitrary subdiffraction object. Compared with the limits to direct imaging set by the Cram\'er-Rao bounds, the results show that SPADE can offer far superior accuracy in estimating the second and higher-order moments.

Equations400

Γ (x, x^{'} ∣ θ)

Γ (x, x^{'} ∣ θ)

f (x ∣ θ)

f (x ∣ θ)

\int d x ∣ ψ (x) ∣^{2}

\int d x ∣ ψ (x) ∣^{2}

θ_{0}

θ_{0}

p_{j} (θ)

p_{j} (θ)

= \int d X \int d x ϕ_{j}^{*} (x) ψ (x - X)^{2} F (X ∣ θ),

j \sum p_{j} (θ) \leq θ_{0} .

j \sum p_{j} (θ) \leq θ_{0} .

\int d x ϕ_{j} (x) ϕ_{k}^{*} (x)

\int d x ϕ_{j} (x) ϕ_{k}^{*} (x)

P (n ∣ θ)

P (n ∣ θ)

τ

τ

E (n_{j})

E (n_{j})

V_{j k} (n)

V_{j k} (n)

N

N

E (\overset{ˇ}{θ}) = θ,

E (\overset{ˇ}{θ}) = θ,

MSE_{μν} (\overset{ˇ}{θ}, θ)

MSE_{μν} (\overset{ˇ}{θ}, θ)

MSE_{μμ} (\overset{ˇ}{θ}, θ)

MSE_{μμ} (\overset{ˇ}{θ}, θ)

CRB (θ)

CRB (θ)

J_{μν} (θ)

J_{μν} (θ)

J_{μν} (θ)

J_{μν} (θ)

J_{μν} (θ)

J_{μν} (θ)

BMSE (\overset{ˇ}{θ})

BMSE (\overset{ˇ}{θ})

BMSE_{μμ} (\overset{ˇ}{θ})

BMSE_{μμ} (\overset{ˇ}{θ})

\equiv (\tilde{J} + K)^{- 1},

\tilde{J}

\tilde{J}

K_{μν}

K_{μν}

θ sup MSE_{μμ} (\overset{ˇ}{θ}, θ)

θ sup MSE_{μμ} (\overset{ˇ}{θ}, θ)

= (\tilde{J} + K)^{- 1} \approx \tilde{J}^{- 1} = J^{- 1} .

= (\tilde{J} + K)^{- 1} \approx \tilde{J}^{- 1} = J^{- 1} .

θ_{μ}

θ_{μ}

f (x ∣ θ)

f (x ∣ θ)

J_{μν} (θ)

J_{μν} (θ)

CRB_{μν} (θ)

CRB_{μν} (θ)

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsOptical Coherence Tomography Applications · Advanced Optical Sensing Technologies · Advanced Fluorescence Microscopy Techniques

Full text

Subdiffraction incoherent optical imaging via spatial-mode

demultiplexing: semiclassical treatment

Mankei Tsang

[email protected] http://mankei.tsang.googlepages.com/ Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117583

Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117551

Abstract

I present a semiclassical analysis of a spatial-mode demultiplexing (SPADE) measurement scheme for far-field incoherent optical imaging under the effects of diffraction and photon shot noise. Building on previous results that assume two point sources or the Gaussian point-spread function, I generalize SPADE for a larger class of point-spread functions and evaluate its errors in estimating the moments of an arbitrary subdiffraction object. Compared with the limits to direct imaging set by the Cramér-Rao bounds, the results show that SPADE can offer far superior accuracy in estimating the second and higher-order moments.

I Introduction

Recent theoretical and experimental studies have shown that far-field optical methods can substantially improve subdiffraction incoherent imaging Tsang et al. (2016a); Nair and Tsang (2016a); Tsang et al. (2016b); Nair and Tsang (2016b); Lupo and Pirandola (2016); Tsang (2018); Ang et al. (2017); Tsang (2017); Krovi et al. (2016); Lu et al. (2016); Tham et al. (2017); Tang et al. (2016); Yang et al. (2016); Paúr et al. (2016); Rehacek et al. (2017a); Yang et al. (2017); Kerviche et al. (2017); Chrostowski et al. (2017); Řehaček et al. (2017); Rehacek et al. (2017b). While most of the prior works focus on two point sources, Ref. Tsang (2017) proposes a spatial-mode demultiplexing (SPADE) measurement technique that can enhance the estimation of moments for arbitrary subdiffraction objects. Although the predicted enhancements are promising for applications in both astronomy and fluorescence microscopy, such as size and shape estimation for stellar objects or fluorophore clusters, researchers in those fields may find it difficult to comprehend the quantum formalism used in Ref. Tsang (2017). One of the main goals of this work is therefore to introduce a more accessible semiclassical formalism that can reproduce the results there, assuming only a background knowledge of statistical optics on the level of Goodman Goodman (2004, 1985) and parameter estimation on the level of Van Trees Van Trees (2001). The formalism incorporates diffraction, photon shot noise, and—most importantly—coherent optical processing, which enables the enhancements proposed in Refs. Tsang et al. (2016a); Nair and Tsang (2016a); Tsang et al. (2016b); Nair and Tsang (2016b); Tsang (2018); Ang et al. (2017); Lu et al. (2016); Tsang (2017); Yang et al. (2017); Rehacek et al. (2017a); Kerviche et al. (2017); Tham et al. (2017); Paúr et al. (2016); Řehaček et al. (2017); Chrostowski et al. (2017); Lupo and Pirandola (2016); Krovi et al. (2016); Tang et al. (2016); Yang et al. (2016); Rehacek et al. (2017b). This treatment thus sheds light on the physical origin of the enhancements, clarifying that no exotic quantum phenomenon is needed to explain or implement them.

As Ref. Tsang (2017) assumes the Gaussian point-spread function (PSF) exclusively, another goal of this work is to generalize the results for a larger class of PSFs via the theory of orthogonal polynomials Rehacek et al. (2017a); Dunkl and Xu (2001), affirming that enhancements remain possible in those cases. To set a benchmark for the proposed method, I derive limits to moment estimation via direct imaging in the form of Cramér-Rao bounds (CRBs) Van Trees (2001); Zmuidzinas (2003); Huber et al. (2013); Feigelson and Babu (2012); Chao et al. (2016); von Diezmann et al. (2017), which are original results in their own right and may be of independent interest to image-processing research Brady (2009); de Villiers and Pike (2016); Pawley (2006); Raginsky et al. (2010); Donoho et al. (1992); Candès and Fernandez-Granda (2014); Schiebinger et al. (2017); Zhu et al. (2012); Bierbaum et al. (2017); Meister (2009). On a more technical level, this work also investigates the estimation bias introduced by an approximation made in Ref. Tsang (2017) and assures that it is harmless.

This paper is organized as follows. Section II introduces the background formalism of statistical optics, measurement noise, and CRBs. Section III presents the bounds for moment estimation via direct imaging of a subdiffraction object. Section IV introduces the theory of SPADE for a general class of PSFs and evaluates its biases and errors for moment estimation, showing that giant accuracy enhancements are possible for the second and higher-order moments. Section V revisits the case of Gaussian PSF studied in Ref. Tsang (2017) and also proposes new exactly unbiased estimators in the case of two dimensions. Section VI presents a Monte Carlo analysis to confirm the theory. Section VII concludes the paper, pointing out open questions and future directions. Appendices A–H deal with mathematical issues that arise in the main text.

II Formalism

II.1 Statistical optics

Consider an object emitting spatially incoherent light, a diffraction-limited imaging system, as depicted in Fig. 1, and the paraxial theory of quasi-monochromatic scalar waves Goodman (2004, 1985). On the image plane, the mutual coherence function, also called the mutual intensity, can be expressed as Goodman (2004, 1985)

[TABLE]

where $x,x^{\prime}\in\mathbb{R}^{D}$ are $D$ -dimensional position vectors on the image plane, $X$ is the object-plane position vector normalized with respect to the magnification factor, $F(X|\theta)$ is the object intensity function, $\theta=(\theta_{0},\theta_{1},\dots)$ is a vector of unknown parameters to be estimated, and $\psi(x)$ is the field PSF. To simplify the notations, I adopt the multi-index notation described in Appendix A and Ref. Dunkl and Xu (2001), such that $D$ can be kept arbitrary, though $D=1$ or $2$ is typical in spectroscopy and imaging. Note that three-dimensional imaging requires a different formalism in the paraxial theory and is outside the scope of this paper. The mean intensity on the image plane is

[TABLE]

which is a basic result in statistical optics Goodman (2004, 1985).

For convenience, I normalize the position vectors with respect to the width of the PSF, such that the PSF width is equal to $1$ in this unit. The PSF is assumed to obey the normalization

[TABLE]

such that

[TABLE]

is the mean optical power reaching the image plane.

Instead of intensity measurement on the image plane, consider the use of further linear optics to process the field followed by photon counting in each output channel, as depicted in Fig. 1. The mean power in each output channel can be expressed as

[TABLE]

where $\phi_{j}^{*}(x)$ is a propagator that couples the image-plane field from position $x$ to the $j$ th output. If the optics after the image plane is passive, power conservation implies that

[TABLE]

This can be satisfied if the set $\{\phi_{j}(x)\}$ is orthonormal, viz.,

[TABLE]

by virtue of Bessel’s inequality Debnath and Shah (2015). If $\{\phi_{j}(x)\}$ is also complete in the Hilbert space of image-plane fields, it becomes an orthonormal basis, and Parseval’s identity leads to equality for Eq. (7) Debnath and Shah (2015). Physically, Eq. (5) implies that each output can be regarded as a projection of the image-plane field in a spatial mode. For example, direct imaging, which measures the spatial intensity on the image plane, can be modeled by taking $\phi_{j}(x)=\sqrt{dx^{(j)}}\delta(x^{(j)}-x)$ , where $x^{(j)}$ is the position of each pixel with infinitesimal area $dx^{(j)}$ , such that $p_{j}(\theta)=f(x^{(j)}|\theta)dx^{(j)}$ . A generalization of the measurement model to deal with mode-dependent losses and non-orthogonal mode projections is possible via the concept of positive operator-valued measures Tsang et al. (2016b) but not needed here.

In superresolution research, it is known that image processing can achieve arbitrary resolution if $f(x|\theta)$ is measured exactly and benign assumptions about the object can be made Brady (2009); de Villiers and Pike (2016); Schiebinger et al. (2017). The caveat is that the techniques are severely limited by noise, so the use of proper statistics is paramount in superresolution studies. For weak incoherent sources, such as astronomical optical sources and microscopic fluorophores, bunching or antibunching is negligible, and it is standard to assume a Poisson model for the photon counts $n=(n_{1},n_{2},\dots)$ at the output channels Tsang et al. (2016b); Goodman (1985); Zmuidzinas (2003); Huber et al. (2013); Pawley (2006); Chao et al. (2016); Van Aert et al. (2002). The Poisson distribution is

[TABLE]

where

[TABLE]

$\eta\in[0,1]$ is the detection efficiency, $T$ is the integration time, and $\hbar\omega$ is the photon energy. The most important statistics here are the mean

[TABLE]

where $\mathbb{E}$ denotes the expectation with respect to $P$ , and the covariance matrix

[TABLE]

which is signal-dependent. If $\{\phi_{j}\}$ is an orthonormal basis, the mean photon number detected by the measurement is

[TABLE]

Conditioned on a total photon number $\sum_{j}n_{j}$ , $n$ obeys multinomial statistics, and the reconstruction of $F$ via direct imaging becomes the density deconvolution problem in nonparametric statistics; see, for example, Ref. Meister (2009) and references therein.

The quantum formalism can arrive at the same Poisson model by assuming that the source is thermal, the mean photon number per spatiotemporal mode is much smaller than 1, and the photon count for each channel is integrated in time over many modes Tsang et al. (2016a); Tsang (2017). That said, an advantage of the semiclassical model besides simplicity is that it applies to any incoherent source that produces Poisson noise at the output, such as incoherent laser sources Goodman (1985) and electron microscopy Van Aert et al. (2002), without the need to satisfy all the assumptions of the quantum model.

II.2 Cramér-Rao bounds (CRBs)

To deal with the signal-dependent nature of Poisson noise, many existing approaches to computational superresolution Brady (2009); de Villiers and Pike (2016); Donoho et al. (1992); Candès and Fernandez-Granda (2014); Schiebinger et al. (2017) are inadequate. A more suitable tool to derive fundamental limits is the CRB, which is now standard in astronomy Zmuidzinas (2003); Huber et al. (2013); Feigelson and Babu (2012) and fluorescence microscopy Chao et al. (2016); von Diezmann et al. (2017). For any estimator $\check{\theta}(n)$ that satisfies the unbiased condition

[TABLE]

the mean-square error matrix is equal to its covariance, viz.,

[TABLE]

and the CRB is Van Trees (2001); Zmuidzinas (2003); Huber et al. (2013); Feigelson and Babu (2012); Chao et al. (2016); von Diezmann et al. (2017)

[TABLE]

where

[TABLE]

is the inverse of the Fisher information matrix defined as

[TABLE]

An unbiased estimator whose error attains the CRB is called efficient. In the limit of infinite trials, the maximum-likelihood estimator is asymptotically unbiased and efficient Van Trees (2001), so the bound is also useful as a measure of the achievable error in the asymptotic limit.

For the Poisson model, the Fisher information is

[TABLE]

For example, the information for direct imaging with infinitesimal pixel size is

[TABLE]

The data-processing inequality Zamir (1998) ensures that increasing the pixel size, or any processing of the image-plane intensity in general, cannot increase the amount of information. A simple extension of Eq. (19) for strong thermal sources with super-Poisson statistics can be found in Appendix C of Ref. Yang et al. (2017).

An intuitive way of understanding Eq. (19) is to regard it as a signal-to-noise ratio: each derivative $\partial p_{j}/\partial\theta_{\mu}$ measures the sensitivity of an output to a parameter, while the denominator $p_{j}$ is proportional to the Poisson variance and indicates the noise level. The form of Eq. (19) hence suggests that any parameter-insensitive background in $p_{j}$ should be minimized. The nonlinear dependence of the Fisher information on $p_{j}$ complicates the analysis, but also hints that coherent optical processing may lead to nontrivial effects.

The Bayesian CRB (BCRB) can be used to set more general limits for any biased or unbiased estimator Schützenberger (1957); Van Trees (2001); Gill and Levit (1995); Tsang (2018); Van Trees and Bell (2007). Define the Bayesian mean-square error as

[TABLE]

where $\Pi(\theta)$ is a prior probability density. For a prior that vanishes on the boundary of its domain, the BCRB is

[TABLE]

where

[TABLE]

is the Fisher information averaged over the prior and

[TABLE]

is the prior information. Other Bayesian bounds for more general priors can be found in Ref. Van Trees and Bell (2007). The BCRB also applies to the worst-case error $\sup_{\theta}\textrm{MSE}_{\mu\mu}(\check{\theta},\theta)$ for minimax estimation Tsang (2018); Gill and Levit (1995), since

[TABLE]

for any $\Pi(\theta)$ , and the prior can be chosen to tighten the bound Tsang (2018); Gill and Levit (1995).

The BCRB is close to the CRB if $J(\theta)$ is constant in the domain of the prior, such that $\tilde{J}=J$ , and the prior information $K$ is negligible relative to $\tilde{J}$ , such that

[TABLE]

A counterexample is the problem of two-point resolution Tsang (2018), where $J$ vanishes at a point in the parameter space and the BCRB becomes very sensitive to the choice of prior, as mentioned later in Sec. III.2.

III Limits to direct imaging

III.1 Error bounds

Define the object moments

[TABLE]

as the parameters of interest. Note that the moments are unnormalized, unlike the definition in Ref. Tsang (2017). Under general conditions, the set of moments uniquely determine $F$ Dunkl and Xu (2001), so there is little loss of generality with this parameterization. I will focus on moment estimation hereafter and not the pointwise reconstruction of $F$ , however, for two reasons: the moments are more directly related to many useful parameters in practice, such as the brightness, location, size, and shape of an object Nicovich et al. (2017); Feigelson and Babu (2012), while the reconstruction of $F$ without further prior information is ill-posed and a forlorn task in practice when noise is present Brady (2009); de Villiers and Pike (2016); Donoho et al. (1992); Candès and Fernandez-Granda (2014); Meister (2009), even with the techniques introduced in this work.

Expanding $|\psi(x-X)|^{2}$ in a Taylor series, the mean image given by Eq. (2) can be expressed in terms of $\theta$ as

[TABLE]

The Fisher information given by Eq. (20) becomes

[TABLE]

Appendix B shows that this can be inverted analytically to give

[TABLE]

where $N$ is the mean photon number given by Eq. (13),

[TABLE]

is the normalized image moment matrix, the $C$ matrix is defined as

[TABLE]

and

[TABLE]

is a moment of the PSF. The lower-triangular property of $C$ indicated by Eq. (38) means that $C^{-1}$ is also lower-triangular and the low-order elements of the CRB can be computed from a finite number of low-order elements of $M$ and $C$ . An unbiased and efficient estimator is described in Appendix C.

To proceed further, I focus on the subdiffraction regime, which I define as the scenario where the object support width $\Delta$ is much smaller than the PSF width. To be specific, the width is defined by

[TABLE]

and the subdiffraction regime is defined by the condition

[TABLE]

in the dimensionless unit assumed here. This can be regarded as the extreme opposite to the sparse regime commonly assumed in compressed sensing Raginsky et al. (2010); Donoho et al. (1992); Candès and Fernandez-Granda (2014); Schiebinger et al. (2017); Zhu et al. (2012) and can be ensured by prior information in practice. For example, a spot that resembles the PSF in a prior image indicates a subdiffraction object and can be studied further via the framework here; such spots are of course commonly found in both astronomical and microscopic imaging. In fluorescence microscopy, the subdiffraction support can even be enforced via stimulated-emission depletion (STED) Hell (2007), and the theory here can help STED microscopy gain more information about each spot beyond $\theta_{0}$ .

In the subdiffraction regime, the moments observe a magnitude hierarchy with respect to the order $|\mu|$ , as

[TABLE]

and I can combine Eqs. (29), (32), and (39) to obtain

[TABLE]

In other words, the image is so blurred that it resembles the PSF to the zeroth order, and the image moments approach those of the PSF. The CRB hence becomes

[TABLE]

This is the central result of Sec. III.

To set a more general limit for any biased or unbiased estimator, consider the BCRB described in Sec. II.2. Since the Fisher information given by the inverse of Eq. (47) depends only on $\theta_{0}$ and not the other parameters to the leading order, the average information $\tilde{J}$ defined by Eq. (24) is relatively insensitive to the choice of prior in the subdiffraction regime. For any reasonable prior that gives a finite prior information $K$ , a long enough integration time can then make $\tilde{J}$ much larger than $K$ in Eq. (23), leading to $\textrm{BCRB}\approx\textrm{CRB}$ , if $\theta_{0}$ is replaced by a suitable prior value. The two bounds hence give similar results here in the asymptotic limit. Figure 2 summarizes the relationships among the various quantities defined for direct imaging in this section.

III.2 Special cases

The low-order elements of Eqs. (30) and (31) can be used to reproduce a few well known results. For example, the CRB with respect to $\theta_{0}$ can be derived from Eq. (31) and is given by

[TABLE]

which is equal to the textbook result. Another example is point-source localization Chao et al. (2016); Huber et al. (2013), for which known results can be retrieved from Eq. (30) by defining the location parameters as $\theta_{\mu}/\theta_{0}$ for $|\mu|=|\nu|=1$ . To see this, assume $D=1$ for simplicity, and the information with respect to $X=\theta_{1}/\theta_{0}$ in the $\Delta\to 0$ , $f(x|\theta)\to\theta_{0}|\psi(x)|^{2}$ limit becomes

[TABLE]

which is exact for one point source Chao et al. (2016); Huber et al. (2013).

Considering $|\mu|=|\nu|=2$ , Eq. (30) can also reproduce the results in Refs. Tsai and Dunn (1979); Bettens et al. (1999); Van Aert et al. (2002); Ram et al. (2006) regarding sub-Rayleigh two-point separation estimation. To see this, assume $D=1$ again and that the centroid of the two point sources is at the origin. The second moment is then related to the separation $d$ by $\theta_{2}=\theta_{0}d^{2}/4$ . The information with respect to $d$ becomes

[TABLE]

This can be compared with a direct calculation of the information by considering the mean image

[TABLE]

and approximating it for sub-Rayleigh $d\ll 1$ as Bettens et al. (1999); Van Aert et al. (2002); Yang et al. (2017)

[TABLE]

The information is then

[TABLE]

which coincides with Eq. (50). The vanishing $J^{(d)}$ and divergent $\textrm{CRB}^{(d)}=1/J^{(d)}$ for $d\ll 1$ were first reported in Refs. Tsai and Dunn (1979); Bettens et al. (1999); Van Aert et al. (2002); Ram et al. (2006) and called Rayleigh’s curse in Ref. Tsang et al. (2016a). The BCRB becomes very sensitive to the choice of prior and produces a markedly different result from the CRB when applied to the worst-case error Tsang (2018). This issue depends on the parameterization Gill and Levit (1995) and does not arise for the moment parameters, however.

In the absence of a specific parametric model or equality parameter constraints Gorman and Hero (1990), the full information matrix should be considered, and the CRB given by Eq. (31), which results from inverting the full information matrix, is a tighter limit Van Trees and Bell (2007) for general objects. Appendix D presents a limit of Eq. (31) when diffraction can be ignored, while Eq. (47) should be used in the subdiffraction regime.

This section has established fundamental limits to direct imaging in the subdiffraction and shot-noise-limited regime. The next sections show that coherent optical processing can beat them.

IV Spatial-mode demultiplexing (SPADE)

IV.1 Point-spread-function-adapted (PAD) basis

References Tsang et al. (2016a); Nair and Tsang (2016a); Tsang et al. (2016b); Nair and Tsang (2016b); Tsang (2018); Ang et al. (2017); Lu et al. (2016); Tsang (2017); Yang et al. (2017); Rehacek et al. (2017a); Kerviche et al. (2017); Tham et al. (2017); Paúr et al. (2016); Řehaček et al. (2017); Chrostowski et al. (2017); Lupo and Pirandola (2016); Krovi et al. (2016); Tang et al. (2016); Yang et al. (2016); Rehacek et al. (2017b) have shown that SPADE, a technique of linear optics and photon counting with respect to a judiciously chosen basis of spatial modes, can substantially improve subdiffraction imaging. To generalize the use of the TEM basis in Ref. Tsang (2017), I consider the point-spread-function-adapted (PAD) basis proposed by Rehacek et al. for the two-point problem Rehacek et al. (2017a) and apply it to more general objects. Denote the PAD basis by

[TABLE]

where the spatial modes are more conveniently defined in the spatial-frequency domain. Defining

[TABLE]

$\Phi_{q}(k)$ can be expressed as

[TABLE]

where $\{g_{q}(k);q\in\mathbb{N}_{0}^{D}\}$ is a set of real orthogonal polynomials with $|\Psi(k)|^{2}$ as the weight function Dunkl and Xu (2001), $G$ is an invertible matrix that satisfies the lower-triangular property

[TABLE]

and the indices follow a total and degree-respecting order that obeys

[TABLE]

See Appendix B for more details about orthogonal polynomials. The polynomials are assumed to satisfy the orthonormal condition

[TABLE]

which also ensures that $\{\phi_{q}\}$ is orthonormal. The completeness of $\{\phi_{q}\}$ can be proved along the lines of Ref. Dunkl and Xu (2001) but is not essential here. As $\phi_{0}(x)=\psi(x)$ and each higher-order mode in real space is a sum of $\psi(x)$ derivatives given by

[TABLE]

the PAD basis can be regarded as a generalization of the binary SPADE concept in Ref. Tsang et al. (2016a) and the derivative-mode concept in Ref. Paúr et al. (2016).

In terms of the PAD basis, I can define a mutual coherence matrix as

[TABLE]

In particular, SPADE in terms of the PAD basis gives a set of output channels with powers

[TABLE]

and the Poisson photon counts $\{n_{q};q\in\mathbb{N}_{0}^{D}\}$ have expected values

[TABLE]

where $\eta_{0}$ is the efficiency of the PAD-basis measurement. An unbiased estimator of $\Gamma_{qq}$ is

[TABLE]

and its variance is

[TABLE]

In the context of the Gaussian PSF, Refs. Yang et al. (2016); Tsang (2017) found that $p_{q}(\theta)$ is sensitive only to some of the object moments. To estimate the other moments, Ref. Tsang (2017) further proposes measurements that access the off-diagonal elements of $\Gamma$ . To measure an off-diagonal $\Gamma_{qq^{\prime}}$ , take two spatial modes with indices $q$ and $q^{\prime}$ from the PAD basis and interfere them, such that the outputs correspond to projections into the spatial modes

[TABLE]

which I call interferometric-PAD (iPAD) modes. The powers at the two outputs are

[TABLE]

The photon counts, denoted by $n_{qq^{\prime}}^{+}$ and $n_{qq^{\prime}}^{-}$ , have expected values

[TABLE]

where $\eta_{s}$ denotes the efficiency of the measurement that includes these two projections. Assume further that $|\Psi(k)|^{2}$ is centrosymmetric, as defined by

[TABLE]

such that $G$ , $h_{q}(X)$ , and $\Gamma_{qq^{\prime}}$ are all real, as shown in Appendix E and assumed hereafter. An unbiased estimator of $\Gamma_{qq^{\prime}}$ is then

[TABLE]

with

[TABLE]

The estimators $\check{\Gamma}_{qq^{\prime}}$ given by Eqs. (67) and (75) will be used in Sec. IV.2 to construct moment estimators.

Since the iPAD modes are not orthogonal to the PAD modes, they cannot belong to the same orthonormal basis. This means that, if projections into both PAD and iPAD modes are desired, multiple measurements in different bases are needed and must be performed on different photons. This can be done either sequentially in time via configurable interferometers or on different beamsplitted parts of the light. If each measurement has an efficiency $\eta_{s}$ , energy conservation mandates that

[TABLE]

IV.2 Moment estimation

To relate $\Gamma$ to the object moments, use Eqs. (55)–(57) to rewrite the propagator $h_{q}(X)$ in Eq. (64) as

[TABLE]

where

[TABLE]

as shown in Appendix E. Since $G^{-1}$ and $G$ are lower-triangular, $H$ and $H^{-1}$ are upper-triangular, satisfying

[TABLE]

Substituting Eq. (80) into Eq. (63), $\Gamma_{qq^{\prime}}$ can be related to the moments by

[TABLE]

which shows that each $\Gamma_{qq^{\prime}}$ is sensitive to a combination of moments with orders at least as high as $|q+q^{\prime}|$ . Given the magnitudes of $\theta$ according to Eq. (42), the magnitude of $\Gamma_{qq^{\prime}}$ can be expressed as

[TABLE]

and the variances of the estimators given by Eqs. (68) and (76) become

[TABLE]

Equations (86) and (87) will be used to evaluate the errors of moment estimation.

Instead of computing the CRB and relying on asymptotic arguments, here I construct explicit moment estimators and evaluate their errors directly to demonstrate the achievable performance of SPADE. To begin, consider the inverse of Eq. (85) given by

[TABLE]

which implies that an unbiased estimator of $\theta_{q+q^{\prime}}$ can be constructed from unbiased estimators of $\Gamma_{pp^{\prime}}$ given by Eqs. (67) and (75), viz.,

[TABLE]

This estimator may not be realizable, however, as it may not be possible to group the needed projections into a reasonable number of bases. A fortuitous exception occurs for the Gaussian PSF, as elaborated later in Sec. V.3.

To find a simpler estimator, I focus on the class of separable PSFs given by

[TABLE]

where each $|\Psi^{(j)}(k_{j})|^{2}$ is a one-dimensional function. Defining

[TABLE]

as the orthogonal polynomials with respect to each $|\Psi^{(j)}(k_{j})|^{2}$ , the natural orthogonal polynomials in the multivariate case are their products, viz.,

[TABLE]

As each $G_{q_{j}r_{j}}^{(j)}$ is lower-triangular, I obtain the condition

[TABLE]

It follows from Eqs. (82) and (83) that $H$ and $H^{-1}$ are also separable and given by

[TABLE]

Using the property

[TABLE]

I can rewrite the sums in Eq. (89) as

[TABLE]

and obtain

[TABLE]

which consists of one $\theta_{0}O(\Delta^{|q+q^{\prime}|})$ term and higher-order terms, as ranked by Eq. (86). To evaluate the magnitude of the higher-order terms, note that, for a centrosymmetric $|\Psi(k)|^{2}$ , $(H^{-1})_{qr}\propto G_{rq}=0$ if $|r|=|q|+1,|q|+3,\dots$ Dunkl and Xu (2001), so

[TABLE]

which is smaller than the leading-order term by two orders of magnitude. A simplified estimator, involving only one $\check{\Gamma}_{qq^{\prime}}$ , can then be constructed as

[TABLE]

where the last step uses the fact $(H^{-1})_{qq}=1/H_{qq}$ for a triangular matrix. The bias is then the negative of Eq. (100), viz.,

[TABLE]

Figure 3 summarizes the relationships among the various quantities defined in this section, while Appendix G discusses a generalization of the estimator for non-separable PSFs.

Given Eq. (87), the variance of the estimator is

[TABLE]

To minimize the variance for a given moment $\theta_{\mu}$ with $\mu=q+q^{\prime}$ , $\min(|q|,|q^{\prime}|)$ should be made as high as possible. This can be accomplished by choosing

[TABLE]

The alternating floor ( $\left\lfloor\right\rfloor$ ) and ceil ( $\left\lceil\right\rceil$ ) operations keep $|q|$ high without exceeding $|q^{\prime}|$ . If $|\mu|$ is even, $\mu$ has an even number of odd elements, then $|q|=|q^{\prime}|=|\mu|/2$ . If $|\mu|$ is odd, $\mu$ has an odd number of odd elements, then $|q|=(|\mu|-1)/2$ and $|q^{\prime}|=(|\mu|+1)/2$ . Hence one can achieve

[TABLE]

and the mean-square error becomes

[TABLE]

Compared with the CRB for direct imaging given by Eq. (47), Eq. (112) can be much lower in the $\Delta\ll 1$ subdiffraction regime if $|\mu|\geq 2$ , the bias is negligible, and $\eta_{s}$ is on the same order of magnitude as the direct-imaging efficiency. This is the central result of Sec. IV. The conclusion holds also from the Bayesian or minimax perspective, since the BCRB for direct imaging is close to the CRB in the asymptotic limit, as argued in Sec. III.1, while Eq. (112) also applies to the Bayesian or worst-case error for SPADE if $\theta_{0}$ is replaced by a suitable prior value.

A heuristic explanation of the enhancements is as follows. Recall that Poisson noise is signal-dependent, and any background in the signal increases the variance. In the subdiffraction regime, the direct image is so blurred that it resembles the PSF $|\psi(x)|^{2}$ , and the fundamental mode $\phi_{0}(x)=\psi(x)$ acts as a background and the main contributor of noise. With SPADE, on the other hand, each moment estimator is designed to use spatial modes with the highest possible orders. The isolation from the lower-order modes, including the fundamental, substantially reduces the background and improves the signal-to-noise ratio.

IV.3 Multi-moment estimation

The remaining question is the number of bases needed to estimate all moments. For $D=1$ , three bases are enough: a measurement in the PAD basis provides

[TABLE]

where $2\mathbb{N}_{0}=\{0,2,4,\dots\}$ , a measurement in the basis $\{\varphi_{q,q+1}^{\pm}(x);q\in 2\mathbb{N}_{0}\}$ provides

[TABLE]

where $4\mathbb{N}_{0}+1=\{1,5,9,\dots\}$ , and a measurement in the basis $\{\varphi_{q,q+1}^{\pm}(x);q\in 2\mathbb{N}_{0}+1\}$ provides

[TABLE]

where $2\mathbb{N}_{0}+1=\{1,3,5,\dots\}$ and $4\mathbb{N}_{0}+3=\{3,7,11,\dots\}$ . If the light is split for measurements in all three basis, the condition of energy conservation given by Eq. (77) implies

[TABLE]

For $D=2$ , seven bases—defined by Table 1 and illustrated by Fig. 4—can do the job. I call these bases PAD and iPAD1–iPAD6, which generalize the TEM and iTEM1–iTEM6 bases proposed in Ref. Tsang (2017) for the Gaussian PSF. Energy conservation now implies

[TABLE]

if measurements in all the seven bases are performed. The essential point is that the penalty in efficiency for multi-moment estimation is only a constant factor, and significant enhancements over direct imaging remain possible.

IV.4 Criterion for informative estimation

A word of caution is in order: even with SPADE, there are severe resolution limits. This is because the moments are inherently small parameters in the subdiffraction regime according to Eq. (42), and the error needs be much smaller than the prior range of the parameter for the estimation to be informative. To evaluate the usefulness of an estimation relative to prior information, I adopt the Bayesian perspective Van Trees (2001); Van Trees and Bell (2007); Berger (1985) and consider the Bayesian error given by Eq. (21). In the absence of measurements, the error is determined by the prior and given by

[TABLE]

where $\mathbb{E}^{(\Pi)}$ denotes the expectation with respect to $\Pi(\theta)$ , the upper bound comes from Eq. (42), and $\theta_{0}$ is assumed to be given for simplicity. Using the bound as a conservative estimate of the prior error, a rule of thumb for informative estimation is

[TABLE]

The small prior error places a stringent requirement on the post-measurement error. For direct imaging, assuming the asymptotic limit where the BCRB is close to the CRB given by Eq. (47), the fractional BCRB is

[TABLE]

This value grows exponentially with the order $|\mu|$ , meaning that the estimation of higher-order moments requires exponentially more photons to become informative.

For SPADE, an achievable Bayesian error can be obtained by averaging $\textrm{MSE}(\check{\theta}_{\mu}^{\prime},\theta_{\mu})$ , and the magnitude is also given by Eq. (112). The fractional error becomes

[TABLE]

The $O(\Delta^{4})$ relative bias is always much smaller than $1$ , but the fractional variance still grows with $|\mu|$ exponentially. Compared with direct imaging, the exponent is reduced for $|\mu|\geq 2$ and not as many photons are needed to achieve a small fractional error for a given moment, but higher-order moments remain more difficult to estimate.

This consideration suggests that SPADE is most useful for scenarios that depend on only a few low-order moments. For example, the two-point problem studied in Refs. Tsang et al. (2016a); Nair and Tsang (2016a); Tsang et al. (2016b); Nair and Tsang (2016b); Tsang (2018); Ang et al. (2017); Lu et al. (2016); Yang et al. (2017); Kerviche et al. (2017); Tham et al. (2017); Paúr et al. (2016); Rehacek et al. (2017a); Chrostowski et al. (2017); Lupo and Pirandola (2016); Krovi et al. (2016); Tang et al. (2016); Yang et al. (2016); Rehacek et al. (2017b) requires moments up to the second order only Tsang (2017), the case of two unequal sources studied in Refs. Řehaček et al. (2017); Rehacek et al. (2017b) requires moments up to the third, and parametric object models with size and shape parameters Tsang (2017); Bierbaum et al. (2017) can also be related to low-order moments.

V Gaussian point-spread function

V.1 Direct imaging

For an illustrative example of the general theory, consider the Gaussian PSF

[TABLE]

which is a common assumption in fluorescence microscopy Chao et al. (2016); Zhang et al. (2007). The Hermite polynomials can be used to compute the CRB in the limit of $\Delta\to 0$ , as shown in Appendix F. The result is

[TABLE]

which coincides with the $D=2$ theory in Ref. Tsang (2017).

V.2 SPADE

The PSF in the spatial-frequency domain is

[TABLE]

A set of orthogonal polynomials with respect to $|\Psi(k)|^{2}$ are defined by

[TABLE]

and the PAD mode functions become

[TABLE]

The PAD basis in this case is simply the TEM basis, as expected. The propagator given by Eq. (64) can be computed analytically with the help of the generating function for Hermite polynomials DLMF ; Olver et al. (2010); the result is

[TABLE]

The mutual coherence matrix $\Gamma$ defined by Eq. (63) becomes

[TABLE]

Unbiased estimators of $\Gamma_{qq^{\prime}}$ can be constructed from projections in the PAD and iPAD spatial modes according to Eqs. (67) and (75); the iPAD modes are called iTEM modes in Ref. Tsang (2017). The estimator variances are given by Eqs. (68) and (76), with magnitudes given by Eq. (87).

To estimate a given moment $\theta_{\mu}$ , $q$ and $q^{\prime}=\mu-q$ can be chosen according to Eq. (108), the simplified estimator given by Eq. (101) can be used, and the error then agrees with Eq. (112). These results again agree with Ref. Tsang (2017), except that Ref. Tsang (2017) neglects the contribution of bias to the mean-square error and therefore does not include the second term in Eq. (112).

V.3 Exactly unbiased estimator

For $D=2$ , the PAD and iPAD1–iPAD6 bases described by Table 1 and Fig. 4 become the TEM and iTEM1–iTEM6 bases proposed in Ref. Tsang (2017), and the estimator given by Eq. (101) is equivalent to the ones proposed in Ref. Tsang (2017). Interestingly, it is possible to go further than Ref. Tsang (2017) and construct exactly unbiased moment estimators from these measurements. First note that Eq. (130) offers a shortcut to express each moment in terms of $\Gamma$ as follows:

[TABLE]

Combining Eqs. (101) and (134), it can then be shown that the estimator

[TABLE]

is exactly unbiased. To construct

[TABLE]

one simply needs $\left\{\check{\theta}_{\mu}^{\prime};\mu\in(2\mathbb{N}_{0})\times(2\mathbb{N}_{0})\right\}$ from the PAD basis. To construct

[TABLE]

one needs $\left\{\check{\theta}_{\mu}^{\prime};\mu\in(2\mathbb{N}_{0}+1)\times(2\mathbb{N}_{0})\right\}$ , which can be obtained from the iPAD1 and iPAD4 bases. Similarly, to construct

[TABLE]

one needs $\left\{\check{\theta}_{\mu}^{\prime};\mu\in(2\mathbb{N}_{0})\times(2\mathbb{N}_{0}+1)\right\}$ , which can be obtained from the iPAD2 and iPAD5 bases. Finally, to construct

[TABLE]

one needs $\left\{\check{\theta}_{\mu}^{\prime};\mu\in(2\mathbb{N}_{0}+1)\times(2\mathbb{N}_{0}+1)\right\}$ , which can be obtained from the iPAD3 and iPAD6 bases. The error matrix of the unbiased estimator becomes

[TABLE]

which remains on the same order of magnitude as the variance of the simplified estimator in Eq. (112), while the bias contribution is no longer present. The number of bases needed to achieve enhanced and exactly unbiased multi-moment estimation for other PSFs and dimensions remains an open question.

VI Numerical demonstration

I now present Monte Carlo simulations to corroborate the theory. Assume $D=1$ . Each simulated object is an ensemble of $S=5$ point sources with randomly generated positions $\{X_{\sigma};\sigma=1,\dots,S\}$ within the interval

[TABLE]

such that

[TABLE]

$50$ objects are generated for each PSF under study. For direct imaging, I assume that the mean photon number is $N=50,000$ , the pixel size is $dx=0.1$ , and $1,000$ samples of Poisson images are generated for each object. The estimator described in Appendix C is applied to each sample to estimate the moments $\theta_{\mu}$ for $\mu=1,2,3,4$ ( $\theta_{0}$ can be estimated by summing all the photon counts and the results are trivial). The sample errors with respect to the true parameters are averaged to approximate the expected values. The averaged errors are then plotted for two different PSFs in Figs. 5 and 6 and compared with the CRB given by Eq. (47), omitting the $O(\Delta)$ correction.

To simulate SPADE according to Sec. IV, measurements in three different bases are simulated. The first basis is

[TABLE]

with the simulated photon counts denoted by $\{n_{0},n_{1},n_{2}\}$ , the second basis is

[TABLE]

with the photon counts denoted by $\{n_{01}^{+},n_{01}^{-},n_{2}^{\prime}\}$ , and the third basis is

[TABLE]

with the photon counts denoted by $\{n_{0}^{\prime},n_{12}^{+},n_{12}^{-}\}$ . The light is split equally among the three measurements, such that $N_{s}=N/3$ . All photons in higher-order modes are neglected.

To estimate the moments with SPADE, I use the simplified but biased estimator given by Eq. (101), with $q$ given by Eq. (108). Using Eq. (75) for $\check{\Gamma}_{01}$ , the estimator of $\theta_{1}$ becomes

[TABLE]

The estimator is applied to $1,000$ samples of the simulated photon counts for each object. The sample errors with respect to the true parameters are averaged and compared with the analytic expression

[TABLE]

which neglects the bias and applies the approximations

[TABLE]

to Eqs. (76) and (85). Similarly,

[TABLE]

To estimate $\theta_{4}$ , I use both of the photon counts that come from the two $\phi_{2}(x)$ projections to obtain

[TABLE]

There is no need to specify $\theta_{0}$ , $\tau$ , or $\tau_{s}$ individually if the errors are normalized with respect to $\theta_{0}^{2}$ . The simulated errors and the analytic expressions are plotted in Figs. 5–7 against the relevant parameters in log-log scale for the three PSFs. The three PSFs in the spatial-frequency domain under study and the associated PAD modes are plotted in Fig. 8.

Figure 5 plots the results for the Gaussian PSF described in Sec. V. The simulated errors all match the theory, despite the approximations in the analytic expressions. In particular, the agreement confirms that the contribution of bias to the errors of SPADE is negligible. For $\mu=1$ , SPADE uses one third of the photons only, and its errors are three times those of direct imaging. For higher moments, however, SPADE outperforms direct imaging by orders of magnitude.

It is important to note that the plotted mean-square errors are normalized with respect to $\theta_{0}^{2}(\Delta/2)^{2\mu}$ , which is the square of the prior limit given by Eq. (42), and only the normalized errors for $\mu=1,2$ go significantly below $1$ . According to the discussion in Sec. IV.4, this implies that only the estimation for $\mu\leq 2$ is informative, while the estimation for $\mu\geq 3$ would require a lot more photons to become informative. The high variances of the estimators for $\mu\geq 3$ also suggest that, for the given photon number, replacing them with Bayesian estimators Van Trees (2001); Van Trees and Bell (2007); Berger (1985) can reduce their errors to the vicinity of the prior levels given by Eq. (118), although the bias will go up a lot.

The second PSF under study is the “bump” aperture function Debnath and Shah (2015)

[TABLE]

where $\Psi(0)\approx 1.0084$ is a normalization constant. The compact support models a hard bandwidth limit, while the infinite differentiability of $\Psi(k)$ ensures that all the moments of $|\psi(x)|^{2}$ are finite and the direct-imaging theory in Sec. III is valid, as discussed in Appendix H. The simulated errors, plotted in Fig. 6, behave similarly to those in the Gaussian case, except that the direct-imaging errors are substantially higher for higher moments. The enhancements by SPADE appear even bigger, though not big enough to bring the errors for $\mu\geq 3$ down to the informative regime for the given photon number.

The final PSF is the textbook rectangle aperture function

[TABLE]

The second and higher moments of $|\psi(x)|^{2}$ are infinite, meaning that the direct-imaging theory in Sec. III is inapplicable, as discussed in Appendix H. Fortunately, the orthogonal polynomials with respect to $|\Psi(k)|^{2}$ and therefore the PAD basis remain well-defined Rehacek et al. (2017a). Figure 7 plots the results for SPADE, which are similar to those for the bump aperture in Fig. 6. Although these results have no direct-imaging limits to compare with, the earlier results on the two-point problem for this PSF Tsang et al. (2016a); Paúr et al. (2016); Rehacek et al. (2017a) suggest that significant improvements remain likely.

VII Conclusion

The semiclassical treatment complements the quantum approach in Ref. Tsang (2017) by offering a shortcut to the Poisson photon-counting model for incoherent sources, passive linear optics, and photon counting. Besides pedagogy, this work generalizes the results in Refs. Tsang et al. (2016a); Nair and Tsang (2016a); Tsang et al. (2016b); Nair and Tsang (2016b); Lupo and Pirandola (2016); Tsang (2018); Ang et al. (2017); Tsang (2017); Krovi et al. (2016); Lu et al. (2016); Tham et al. (2017); Tang et al. (2016); Yang et al. (2016); Paúr et al. (2016); Rehacek et al. (2017a); Yang et al. (2017); Kerviche et al. (2017); Chrostowski et al. (2017); Řehaček et al. (2017); Rehacek et al. (2017b) for more general objects and PSFs in the context of moment estimation, demonstrating that the giant enhancements by SPADE are not limited to the case of two point sources or Gaussian PSF considered in prior works.

Many open problems remain, such as extensions for more general PSFs, more complex objects, and three-dimensional imaging, the effect of excess statistical and systematic errors, such as dark counts, aberrations, turbulence, and nonparaxial effects Mortensen et al. (2010), the application of more advanced Bayesian or minimax statistics Van Trees (2001); Van Trees and Bell (2007); Raginsky et al. (2010); Donoho et al. (1992); Candès and Fernandez-Granda (2014); Schiebinger et al. (2017); Zhu et al. (2012); Bierbaum et al. (2017); Meister (2009), and the quantum optimality of the measurements Tsang et al. (2016a); Nair and Tsang (2016b); Lupo and Pirandola (2016); Tsang (2018); Ang et al. (2017); Tsang (2017); Krovi et al. (2016); Lu et al. (2016); Rehacek et al. (2017a); Chrostowski et al. (2017); Řehaček et al. (2017); Rehacek et al. (2017b). Experimental implementation is another important future direction. For proof-of-concept demonstrations, it should be possible to use the same setups described in Refs. Tang et al. (2016); Yang et al. (2016); Tham et al. (2017); Paúr et al. (2016) to estimate at least the second moments of more general objects. For practical applications in astronomy and fluorescence microscopy, efficient demultiplexing for broadband sources is needed. The technical challenge is by no means trivial, but the experimental progress on spatial-mode demultiplexers has been encouraging Tham et al. (2017); Tang et al. (2016); Paúr et al. (2016); Yang et al. (2016); Morizur et al. (2010); Li et al. (2014); Luo et al. (2014); Mohanty et al. (2017); Martin et al. (2017); Mirhosseini et al. (2013); Zhou et al. (2017), and the promise of giant imaging enhancements using simply far-field linear optics should motivate further efforts.

Acknowledgments

This work is supported by the Singapore Ministry of Education Academic Research Fund Tier 1 Project R-263-000-C06-112.

Appendix A Multi-index notation

A $D$ -dimensional vector of continuous variables is written as

[TABLE]

For such a vector, the following notations are assumed:

[TABLE]

If the subscript is omitted in $\partial$ , derivatives with respect to $x$ are assumed.

A vector of integer indices, on the other hand, is defined as

[TABLE]

For such a vector, the following notations are assumed:

[TABLE]

Note that the one-norm is assumed for index vectors. Other useful notations include

[TABLE]

Appendix B CRB for direct imaging

It is useful to define a Hilbert space

[TABLE]

with respect to

[TABLE]

and the weighted inner product

[TABLE]

where $\operatorname{\overline{span}}$ is the closed linear span inside the $L^{2}(\tilde{f})$ space Dunkl and Xu (2001); Debnath and Shah (2015) and $\tilde{f}(x|\theta)$ is the normalized image. In other words, any function in $\mathcal{H}$ can be expressed as a linear combination of $\{b_{\mu}(x)\}$ . Equation (30) becomes

[TABLE]

This can be inverted with the help of orthogonal polynomials. Define

[TABLE]

where $a_{\mu}(x)$ is a real polynomial with degree $|\mu|$ and the orthonormal condition is

[TABLE]

For orthogonal polynomials to exist, the moment matrix $M$ given by Eq. (32) should be positive-definite Dunkl and Xu (2001), or equivalently

[TABLE]

for any polynomial $\mathcal{P}$ . The strict positiveness can be satisfied as long as the support of $\tilde{f}(x|\theta)$ is an infinite set, as $\mathcal{P}^{2}(x)$ has a finite number of zeros only.

The orthogonal polynomials can be computed by applying the Gram-Schmidt procedure to the set of monomials $\{x^{\mu};\mu\in\mathbb{N}_{0}^{D}\}$ if the set is totally ordered Dunkl and Xu (2001). For $D=1$ , the natural order $\{1,x,x^{2},\dots\}$ leads to a unique set of orthogonal polynomials for a given weight function. For $D\geq 2$ , however, the situation is more complicated. A useful requirement is that the order should respect the degree in the sense of

[TABLE]

An example is the graded lexicographical order, defined by

[TABLE]

For $D=2$ for example, the order is

[TABLE]

but one should see in this example that indices with the same total degree $|\mu|$ may be ordered in other ways and there is no single compelling choice; a different choice will lead to a different set of orthogonal polynomials. In the following I assume simply that a degree-respecting order has been chosen; the analysis is valid regardless of the choice.

Express each polynomial as

[TABLE]

where $A$ is a matrix that satisfies the lower-triangular property

[TABLE]

Combining Eqs. (32), (170), and (175), I obtain

[TABLE]

Given a total order of the indices, the matrices can be rasterized into two-dimensional matrices. Equation (178) can then be written more compactly as

[TABLE]

where $\top$ denotes the matrix transpose and $I$ is the identity matrix. As $M$ is positive-definite, $A$ can be obtained from the Cholesky decomposition

[TABLE]

where $L$ is a real lower-triangular matrix with positive diagonal elements Horn and Johnson (1985). Since the diagonal elements of a triangular matrix are also its eigenvalues, $L$ is invertible, $L^{-1}$ is also lower-triangular, and setting

[TABLE]

leads to

[TABLE]

which satisfies Eq. (178).

To invert Eq. (168), I also need to prove that $a$ is an orthonormal basis in $\mathcal{H}$ . The orthonormality given by Eq. (170) is satisfied by definition, while the completeness follows from the fact that the only function $u(x)=\sum_{\nu}\lambda_{\nu}b_{\nu}(x)$ in $\mathcal{H}$ that is orthogonal to $a$ in the sense of

[TABLE]

is the zero function, provided that

[TABLE]

is an invertible matrix. To prove so, apply integration by parts to Eq. (183) to obtain

[TABLE]

where $C$ is defined by Eq. (33). Since $A$ is invertible, it suffices to prove that $C$ is also invertible. Consider the term $\partial^{\nu}x^{\mu}$ in $C_{\mu\nu}$ . $\nu>\mu$ in a degree-respecting order implies $|\nu|>|\mu|$ , or $|\nu|=|\mu|$ and $\nu\neq\mu$ . In either case, there exists at least one $\nu_{j}>\mu_{j}$ that makes $\partial^{\nu}x^{\mu}$ vanish, resulting in

[TABLE]

meaning that $C$ is lower-triangular. The eigenvalues of $C$ are then the diagonal elements and given by

[TABLE]

Hence $C$ is invertible. Since both $A$ and $C$ are lower-triangular and invertible, $B=AC$ is also lower-triangular and invertible, and

[TABLE]

is lower-triangular as well.

I can now use the $a$ basis to express Eq. (168) as

[TABLE]

In matrix form,

[TABLE]

and the CRB becomes

[TABLE]

where I have applied Eqs. (188) and (181).

Appendix C An unbiased and

efficient estimator for direct imaging

Let $\{n(\mathcal{S});\mathcal{S}\subseteq\mathbb{R}^{D}\}$ be the Poisson process Snyder and Miller (1991); Çınlar (2011) obtained by direct imaging with infinitesimal pixel size. The expected value of $n$ over an area $\mathcal{S}$ is

[TABLE]

and $\{n(\mathcal{S}_{1}),n(\mathcal{S}_{2}),\dots\}$ are independent Poisson variables if $\{\mathcal{S}_{1},\mathcal{S}_{2},\dots\}$ are disjoint subsets. Consider the estimator

[TABLE]

Its expected value is

[TABLE]

where I have applied Eqs. (29) and (33). Its covariance, on the other hand, is

[TABLE]

which coincides with the CRB given by Eq. (31). The estimator is hence unbiased and efficient.

Appendix D CRB for direct imaging in

the diffraction-unlimited regime

Suppose that the PSF $|\psi(x)|^{2}=\delta(x)$ is infinitely sharp and $f(x|\theta)=F(x|\theta)$ . The image moments given by Eq. (32) become identical to those of the object, viz.,

[TABLE]

the $C$ matrix given by Eq. (33) becomes

[TABLE]

and the CRB given by Eq. (31) becomes

[TABLE]

This represents an ideal scenario where the imaging is limited only by shot noise and not by diffraction. Equation (203) also serves as a general lower bound on the CRB given by Eq. (19) for any linear-optical processing, as Eq. (6) is a Markov chain on $F(X|\theta)$ and the data-processing inequality Zamir (1998) can be invoked.

To verify Eq. (203), suppose that $F$ consists of isolated point sources, viz.,

[TABLE]

and since $|\psi(x)|^{2}=\delta(x)$ , their positions can be perfectly resolved. The unknowns are then $\vartheta$ , and the CRB with respect to $\vartheta$ is

[TABLE]

Expressing the moments as

[TABLE]

I can compute the CRB with respect to the moments via the transformation

[TABLE]

which coincides with Eq. (203).

Appendix E Properties of matrices in Sec. IV

Equation (58) can be inverted to give

[TABLE]

Substituting this in Eq. (81) and using the orthonormality given by Eq. (61), I obtain

[TABLE]

The inverse is given by Eq. (83), which can be confirmed by directly computing $HH^{-1}$ or $H^{-1}H$ . Since $G^{-1}$ and $G$ are lower-triangular, $H$ and $H^{-1}$ are upper-triangular.

If $|\Psi(k)|^{2}$ is centrosymmetric according to Eq. (74), Ref. Dunkl and Xu (2001) shows that $g_{q}(k)$ consists of only even-order monomials $\{k^{r};|r|\ \textrm{even}\}$ if $|q|$ is even and only odd-order monomials $\{k^{r};|r|\ \textrm{odd}\}$ if $|q|$ is odd. Thus

[TABLE]

Substituting $k$ with $-k$ in the integral in Eq. (78) yields

[TABLE]

and $h_{q}(X)$ is real. It follows that $H$ and $H^{-1}$ are real as well.

Appendix F CRB for direct imaging with the

Gaussian PSF

In the limit of $\Delta\to 0$ ,

[TABLE]

A set of orthogonal polynomials are

[TABLE]

where

[TABLE]

and the definition of the single-variable Hermite polynomials can be found, for example, in Refs. DLMF ; Olver et al. (2010). The $B$ matrix defined by Eq. (183) can then be computed by substituting the identity

[TABLE]

for Hermite polynomials DLMF ; Olver et al. (2010) and using the orthonormality of $a$ . The result is

[TABLE]

which can be substituted into Eq. (191) to give Eq. (123).

Appendix G An estimator for SPADE with non-separable

PSFs

The simple estimator given by Eq. (101) relies on the strong upper-triangular property of $H$ given by Eq. (97) for separable PSFs. Without it, the weaker property given by Eq. (84) for a degree-respecting order still implies that the $\sum_{r}$ sum in Eq. (89) can be separated into a $|r|=|q|$ group and and a $|r|>|q|$ group, viz.,

[TABLE]

and Eq. (89) becomes

[TABLE]

If I assume the estimator

[TABLE]

the bias is also given by Eq. (102), while the variance is

[TABLE]

which can still be minimized by choosing $q$ and $q^{\prime}$ according to Eq. (108).

A problem with Eq. (223) is that, for a given $|q|$ and $|q^{\prime}|$ , the number of $(r,r^{\prime})$ indices with $|r|=|q|$ and $|r^{\prime}|=|q^{\prime}|$ is

[TABLE]

so the estimator may require a large number of $\check{\Gamma}_{rr^{\prime}}$ ’s and a large number of bases to implement for a high-order moment, leading to a reduction in $\min(N_{s})$ . This difficulty is compounded by the fact that, for $D\geq 2$ , there exist infinitely many sets of orthogonal polynomials for a given weight function, as pointed out in Appendix B, leading to infinite possible choices of the $g$ polynomials and the PAD basis. For separable PSFs, the choice of the separable PAD basis in Sec. IV.2 fortunately leads to only one term in Eq. (223), but it remains an open question whether Eq. (223) can be further simplified via a more specific choice of the PAD basis for non-separable PSFs.

Appendix H Conditions for finite image moments

Given Eqs. (42) and (45), $M$ is finite if all the PSF moments $\{\Lambda_{\mu};\mu\in\mathbb{N}_{0}^{D}\}$ are finite. Consider

[TABLE]

in terms of the Fourier transform given by Eq. (56). A sufficient condition for $\Lambda$ to be finite is that $\Psi(k)$ is infinitely differentiable and has compact support; an example is the bump function given by Eq. (154).

If any $\Lambda_{\mu}$ is infinite, the $C$ matrix given by Eq. (38) and the CRB given by Eq. (47) also have infinite elements, and the direct-imaging theory in Sec. III and Appendix B breaks down. This happens for the rectangle aperture function given by Eq. (157). A solution, not explored in this work, may be to smooth $\Psi(k)$ by convolving it with a bump function with support width $w$ , such that the smoothed $\Psi(k)$ becomes infinitely differentiable but remains compactly supported. When $w\ll 1$ , the result should offer a good approximation of that for the original $\Psi(k)$ .

Bibliography66

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Tsang et al. (2016 a) Mankei Tsang, Ranjith Nair, and Xiao-Ming Lu, “Quantum theory of superresolution for two incoherent optical point sources,” Physical Review X 6 , 031033 (2016 a) . · doi ↗
2Nair and Tsang (2016 a) Ranjith Nair and Mankei Tsang, “Interferometric superlocalization of two incoherent optical point sources,” Optics Express 24 , 3684–3701 (2016 a) . · doi ↗
3Tsang et al. (2016 b) Mankei Tsang, Ranjith Nair, and Xiao-Ming Lu, “Quantum information for semiclassical optics,” in Proc. SPIE , Quantum and Nonlinear Optics IV, Vol. 10029 (SPIE, Bellingham, WA, 2016) p. 1002903. · doi ↗
4Nair and Tsang (2016 b) Ranjith Nair and Mankei Tsang, “Far-Field Superresolution of Thermal Electromagnetic Sources at the Quantum Limit,” Physical Review Letters 117 , 190801 (2016 b) . · doi ↗
5Lupo and Pirandola (2016) Cosmo Lupo and Stefano Pirandola, “Ultimate Precision Bound of Quantum and Subwavelength Imaging,” Physical Review Letters 117 , 190802 (2016) . · doi ↗
6Tsang (2018) Mankei Tsang, “Conservative classical and quantum resolution limits for incoherent imaging,” Journal of Modern Optics 65 , 104–110 (2018) . · doi ↗
7Ang et al. (2017) Shan Zheng Ang, Ranjith Nair, and Mankei Tsang, “Quantum limit for two-dimensional resolution of two incoherent optical point sources,” Physical Review A 95 , 063847 (2017) . · doi ↗
8Tsang (2017) Mankei Tsang, “Subdiffraction incoherent optical imaging via spatial-mode demultiplexing,” New Journal of Physics 19 , 023054 (2017) . · doi ↗