Optimal Threshold Design for Quanta Image Sensor

Omar A. Elgendy; Stanley H. Chan

arXiv:1704.03886·cs.CV·March 22, 2019


TL;DR

This paper develops an optimal threshold design framework for Quanta Image Sensors, showing that spatially varying thresholds improve image quality and proposing a practical bisection-based update scheme.

Contribution

It introduces a theoretical oracle threshold matching pixel intensity and a practical asymptotically unbiased threshold update method.

Findings

01 Improved convergence rate over existing methods

02 Theoretical oracle threshold matches pixel intensity

03 Practical threshold scheme achieves better image reconstruction



Tables

TABLE I: List of QIS Prototypes and Parameters

Camera	Canon 5D CMOS	EMCCD [12]	GMAPD [13]	SPC SPAD [14]	SwissSPAD [11]	Fossum QIS [15]
Price	\$5,000	\$20,000	Prototype	Prototype	Prototype	Prototype
Resolution	$4096 \times 2160$	$1024 \times 1024$	$256 \times 256$	$320 \times 240$	$512 \times 128$	$1376 \times 768$
Pixel pitch ($\mu$m)	2.3	13	25	8	24	3.6
Full-well capacity	69 ke- (@ISO100)	180 ke-	-	56-125 e-	-	1-250 e-
Frames per second (fps)	6	26-92	$8 \times 10^{3}$	$2 \times 10^{4}$	$1.56 \times 10^{5}$	$1 \times 10^{3}$
Sensor data rate	88.6 Mbps	0.48 Gbps	0.52 Gbps	1.54 Gbps	10.24 Gbps	1 Gbps

TABLE II: Average PSNR and standard deviation of 77 recovered images using different Q-maps and 50 random samples.

Configuration	Average PSNR	Std
Uniform Threshold, $q = 1$	10.30	0.01
Uniform Threshold, $q = 5$	28.80	0.04
Uniform Threshold, $q = 10$	23.22	0.02
Uniform Threshold, $q = 16$	12.95	0.01
Conditional Reset [21], ascending $q$ sequence	23.77	0.52
Conditional Reset [21], descending $q$ sequence	24.95	0.53
Proposed Method, $2K^{2} \times 2K^{2}$	30.14	0.06
Proposed Method, $K^{2} \times K^{2}$	31.18	0.06
Proposed Method, $K \times K$	32.78	0.02

Equations

$$\boldsymbol{\theta}=\alpha\boldsymbol{G}\boldsymbol{c}$$

$$\boldsymbol{G}=\frac{1}{K}\boldsymbol{I}_{N\times N}\otimes\mathbf{1}_{K\times 1}$$

$$\mathbb{P}(Y_{m,t}=y_{m,t})=\frac{\theta_{m}^{y_{m,t}}e^{-\theta_{m}}}{y_{m,t}!}$$

$$B_{m,t}=\begin{cases}0, & \text{if } Y_{m,t}<q,\\ 1, & \text{if } Y_{m,t}\geq q.\end{cases}$$

$$\mathbb{P}(B_{m,t}=b_{m,t})=\begin{cases}\sum_{k=0}^{q-1}\frac{\theta_{m}^{k}e^{-\theta_{m}}}{k!}, & \text{if } b_{m,t}=0,\\ \sum_{k=q}^{\infty}\frac{\theta_{m}^{k}e^{-\theta_{m}}}{k!}, & \text{if } b_{m,t}=1.\end{cases}$$

$$\Psi_{q}(\theta)\overset{\text{def}}{=}\frac{1}{\Gamma(q)}\int_{\theta}^{\infty}t^{q-1}e^{-t}\,dt,\quad\text{for }\theta>0,\;q\in\mathbb{N}.$$

$$\Psi_{q}(\theta)=\sum_{k=0}^{q-1}\frac{\theta^{k}}{k!}e^{-\theta}.$$

$$\frac{d}{d\theta}\Psi_{q}(\theta)=\frac{-\theta^{q-1}e^{-\theta}}{\Gamma(q)}<0,\quad\forall q\in\mathbb{N}\text{ and }\theta>0.$$

$$\overset{(b)}{=}\underset{\boldsymbol{c}}{\mathrm{argmax}}\;\sum_{t=0}^{T-1}\sum_{m=0}^{M-1}\Big\{b_{m,t}\log(1-\Psi_{q}(\theta_{m}))+(1-b_{m,t})\log\Psi_{q}(\theta_{m})\Big\},$$

$$\mathcal{B}_{n}\overset{\text{def}}{=}\{B_{Kn+k,t}\mid k=0,\dots,K-1,\;t=0,\dots,T-1\}.$$

$$\widehat{c}_{n}=\frac{K}{\alpha}\Psi_{q}^{-1}\left(1-\frac{S_{n}}{KT}\right),$$

$$\mathrm{SNR}_{q}(c)\overset{\text{def}}{=}10\log_{10}\frac{c^{2}}{\mathbb{E}[(\widehat{c}-c)^{2}]},$$

$$\mathrm{SNR}_{q}(c)\approx 10\log_{10}\big(c^{2}I_{q}(c)\big)+10\log_{10}KT,$$

$$I_{q}(c)=\left(\frac{\alpha}{K}\right)^{2}\frac{e^{-2\left(\frac{\alpha c}{K}\right)}\left(\frac{\alpha c}{K}\right)^{2q-2}}{\Gamma^{2}(q)\,\Psi_{q}\left(\frac{\alpha c}{K}\right)\left(1-\Psi_{q}\left(\frac{\alpha c}{K}\right)\right)}.$$

$$q^{*}=\underset{q\in\mathbb{N}}{\mathrm{argmax}}\;\mathrm{SNR}_{q}(c)=\underset{q\in\mathbb{N}}{\mathrm{argmax}}\;\log\left(c^{2}I_{q}(c)\right).$$

$$\log\left(c^{2}I_{q}(c)\right)\geq L_{q}(c)\overset{\text{def}}{=}2\left(\log 2-\frac{\alpha c}{K}+q\log\frac{\alpha c}{K}-\log\Gamma(q)\right).$$

$$q^{*}(c)=\underset{q\in\mathbb{N}}{\mathrm{argmax}}\;L_{q}(c)=\left\lfloor\frac{\alpha c}{K}\right\rfloor+1,$$

$$\Psi_{q}\left(\frac{\alpha\widehat{c}}{K}\right)=1-\frac{S}{KT},$$

$$\gamma_{q}(c)\overset{\text{def}}{=}1-\frac{S}{KT}.$$

$$\gamma_{q}(c)=1-\frac{S}{KT}\;\overset{a.s.}{\longrightarrow}\;1-\mathbb{E}[B_{k,t}]=\Psi_{q}\left(\frac{\alpha c}{K}\right).$$

$$\overset{(b)}{\to}\frac{K}{\alpha}\Psi_{q}^{-1}\left(\Psi_{q}\left(\frac{\alpha c}{K}\right)\right)\overset{(c)}{=}c.$$

$$\widehat{c}=\frac{K}{\alpha}\Psi_{q}^{-1}\left(\gamma_{q}(c)\right).$$


Full text



Omar A. Elgendy and Stanley H. Chan

The authors are with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. Email: {oelgendy, stanchan}@purdue.edu. The work was supported, in part, by the U.S. National Science Foundation under Grant CCF-1718007. A preliminary version of this paper was presented at the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ. This paper follows the concept of reproducible research. All the results and examples presented in the paper are reproducible using the code and images available online at http://engineering.purdue.edu/ChanGroup/.

Abstract

Quanta Image Sensor (QIS) is a binary imaging device envisioned to be the next generation image sensor after CCD and CMOS. Equipped with a massive number of single photon detectors, the sensor has a threshold $q$ above which the number of arriving photons will trigger a binary response “1”, or “0” otherwise. Existing methods in the device literature typically assume that $q=1$ uniformly. We argue that a spatially varying threshold can significantly improve the signal-to-noise ratio of the reconstructed image. In this paper, we present an optimal threshold design framework. We make two contributions. First, we derive a set of oracle results to theoretically inform the maximally achievable performance. We show that the oracle threshold should match exactly with the underlying pixel intensity. Second, we show that around the oracle threshold there exists a set of thresholds that give asymptotically unbiased reconstructions. The asymptotic unbiasedness has a phase transition behavior which allows us to develop a practical threshold update scheme using a bisection method. Experimentally, the new threshold design method achieves better rate of convergence than existing methods.

Index Terms:

Quanta image sensor, single-photon imaging, high dynamic range, binary quantization, maximum likelihood.

I Introduction

I-A Threshold Design for Quanta Image Sensor

Quanta Image Sensor (QIS) is a class of solid-state image sensors envisioned to be the next generation imaging device after CCD and CMOS. Originally proposed by Eric Fossum in 2005 [1], the sensor has gained significant momentum in the past decade, both in terms of hardware design [2, 3, 4] and image processing [5, 6, 7, 8, 9]. The advantage of QIS over the mainstream CCD and CMOS is attributed to its high spatial resolution (e.g., $10^{9}$ pixels per sensor with $200$ nm pitch per pixel [10]) and high speed (e.g., 100k fps as reported in [11]). However, in order to simplify the circuitry, minimize power consumption, and reduce data transfer, QIS is operated in a binary mode: when the number of photons arriving at the sensor reaches or exceeds a threshold $q$ , the sensor generates a binary bit “1”; when the number of photons is less than $q$ , the sensor generates a “0”. The goal of this paper is to address the question of how to optimally choose $q$ to maximize the signal-to-noise ratio of the reconstructed image.

Optimal threshold design for QIS is important as it directly affects the dynamic range of an image. Figure 1 illustrates an example where we simulate the raw binary data acquired by a QIS using a uniform threshold $q$ . When $q$ is low, most of the bits in the raw input are “1”. The reconstructed image is therefore an over-exposed image. On the other hand, when $q$ is high, most of the bits in the raw input are “0”. The reconstructed image is then under-exposed. In both cases, it is evident from the simulation that a uniform threshold has limited performance. A better way is to allow $q$ to vary spatially so that a pixel (or a group of pixels) has its own threshold value. The result in Figure 1(d) shows the reconstruction result using a spatially varying threshold obtained from our proposed technique, which is clearly better than the uniform thresholds.

I-B Scope and Contributions

The goal of this paper is to present an optimal threshold design methodology and provide theoretical justifications. The two major contributions are summarized as follows.

First, we provide a rigorous theoretical analysis of the performance limit of the image reconstruction as a function of the threshold. These results form the basis of our subsequent discussions of the threshold update scheme. Some results are known, e.g., the signal-to-noise ratio is a function of the Fisher Information [16, 17], but a number of new results are shown. In particular, we show that (i) the maximum likelihood estimate has a closed-form expression in terms of the incomplete Gamma function (Section III.B); (ii) the oracle threshold can be derived in closed-form by maximizing the signal-to-noise ratio (Section III.C); (iii) the image reconstruction has a phase transition behavior (Section IV.A - Section IV.D).

Second, we propose an efficient threshold update scheme based on our theoretical results. The new scheme is a bisection method which iteratively updates the threshold without the need to reconstruct the image. By checking whether the proportion of ones and zeros approaches 0.5 in a spatial-temporal block, the threshold is guaranteed to be near optimal. Compared to other existing threshold update schemes such as [18] and [19, 20, 21], the new scheme offers a significantly faster rate of convergence (Section IV.E). We also demonstrate how the dynamic range can be extended for high dynamic range (HDR) imaging (Section IV.F).

A preliminary version of this paper was presented in ICIP 2016 [8]. This journal version contains significantly more details including complete proofs of major results, more comprehensive comparisons with existing methods, and discussions of HDR imaging.

II Background

II-A Current State of QIS

Quanta Image Sensor (QIS) belongs to the family of photon-counting devices. These photon-counting devices have been known for a long time. Some better-known examples are the electron-multiplying charge-coupled device (EMCCD) [22, 23], single-photon avalanche diode (SPAD) [24, 14, 11], Geiger-mode avalanche photodiode (GMAPD) [13], etc. The common feature of these devices is their single photon sensitivity, which makes them useful in medical imaging [25, 26, 27], astronomy [28], defense [29], nuclear engineering [30], depth and reflectivity reconstruction [31], ultra-fast low-light tracking [32], and recently in quantum random number generation used in cryptography [33, 34].

The concept of QIS was first proposed by Fossum in 2005 as a solution for sub-diffraction limit pixels. The sensor was called the digital film sensor, and later the quanta image sensor [35, 36, 15]. After the introduction of QIS, researchers at EPFL developed a similar concept called the Gigavision camera [37, 38, 6]. Recently, teams at the University of Edinburgh [39, 14, 24] and EPFL [33, 40] have made new progress in QIS using binary single photon detectors. In industry, Rambus Inc. (Sunnyvale, CA) has developed binary image sensors for high dynamic range imaging [19, 20, 21]. Table I lists several recent QIS prototypes that are available or are currently being developed. As a comparison we also show a Canon 5D Mark III CMOS camera. Among many different features, the most noticeable is the frame rate. For example, SPC SPAD can be operated at 20k fps. SwissSPAD can even achieve 156k fps. Both are significantly faster than a standard CMOS camera.

II-B Related Work on Threshold Design

Existing work on QIS threshold design can be summarized into three classes of methods.

• Markov Chain [18]. The Markov Chain method developed by Hu and Lu [18] is a time-sequential update scheme. A Markov Chain probability is used to control how readily the threshold is increased or decreased. While the method has provable convergence, the threshold of each single photon detector of the QIS has to be updated sequentially in time. In contrast, our proposed method allows a group of single photon detectors to share the same threshold. As a result, our proposed method has a significantly faster rate of convergence.

• Conditional Reset [19, 20, 21]. The conditional reset method is a hardware solution proposed by Vogelsang and colleagues. The idea is to take a sequence of images with ascending (or descending) thresholds, and digitally integrate the sequence to form an image. The drawback of the method, besides the additional hardware cost of the per-pixel reset transistors, is the limited quality of the reconstructed image. For the same number of frames, our proposed method produces better images.

• Checkerboard Threshold [16]. This method constructs a checkerboard of thresholds by alternating two threshold values $q_{1}$ and $q_{2}$ . The optimality criterion of $q_{1}$ and $q_{2}$ is based on minimizing the Cramér-Rao lower bound (CRLB) integrated over a range of light intensities, which is essentially an average case result. Our proposed method obtains the optimal threshold for each pixel. This per-pixel optimization has higher reconstruction performance compared to the checkerboard threshold.

II-C QIS Imaging Model

In this subsection we provide an overview of the QIS imaging model. The model has been previously discussed in several papers, e.g., [6, 7, 8, 9]. Readers interested in details can refer to these papers for further explanations.

II-C1 Spatial Oversampling

We denote the discrete version of the light intensity as a vector $\boldsymbol{c}=[c_{0},\ldots,c_{N-1}]^{T}$ , where $n=0,\ldots,N-1$ specify the spatial coordinates. We assume that $c_{n}$ is normalized to the range $[0,1]$ for all $n$ so that there is no scaling ambiguity. To model the actual light intensity, we multiply $c_{n}$ by a constant $\alpha$ to yield $\alpha c_{n}$ , where $\alpha>0$ is a fixed scalar constant.

Given the $N$ -dimensional vector $\boldsymbol{c}$ , QIS uses $M\gg N$ tiny pixels called jots to sample $\boldsymbol{c}$ . The ratio $K\overset{\text{def}}{=}M/N$ is known as the spatial oversampling factor. The oversampling process is illustrated in Figure 2, where it first upsamples the vector $\boldsymbol{c}$ by a factor of $K$ , and then filters the output by a lowpass filter $\{g_{k}\}$ . Mathematically, the process can be expressed as

$$\boldsymbol{\theta}=\alpha\boldsymbol{G}\boldsymbol{c}, \tag{1}$$

where $\boldsymbol{\theta}=[\theta_{0},\ldots,\theta_{M-1}]^{T}$ denotes the light intensity sampled at the $M$ jots, and the matrix $\boldsymbol{G}$ is defined as

$$\boldsymbol{G}=\frac{1}{K}\boldsymbol{I}_{N\times N}\otimes\mathbf{1}_{K\times 1}, \tag{2}$$

where $\mathbf{1}_{K\times 1}$ is a vector of all ones and $\otimes$ denotes the Kronecker product. Note that the choice of $\boldsymbol{G}$ in (2) is the result of simplifying the model by assuming that the lowpass filter is $g_{k}=1/K$ for all $k$ . This assumption is typically reasonable, because on each QIS jot there is a micro-lens to focus the incident light. Although previous papers, e.g., [6, 7], do not make such an assumption, in this paper we use the simplified $\boldsymbol{G}$ because the theoretical analysis would otherwise become very complicated. Nevertheless, in the Supplementary Material we show a comparison between a general $\boldsymbol{G}$ and the simplified $\boldsymbol{G}$ . The gap is usually insignificant.

II-C2 Truncated Poisson Process

We assume that the operating speed of QIS is significantly faster than the scene motion. Therefore, for a given scene $\boldsymbol{c}$ (and also $\boldsymbol{\theta}$ ), we are able to acquire a set of $T$ independent measurements. We illustrate this using the $T$ channels in Figure 2.

The oversampled signal $\boldsymbol{\theta}$ generates a sequence of Poisson random variables according to the distribution

$$\mathbb{P}(Y_{m,t}=y_{m,t})=\frac{\theta_{m}^{y_{m,t}}e^{-\theta_{m}}}{y_{m,t}!},$$

where $m=0,1,\ldots,M-1$ denotes the $m$ -th jot of the QIS and $t=0,1,\ldots,T-1$ denotes the $t$ -th independent measurement in time. Denoting $q\in\mathbb{N}$ as the quantization threshold, the final observed binary measurement $B_{m,t}$ is a truncation of $Y_{m,t}$ :

$$B_{m,t}=\begin{cases}0, & \text{if } Y_{m,t}<q,\\ 1, & \text{if } Y_{m,t}\geq q.\end{cases}$$

The probability mass function of $B_{m,t}$ is given by

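$$\mathbb{P}(B_{m,t}=b_{m,t})=\begin{cases}\sum_{k=0}^{q-1}\frac{\theta_{m}^{k}e^{-\theta_{m}}}{k!}, & \text{if } b_{m,t}=0,\\ \sum_{k=q}^{\infty}\frac{\theta_{m}^{k}e^{-\theta_{m}}}{k!}, & \text{if } b_{m,t}=1.\end{cases} \tag{4}$$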

The goal of image reconstruction is to recover the underlying image $\boldsymbol{c}$ from the binary measurements $\mathcal{B}=\{B_{m,t}\;|\;m=0,\ldots,M-1,\mbox{and}\;t=0,\ldots,T-1\}$ . A pictorial illustration of the reconstruction is shown in Figure 3.
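As a minimal sketch of the model in this subsection, the Python snippet below (standard library only) simulates the binary measurements for a tiny one-dimensional image. The helper names `sample_poisson` and `simulate_qis`, the three-pixel image, and all parameter values are illustrative assumptions; they are not taken from the paper or its released code.

```python
import math
import random


def sample_poisson(theta, rng):
    """Draw one Poisson(theta) sample with Knuth's method (fine for moderate theta)."""
    limit = math.exp(-theta)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_qis(c, alpha, K, T, q, rng):
    """Return bits[t][m] in {0, 1} for an image c with values in [0, 1]."""
    # theta = alpha * G * c with g_k = 1/K: each pixel feeds K jots equally.
    theta = [alpha * c_n / K for c_n in c for _ in range(K)]
    # B_{m,t} = 1 if the photon count Y_{m,t} reaches the threshold q, else 0.
    return [[int(sample_poisson(th, rng) >= q) for th in theta] for _ in range(T)]


rng = random.Random(0)  # fixed seed so the run is repeatable
K, T = 4, 50
bits = simulate_qis(c=[0.1, 0.5, 0.9], alpha=40.0, K=K, T=T, q=5, rng=rng)

# Proportion of ones in each K x T block; brighter pixels should give more ones.
density = [
    sum(frame[m] for frame in bits for m in range(K * n, K * (n + 1))) / (K * T)
    for n in range(3)
]
print(density)
```

With a single uniform threshold, the dark pixel produces almost no ones and the bright pixel almost only ones, which is the over- and under-exposure effect described in the introduction.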

II-C3 Properties of Truncated Poisson Processes

The probability mass function of $B_{m,t}$ in (4) is Bernoulli. However, the right hand side of (4) involves infinite sums which are difficult to interpret. To simplify the equations, we consider the upper incomplete Gamma function ${\Psi_{q}:\mathbb{R}^{+}\rightarrow[0,1]}$ defined in [41] as:

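$$\Psi_{q}(\theta)\overset{\text{def}}{=}\frac{1}{\Gamma(q)}\int_{\theta}^{\infty}t^{q-1}e^{-t}\,dt,\quad\text{for }\theta>0,\;q\in\mathbb{N},$$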

where $\Gamma(q)=(q-1)!$ is the standard Gamma function. The incomplete Gamma function allows us to rewrite the infinite sums in (4) using the following identity [41]:

$$\Psi_{q}(\theta)=\sum_{k=0}^{q-1}\frac{\theta^{k}}{k!}e^{-\theta}.$$

Consequently, the probabilities in (4) become

$$\mathbb{P}(B_{m,t}=0)=\Psi_{q}(\theta_{m}),\qquad\mathbb{P}(B_{m,t}=1)=1-\Psi_{q}(\theta_{m}). \tag{6}$$

Example 1.

In the special case of $q=1$ , we obtain:

$$\mathbb{P}(B_{m,t}=0)=e^{-\theta_{m}},\qquad\mathbb{P}(B_{m,t}=1)=1-e^{-\theta_{m}},$$

which coincides with the results shown in [6] and [7].

The incomplete Gamma function $\Psi_{q}(\theta)$ is a decreasing function of $\theta$ because the first order derivative of $\Psi_{q}(\theta)$ with respect to $\theta$ is negative:

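$$\frac{d}{d\theta}\Psi_{q}(\theta)=\frac{-\theta^{q-1}e^{-\theta}}{\Gamma(q)}<0,\quad\forall q\in\mathbb{N}\text{ and }\theta>0.$$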

The limiting behavior of $\Psi_{q}(\theta)$ is important. For a fixed $q$ , the function $\Psi_{q}(\theta)\rightarrow 1$ as $\theta\rightarrow 0$ and $\Psi_{q}(\theta)\rightarrow 0$ as $\theta\rightarrow\infty$ . While $\Psi_{q}^{-1}$ still exists in these situations because $\Psi_{q}$ is monotonically decreasing, for a given $z$ the value $\Psi_{q}^{-1}(z)$ could be numerically very difficult to evaluate. To characterize the sets of $\theta$ and $q$ for which $\Psi_{q}$ is (numerically) invertible, we define the $\theta$ -admissible set and the $q$ -admissible set.

Definition 1.

The $\theta$ -admissible set and $q$ -admissible set of the incomplete Gamma function are

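$$\Theta_{q}\overset{\text{def}}{=}\{\theta\mid\varepsilon\leq\Psi_{q}(\theta)\leq 1-\varepsilon\},\qquad\mathcal{Q}_{\theta}\overset{\text{def}}{=}\{q\mid\varepsilon\leq\Psi_{q}(\theta)\leq 1-\varepsilon\},$$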

respectively, where $0<\varepsilon<\frac{1}{2}$ is a constant.

More discussions of the incomplete Gamma function can be found in the Supplementary Material.
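The properties above can be checked numerically. The sketch below evaluates $\Psi_{q}(\theta)$ through its finite-sum form using only the Python standard library, confirms the $q=1$ special case and the monotonic decrease in $\theta$ , and lists a $q$ -admissible set. The function names `psi` and `q_admissible`, the cap `q_max`, and the values $\theta=5$ and $\varepsilon=0.05$ are assumptions made for illustration; the admissible set is taken to be $\{q\,|\,\varepsilon\leq\Psi_{q}(\theta)\leq 1-\varepsilon\}$ , the form used in Example 2.

```python
import math


def psi(q, theta):
    """Psi_q(theta) = sum_{k=0}^{q-1} theta^k e^{-theta} / k!, for theta > 0."""
    # Each term is evaluated in log space to avoid overflow for large q or theta.
    return math.fsum(
        math.exp(k * math.log(theta) - theta - math.lgamma(k + 1)) for k in range(q)
    )


def q_admissible(theta, eps, q_max=200):
    """Thresholds q in 1..q_max with eps <= Psi_q(theta) <= 1 - eps."""
    return [q for q in range(1, q_max + 1) if eps <= psi(q, theta) <= 1.0 - eps]


theta = 5.0
print(psi(1, theta), math.exp(-theta))  # q = 1 reduces to exp(-theta)
print([round(psi(3, th), 4) for th in (0.5, 1.0, 2.0, 4.0, 8.0)])  # decreasing in theta
print(q_admissible(theta, eps=0.05))
```

The finite sum is used instead of the integral because it needs no quadrature; a library routine for the regularized upper incomplete Gamma function would serve equally well.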

Remark 1.

In this paper, we assume that QIS is noise-free, i.e., the only source of randomness is the truncated Poisson random variable. In real sensors, there will be readout noise, photo-response non-uniformity caused by conversion gain variation, dark count rate (a.k.a. dark current), optical crosstalk and electronic crosstalk. See [42] for details.

III Optimal Threshold: Theory

III-A Image Reconstruction by MLE

We begin the optimal threshold design by discussing image reconstruction because the optimality of the threshold is measured with respect to the reconstructed image. However, since QIS is a new device, the number of reconstruction methods is limited. A few examples that can be found in the literature are gradient descent [6], dynamic programming [43], ADMM [7], the Transform-Denoise method [9], and neural networks [44]. In this paper, we shall focus on the maximum likelihood estimation (MLE) approach as it provides closed-form expressions.

Given $\mathcal{B}$ , MLE solves the following optimization problem:

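$$\begin{aligned}\widehat{\boldsymbol{c}}&\overset{(a)}{=}\underset{\boldsymbol{c}}{\mathrm{argmax}}\;\prod_{t=0}^{T-1}\prod_{m=0}^{M-1}\mathbb{P}[B_{m,t}=1;\theta_{m}]^{b_{m,t}}\times\mathbb{P}[B_{m,t}=0;\theta_{m}]^{1-b_{m,t}}\\&\overset{(b)}{=}\underset{\boldsymbol{c}}{\mathrm{argmax}}\;\sum_{t=0}^{T-1}\sum_{m=0}^{M-1}\Big\{b_{m,t}\log(1-\Psi_{q}(\theta_{m}))+(1-b_{m,t})\log\Psi_{q}(\theta_{m})\Big\},\end{aligned} \tag{9}$$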

subject to the constraint that $\boldsymbol{\theta}=\alpha\boldsymbol{G}\boldsymbol{c}$ . Here, the right hand side of $(a)$ is the likelihood function of a Bernoulli random variable, and $(b)$ follows from taking the logarithm. With the $\boldsymbol{G}$ defined in (2), we can partition $\mathcal{B}$ into $N$ blocks $\{\mathcal{B}_{0},\ldots,\mathcal{B}_{N-1}\}$ where each block is

$$\mathcal{B}_{n}\overset{\text{def}}{=}\{B_{Kn+k,t}\mid k=0,\dots,K-1,\;t=0,\dots,T-1\}.$$

Then, the pixel $\widehat{c}_{n}$ can be estimated as follows.

Proposition 1 (Closed-form ML Estimate).

The solution of the MLE in (9) is

$$\widehat{c}_{n}=\frac{K}{\alpha}\Psi_{q}^{-1}\left(1-\frac{S_{n}}{KT}\right), \tag{10}$$

where $S_{n}\overset{\text{def}}{=}\sum_{t=0}^{T-1}\sum_{k=0}^{K-1}B_{Kn+k,t}$ is the sum of bits in the $n$ -th block $\mathcal{B}_{n}$ .

Proof.

See [9]. ∎
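Proposition 1 expresses the estimate through $\Psi_{q}^{-1}$ , which has no elementary closed form for general $q$ . One way to evaluate it, sketched below with the Python standard library, is to invert $\Psi_{q}$ by bisection, which works because $\Psi_{q}$ is monotonically decreasing in $\theta$ . The names `psi`, `psi_inverse`, and `mle_pixel`, the search interval, and the iteration count are hypothetical choices for this illustration, not the authors' implementation.

```python
import math


def psi(q, theta):
    """Psi_q(theta) = sum_{k=0}^{q-1} theta^k e^{-theta} / k!, for theta > 0."""
    return math.fsum(
        math.exp(k * math.log(theta) - theta - math.lgamma(k + 1)) for k in range(q)
    )


def psi_inverse(q, z, lo=1e-12, hi=1e4, iters=100):
    """Solve psi(q, theta) = z for theta by bisection (psi decreases in theta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(q, mid) > z:
            lo = mid  # psi still too large, so theta must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mle_pixel(S_n, K, T, q, alpha):
    """c_hat_n = (K / alpha) * Psi_q^{-1}(1 - S_n / (K T)) for a non-degenerate block."""
    if not 0 < S_n < K * T:
        # All-zero or all-one blocks are the degenerate cases excluded in Section IV-B.
        raise ValueError("S_n must satisfy 0 < S_n < K*T")
    return (K / alpha) * psi_inverse(q, 1.0 - S_n / (K * T))


theta_true = 4.2
print(psi_inverse(5, psi(5, theta_true)))             # recovers theta_true
print(mle_pixel(S_n=100, K=4, T=50, q=1, alpha=4.0))  # ln 2, since Psi_1(theta) = exp(-theta)
```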

III-B Signal-to-Noise Ratio of ML Estimate

In order to determine the optimal threshold, we need to quantify the performance of the ML estimate. The performance metric we use is the signal-to-noise ratio of the ML estimate at every pixel $\widehat{c}_{n}$ . Considering each $\widehat{c}_{n}$ individually is allowed here because they are independently determined according to (10). For notational simplicity we drop the subscript $n$ in the subsequent discussions.

Definition 2.

The signal-to-noise ratio (SNR) of the ML estimate $\widehat{c}$ is defined as

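$$\mathrm{SNR}_{q}(c)\overset{\text{def}}{=}10\log_{10}\frac{c^{2}}{\mathbb{E}[(\widehat{c}-c)^{2}]}, \tag{11}$$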

where the expectation is taken over the probability mass function of the binary measurements in (6).

The difficulty of working with $\mathrm{SNR}_{q}(c)$ is that it does not have a simple closed-form expression. In view of this, Lu [17] showed that the SNR is asymptotically linear in the logarithm of the Fisher Information.

Proposition 2.

As $KT\rightarrow\infty$ ,

$$\mathrm{SNR}_{q}(c)\approx 10\log_{10}\big(c^{2}I_{q}(c)\big)+10\log_{10}KT, \tag{12}$$

where $I_{q}(c)\overset{\text{def}}{=}\mathbb{E}_{B}\left[\frac{-\partial^{2}}{\partial c^{2}}\mathrm{log}\;\mathbb{P}(B=b;\theta)\right]$ is the Fisher Information measuring the amount of information that the random variable $B$ carries about the unknown value $c$ .

Proof.

See [17]. ∎

While the asymptotic result shown in Proposition 2 has significantly simplified the SNR, we still need to determine the Fisher Information. The following proposition gives a new result of the Fisher Information with arbitrary $q$ .

Proposition 3.

The Fisher Information $I_{q}(c)$ of the probability mass function in (6) under a threshold $q$ is:

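$$I_{q}(c)=\left(\frac{\alpha}{K}\right)^{2}\frac{e^{-2\left(\frac{\alpha c}{K}\right)}\left(\frac{\alpha c}{K}\right)^{2q-2}}{\Gamma^{2}(q)\,\Psi_{q}\left(\frac{\alpha c}{K}\right)\left(1-\Psi_{q}\left(\frac{\alpha c}{K}\right)\right)}. \tag{13}$$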

Proof.

See Appendix A-A. ∎

Substituting (13) into (12), we observe that the SNR can be approximated as

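$$\mathrm{SNR}_{q}(c)\approx 10\log_{10}\left[\frac{\left(\frac{\alpha c}{K}\right)^{2q}e^{-2\left(\frac{\alpha c}{K}\right)}}{\Gamma^{2}(q)\,\Psi_{q}\left(\frac{\alpha c}{K}\right)\left(1-\Psi_{q}\left(\frac{\alpha c}{K}\right)\right)}\right]+10\log_{10}KT, \tag{14}$$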

which is characterized by the unknown pixel value $c$ , the threshold $q$ , the spatial oversampling ratio $K$ and the number of temporal measurements $T$ . To understand the behavior of (14), we show in Figure 4 $\mathrm{SNR}_{q}(c)$ as a function of $c$ for different thresholds $q\in\{1,\ldots,16\}$ . For a fixed $q$ , $\mathrm{SNR}_{q}(c)$ is a unimodal function of $c$ with a unique maximum. The goal of optimal threshold design is to determine a $q$ which maximizes $\mathrm{SNR}_{q}(c)$ for a fixed $c$ .
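To get a feel for the approximation, the following sketch evaluates the Fisher Information of Proposition 3 and the resulting SNR of Proposition 2 over a range of thresholds for one fixed pixel value. It uses only the Python standard library. The function names and the scanned range $q\in\{20,\ldots,60\}$ are assumptions for this illustration, and the pixel and sensor values are borrowed from Example 2 in Section IV-B.

```python
import math


def psi(q, theta):
    """Psi_q(theta) = sum_{k=0}^{q-1} theta^k e^{-theta} / k!, for theta > 0."""
    return math.fsum(
        math.exp(k * math.log(theta) - theta - math.lgamma(k + 1)) for k in range(q)
    )


def fisher_information(c, q, alpha, K):
    """I_q(c) of Proposition 3, with the numerator evaluated in log space."""
    theta = alpha * c / K
    p_zero = psi(q, theta)
    log_num = -2.0 * theta + (2 * q - 2) * math.log(theta) - 2.0 * math.lgamma(q)
    return (alpha / K) ** 2 * math.exp(log_num) / (p_zero * (1.0 - p_zero))


def snr_db(c, q, alpha, K, T):
    """Asymptotic SNR of Proposition 2, in dB."""
    return 10.0 * math.log10(c * c * fisher_information(c, q, alpha, K)) + 10.0 * math.log10(K * T)


c, alpha, K, T = 0.5, 300.0, 4, 50  # values borrowed from Example 2
curve = {q: snr_db(c, q, alpha, K, T) for q in range(20, 61)}
best_q = max(curve, key=curve.get)
print(best_q, round(curve[best_q], 2))
```

Thresholds far from $\alpha c/K$ are left out of the scan on purpose: there $\Psi_{q}$ rounds to 0 or 1 in floating point and the ratio in $I_{q}(c)$ is no longer computable this way.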

Remark 2.

The $\mathrm{SNR}_{q}(c)$ in (14) can also be derived from a concept in the device literature called the exposure-referred SNR [45]. See Supplementary Material for discussions.

III-C Oracle Threshold

We now discuss the optimal threshold design in the oracle setting. We call the result oracle because the optimal threshold depends on the unknown pixel intensity $c$ . The practical threshold design scheme will be discussed in Section IV.

Using the definition of the signal-to-noise ratio, the optimal threshold is determined by maximizing $\mathrm{SNR}_{q}(c)$ with respect to $q$ :

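$$q^{*}=\underset{q\in\mathbb{N}}{\mathrm{argmax}}\;\mathrm{SNR}_{q}(c)=\underset{q\in\mathbb{N}}{\mathrm{argmax}}\;\log\left(c^{2}I_{q}(c)\right). \tag{15}$$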

The second equality follows from Proposition 2. Substituting (13) yields an expression of the right hand side of (15). To further simplify the expression we derive the following lower bound.

Proposition 4.

The function $\log(c^{2}I_{q}(c))$ is lower bounded as follows.

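$$\log\left(c^{2}I_{q}(c)\right)\geq L_{q}(c)\overset{\text{def}}{=}2\left(\log 2-\frac{\alpha c}{K}+q\log\frac{\alpha c}{K}-\log\Gamma(q)\right).$$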

Proof.

See Appendix A-B. ∎

Using this lower bound, we can derive the optimal threshold $q$ as follows. (Footnote 1: Strictly speaking, the result shown in Proposition 5 is a “near-optimal” result because we are maximizing the lower bound rather than the SNR itself. From our experience, the gap between the near-optimality and the exact optimality is typically insignificant.)

Proposition 5.

The optimal threshold $q^{*}(c)$ is

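$$q^{*}(c)=\underset{q\in\mathbb{N}}{\mathrm{argmax}}\;L_{q}(c)=\left\lfloor\frac{\alpha c}{K}\right\rfloor+1,$$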

where $\lfloor\cdot\rfloor$ denotes the flooring operator that returns the largest integer smaller than or equal to the argument.

Proof.

See Appendix A-C. ∎

The result of Proposition 5 is important as it states that the oracle threshold is exactly the same as the light intensity $\alpha c/K$ . The flooring operation and the addition of a constant 1 are not crucial here because they are only used to ensure that $q$ is an integer. In [18], a special case where $\alpha=1$ was demonstrated experimentally. Proposition 5 now provides a theoretical justification.
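The closed form in Proposition 5 can be cross-checked against a brute-force search over the lower bound $L_{q}(c)$ of Proposition 4. The sketch below does this for three pixel values; the function names, the search range $q\in\{1,\ldots,200\}$ , and the parameter values are assumptions chosen for illustration (with $\alpha c/K$ deliberately non-integer, so the maximizer is unique).

```python
import math


def lower_bound(c, q, alpha, K):
    """L_q(c) = 2 (log 2 - theta + q log theta - log Gamma(q)), theta = alpha c / K."""
    theta = alpha * c / K
    return 2.0 * (math.log(2.0) - theta + q * math.log(theta) - math.lgamma(q))


def oracle_threshold(c, alpha, K):
    """Closed-form oracle threshold: floor(alpha c / K) + 1."""
    return math.floor(alpha * c / K) + 1


alpha, K = 300.0, 4
results = []
for c in (0.1, 0.5, 0.9):
    brute = max(range(1, 201), key=lambda q: lower_bound(c, q, alpha, K))
    results.append((c, oracle_threshold(c, alpha, K), brute))
    print(results[-1])
```

The agreement follows from $L_{q+1}(c)-L_{q}(c)=2\log\big((\alpha c/K)/q\big)$ , which is positive exactly while $q<\alpha c/K$ .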

IV Optimal Threshold: Practice

The oracle threshold derived in the previous section provides a theoretical foundation but is practically infeasible as it requires knowledge of the ground truth $c$ . In this section, we present an alternative solution by relaxing the optimality criteria. Our strategy is to consider a set of thresholds which are close to the oracle threshold $q^{*}(c)$ , and show that they are asymptotically unbiased when the number of observed bits approaches infinity (Section IV.A). This result will allow us to characterize the estimate $\widehat{c}$ (Section IV.B). We will then show that there exists a phase transition region where the asymptotic unbiasedness is maintained as $q$ stays within a certain range around $q^{*}(c)$ , and is lost rapidly as $q$ falls outside this range (Section IV.C - IV.D). Based on these observations, we will present a practical threshold update scheme (Section IV.E).

IV-A Asymptotic Unbiasedness

In order to derive an alternative threshold that does not require the ground truth, we start by reconsidering the ML estimate $\widehat{c}$ in Proposition 1. For a spatial-temporal block $\mathcal{B}=\{B_{k,t}\,|\,0\leq k\leq K-1,\;0\leq t\leq T-1\}$ , the ML estimate $\widehat{c}$ satisfies the condition

$$\Psi_{q}\left(\frac{\alpha\widehat{c}}{K}\right)=1-\frac{S}{KT}, \tag{17}$$

where $S=\sum_{k,t}B_{k,t}$ is the sum of bits in $\mathcal{B}$ . The right hand side of this equation is an important quantity. We denote it as

$$\gamma_{q}(c)\overset{\text{def}}{=}1-\frac{S}{KT}. \tag{18}$$

In the device literature (e.g., [45]), the term $1-\gamma_{q}(c)$ is known as the bit-density as it is the proportion of ones in $\mathcal{B}$ . Note that $\gamma_{q}(c)$ is a random variable because $S$ is the sum of $KT$ i.i.d. random binary bits. Therefore, if we want to understand (17), we must first derive the mean and variance of $\gamma_{q}(c)$ .

Proposition 6.

The mean and variance of $\gamma_{q}(c)$ are

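$$\mathbb{E}[\gamma_{q}(c)]=\Psi_{q}\left(\frac{\alpha c}{K}\right),\qquad\mathrm{Var}[\gamma_{q}(c)]=\frac{1}{KT}\Psi_{q}\left(\frac{\alpha c}{K}\right)\left(1-\Psi_{q}\left(\frac{\alpha c}{K}\right)\right), \tag{19}$$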

respectively.

Proof.

See Appendix A-D. ∎

We can now look at the asymptotic behavior of $\gamma_{q}(c)$ to see if it offers any insight about the optimal threshold. Applying the strong law of large numbers to $S/KT$ , we can show that as $KT\rightarrow\infty$ ,

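$$\gamma_{q}(c)=1-\frac{S}{KT}\;\overset{a.s.}{\longrightarrow}\;1-\mathbb{E}[B_{k,t}]=\Psi_{q}\left(\frac{\alpha c}{K}\right). \tag{20}$$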

Going back to (17)-(18), the ML estimate $\widehat{c}$ should have the expectation:

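$$\mathbb{E}[\,\widehat{c}\,]\overset{(a)}{=}\mathbb{E}\left[\frac{K}{\alpha}\Psi_{q}^{-1}\left(\gamma_{q}(c)\right)\right]\overset{(b)}{\to}\frac{K}{\alpha}\Psi_{q}^{-1}\left(\Psi_{q}\left(\frac{\alpha c}{K}\right)\right)\overset{(c)}{=}c, \tag{21}$$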

where (a) follows from the definition of $\widehat{c}$ , (b) follows from (20), and (c) holds because $\Psi_{q}$ and $\Psi_{q}^{-1}$ cancel each other.

What is the implication of (21)? It shows that the ML estimate $\widehat{c}$ is asymptotically unbiased. That is, as the number of independent measurements grows, the estimate $\widehat{c}$ approaches the ground truth $c$ . In other words, as long as $KT$ is large enough, the random variable $\widehat{c}$ would be an accurate estimate of the ground truth. How can this be used to determine the threshold $q$ ? Let us look at $\mathcal{Q}_{\theta}$ .
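The behavior of $\gamma_{q}(c)$ described in this subsection can be observed in a small Monte-Carlo experiment. The sketch below draws the bits of each block directly as Bernoulli variables with $\mathbb{P}(B=0)=\Psi_{q}(\alpha c/K)$ and compares the sample mean and variance of $\gamma_{q}(c)$ with the values expected when $S$ is a Binomial count of $KT$ such bits. The block count, the seed, and the variable names are assumptions for illustration; the sensor values are those of Example 2 in Section IV-B, with $q=38$ being the oracle threshold for that setting.

```python
import math
import random


def psi(q, theta):
    """Psi_q(theta) = sum_{k=0}^{q-1} theta^k e^{-theta} / k!, for theta > 0."""
    return math.fsum(
        math.exp(k * math.log(theta) - theta - math.lgamma(k + 1)) for k in range(q)
    )


c, alpha, K, T, q = 0.5, 300.0, 4, 50, 38
p_zero = psi(q, alpha * c / K)  # P(B = 0) for every bit in the block

rng = random.Random(1)  # fixed seed so the run is repeatable
n_blocks = 2000
samples = []
for _ in range(n_blocks):
    # Each bit equals 1 with probability 1 - p_zero.
    ones = sum(rng.random() >= p_zero for _ in range(K * T))
    samples.append(1.0 - ones / (K * T))

mean_hat = sum(samples) / n_blocks
var_hat = sum((g - mean_hat) ** 2 for g in samples) / (n_blocks - 1)
print(mean_hat, p_zero)                             # sample mean vs Psi_q
print(var_hat, p_zero * (1.0 - p_zero) / (K * T))   # sample variance vs Psi_q (1 - Psi_q) / (K T)
```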

IV-B Set of Admissible Thresholds $\mathcal{Q}_{\theta}$

The result in (17)-(21) shows that for a given $S$ (or equivalently $\gamma_{q}(c)$ ), the ML estimate can be found by

$$\widehat{c}=\frac{K}{\alpha}\Psi_{q}^{-1}\left(\gamma_{q}(c)\right). \tag{22}$$

When this happens, the $\widehat{c}$ given by (22) is asymptotically unbiased. However, the inversion $\Psi_{q}^{-1}$ is not always allowed. There is a set of $q$ ’s that can make $\Psi_{q}$ invertible, which is defined as $\mathcal{Q}_{\theta}$ in Definition 1. The following proposition relates $\mathcal{Q}_{\theta}$ to $\gamma_{q}(c)$ .

Proposition 7.

Let $0<\delta<1$ be a constant. Then, for any

[TABLE]

the random variable $\gamma_{q}(c)$ will not attain 0 or 1 with probability at least $1-\delta$ , i.e.,

$$\mathbb{P}\left[0<\gamma_{q}(c)<1\right]\geq 1-\delta.$$

In this case, the ML estimate $\widehat{c}$ is uniquely defined by (22).

Proof.

See Appendix A-E. ∎

Before we proceed, let us look at some rough magnitude of the parameters in the following example.

Example 2.

Let the ground truth pixel value be $c=0.5$ . The sensor parameters are set as $T=50$ , $K=4$ , $\alpha=300$ . For a constant $\delta=2\times 10^{-4}$ , the tolerance level is $\varepsilon=1-(\delta/2)^{1/KT}=0.045$ . Therefore, as long as $q\in\{q\,|\,0.045\leq\Psi_{q}(\theta)\leq 1-0.045\}$ , which is the set $\{q\;|\;28\leq q\leq 48\}$ , the probability that $\gamma_{q}(c)$ equals 0 or 1 is upper bounded by $\delta=2\times 10^{-4}$ .
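The numbers in Example 2 can be reproduced with a few lines of Python (standard library only). This is a sketch under the assumption that $\Psi_{q}$ is evaluated through its finite-sum form and that the search is limited to $q\leq 100$ ; the variable names are hypothetical.

```python
import math


def psi(q, theta):
    """Psi_q(theta) = sum_{k=0}^{q-1} theta^k e^{-theta} / k!, for theta > 0."""
    return math.fsum(
        math.exp(k * math.log(theta) - theta - math.lgamma(k + 1)) for k in range(q)
    )


c, alpha, K, T = 0.5, 300.0, 4, 50
delta = 2e-4

eps = 1.0 - (delta / 2.0) ** (1.0 / (K * T))  # tolerance level
theta = alpha * c / K                          # light intensity per jot, 37.5 here
admissible = [q for q in range(1, 101) if eps <= psi(q, theta) <= 1.0 - eps]

print(round(eps, 3))
print(admissible[0], admissible[-1])  # compare with the range reported in Example 2
```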

IV-C Gap between $\mathcal{Q}_{\theta}$ and $q^{*}$

The result in the previous subsection shows that as long as $q\in\mathcal{Q}_{\theta}$ , the ML estimate is asymptotically unbiased. However, how does a $q\in\mathcal{Q}_{\theta}$ compare to the oracle threshold $q^{*}$ ? We answer this question in three parts.

First, does an asymptotically unbiased estimate maximize the SNR? The answer is no, because Proposition 5 states that if $q^{*}$ is the optimal threshold, then $\mathrm{SNR}_{q^{*}}(c)\geq\mathrm{SNR}_{q}(c)$ for any $q\not=q^{*}$ . Therefore, moving from the exact optimal $q^{*}$ to an asymptotically unbiased threshold is a relaxation of the optimality criteria.

If asymptotic unbiasedness is a relaxed optimality criterion, how much SNR drop will there be if we choose a $q\in\mathcal{Q}_{\theta}$ but not necessarily $q=q^{*}$ ? We show in Figure 5 the plot of a typical experiment with the setup discussed in Example 2. As shown in the figure, the green zone is the set $\mathcal{Q}_{\theta}=\{q\;|\;28\leq q\leq 48\}$ , or equivalently $\mathcal{Q}_{\theta}=\{q\,|\,0.045\leq\Psi_{q}(\theta)\leq 0.955\}$ . For any $q$ in this $\mathcal{Q}_{\theta}$ , the reconstruction has an SNR of at least 30 dB. If we further tighten $\mathcal{Q}_{\theta}$ so that $\mathcal{Q}_{\theta}=\{q\;|\;35\leq q\leq 42\}$ , or equivalently $\mathcal{Q}_{\theta}=\{q\,|\,0.25\leq\Psi_{q}(\theta)\leq 0.6\}$ , the SNR stays in the range $36.15\mathrm{dB}\leq\mathrm{SNR}_{q}(c)\leq 36.65\mathrm{dB}$ , which is reasonably narrow.

Third, how tight should $\mathcal{Q}_{\theta}$ be? Ideally we want $\mathcal{Q}_{\theta}$ to be as tight as possible. However, because the incomplete Gamma function has a rapid transition (see the black line in Figure 5), $\mathcal{Q}_{\theta}$ can be much wider. In fact, we can choose $\mathcal{Q}_{\theta}$ such that $1-\gamma_{q}(c)$ stays close to 0.5, so that we are guaranteed to obtain a near optimal threshold. From an information theoretic point of view, $1-\gamma_{q}(c)\approx 0.5$ is where the bit density carries the maximum information: if $q$ is too high then most bits become 0, whereas if $q$ is too low then most bits become 1. The information is maximized when $q$ leads to 50% zeros and 50% ones. (Footnote 2: The exact optimal value of $1-\gamma_{q}(c)$ at $q^{*}$ is slightly lower than 0.5 due to the nonlinearity of the Gamma function. See Supplementary Material for additional discussion.)
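The link between a bit density of 0.5 and the oracle threshold can be illustrated numerically. The sketch below again assumes that $\Psi_{q}(\theta)$ is the Poisson probability of a 0 bit, uses the parameters of Example 2, and compares the threshold whose expected bit density is closest to 0.5 with the oracle threshold $\lfloor\theta\rfloor+1$ from Proposition 5. The search range of 1 to 100 is an arbitrary choice for illustration.

```python
import math

def psi(q, theta):
    """Assumed Psi_q(theta): P[Poisson(theta) <= q - 1], the probability of a 0 bit."""
    if theta <= 0.0:
        return 1.0
    log_theta = math.log(theta)
    return min(1.0, math.fsum(
        math.exp(k * log_theta - theta - math.lgamma(k + 1)) for k in range(q)))

def expected_bit_density(q, theta):
    """Expected fraction of ones, 1 - Psi_q(theta)."""
    return 1.0 - psi(q, theta)

theta = 300.0 * 0.5 / 4                      # alpha * c / K from Example 2
q_half = min(range(1, 101),
             key=lambda q: abs(expected_bit_density(q, theta) - 0.5))
q_oracle = math.floor(theta) + 1             # oracle threshold from Proposition 5
density_at_q_half = expected_bit_density(q_half, theta)
print(q_half, q_oracle, round(density_at_q_half, 3))
```

For this setting the two thresholds coincide, and the density at that threshold sits slightly below 0.5, in line with the footnote.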

IV-D Phase Transition Phenomenon

We can now point out a very interesting phenomenon in Figure 5. In the upper plot of Figure 5 we show two sets of curves: blue curves (solid and dotted) and black curves (solid and dotted). The blue curves represent the ratio $\mathbb{E}[\,\widehat{c}\,]/c$, and the black curves represent the average bit density $1-\mathbb{E}[\gamma_{q}(c)]$. For both sets of curves, we use dotted lines to illustrate the Monte-Carlo simulation using 10,000 random samples, where each sample refers to a spatial-temporal block $\mathcal{B}_{n}$ containing $KT=200$ binary bits. Notice that these dotted lines overlap with their expectations, and hence (17)-(21) are valid.

Let us take a closer look at the blue curve $\mathbb{E}[\,\widehat{c}\,]/c$ . Let $\mathcal{Q}_{\theta}=\{q\;|\;q_{L}\leq q\leq q_{H}\}$ , where $q_{L}$ and $q_{H}$ are the smallest and the largest integers in $\mathcal{Q}_{\theta}$ respectively. There are three distinct phases:

$\bullet$

When $q<q_{L}$, the threshold is low and so most bits become 1. Therefore, $\gamma_{q}(c)\rightarrow 0$ and hence $\widehat{c}\rightarrow\infty$. Thus, $\mathbb{E}[\,\widehat{c}\,]/c\rightarrow\infty$ as $q$ decreases.

$\bullet$

When $q>q_{H}$, the threshold is high and so most bits become 0. Therefore, $\gamma_{q}(c)\rightarrow 1$ and hence $\widehat{c}\rightarrow 0$. Thus, $\mathbb{E}[\,\widehat{c}\,]/c\rightarrow 0$ as $q$ increases.

$\bullet$

When $q_{L}\leq q\leq q_{H}$ , the ML estimate $\widehat{c}$ is asymptotically unbiased. Therefore, $\mathbb{E}[\,\widehat{c}\,]/c=1$ .

Essentially, Figure 5 demonstrates a phase transition behavior of the threshold. Such a phase transition exists because $\Psi_{q}$ is only invertible in practice when $q\in\mathcal{Q}_{\theta}$: outside this set the measured bits are almost surely all ones or all zeros.
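The three phases can be reproduced without simulation. The sketch below replaces the Monte-Carlo experiment with an exact binomial expectation over the number of ones $S$ in a block. It assumes the ML estimate has the form $\widehat{c}=(K/\alpha)\,\Psi_{q}^{-1}(\gamma_{q}(c))$ with $\gamma_{q}(c)=1-S/(KT)$, assumes the Poisson form of $\Psi_{q}$, and clips the diverging all-ones estimate at a hypothetical maximum pixel value `c_max = 1`. None of these helper names come from the paper.

```python
import math

def psi(q, theta):
    """Assumed Psi_q(theta): P[Poisson(theta) <= q - 1], the probability of a 0 bit."""
    if theta <= 0.0:
        return 1.0
    log_theta = math.log(theta)
    return min(1.0, math.fsum(
        math.exp(k * log_theta - theta - math.lgamma(k + 1)) for k in range(q)))

def psi_inverse(q, gamma, iters=60):
    """Solve Psi_q(theta) = gamma for theta by bisection (Psi_q decreases in theta)."""
    lo, hi = 0.0, 1.0
    while psi(q, hi) > gamma:          # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(q, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_ratio(q, c, alpha, K, T, c_max=1.0):
    """Exact E[min(c_hat, c_max)] / c with S ~ Binomial(K*T, 1 - Psi_q(theta))."""
    n = K * T
    p1 = 1.0 - psi(q, alpha * c / K)   # probability that a bit is 1
    total = 0.0
    for s in range(n + 1):
        w = math.comb(n, s) * p1 ** s * (1.0 - p1) ** (n - s)
        if w < 1e-15:
            continue                   # skip outcomes with negligible probability
        if s == 0:
            c_hat = 0.0                # all zeros: gamma = 1, the estimate is 0
        elif s == n:
            c_hat = c_max              # all ones: gamma = 0, the estimate diverges, so clip
        else:
            c_hat = min(c_max, (K / alpha) * psi_inverse(q, 1.0 - s / n))
        total += w * c_hat
    return total / c

# Example 2 setting, evaluated below, inside, and above the admissible set.
ratios = {q: expected_ratio(q, 0.5, 300.0, 4, 50) for q in (5, 38, 80)}
for q, r in ratios.items():
    print(f"q = {q:2d}: E[c_hat]/c = {r:.3f}")
```

With clipping, the low-threshold phase shows up as a ratio pinned at `c_max / c` instead of infinity; the middle and high phases behave as described in the list above.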

IV-E Bisection Threshold Update Scheme

Now we present a practical threshold update scheme. As we discussed in Section IV-C, the oracle threshold $q^{*}$ can be obtained when the bit density $1-\gamma_{q}(c)$ is close to 0.5. Therefore, a practical procedure to determine $q$ is to sweep through a range of $q$ until the bit density reaches 0.5. To achieve this objective, we propose a bisection method illustrated in Figure 6 and Algorithm 1. Starting with initial thresholds $q_{A}$ and $q_{B}$, we check whether the bit density satisfies $1-\gamma_{q_{A}}>0.5$ and $1-\gamma_{q_{B}}<0.5$. If this is the case, then we find a midpoint $q_{M}=(q_{A}+q_{B})/2$ and check whether $1-\gamma_{q_{M}}$ is greater or less than 0.5. If $1-\gamma_{q_{M}}>0.5$, we replace $q_{A}$ by $q_{M}$; otherwise we replace $q_{B}$ by $q_{M}$. The process repeats until $1-\gamma_{q_{M}}$ is sufficiently close to 0.5.
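The update rule described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the callback `bit_density(q)` stands in for acquiring one frame at threshold `q` and measuring the fraction of ones, the stopping tolerance `tol` and the initial bracket `[1, 64]` are hypothetical, and the usage line feeds in the noise-free expected density (assuming the Poisson form of $\Psi_{q}$) rather than measured bits.

```python
import math

def psi(q, theta):
    """Assumed Psi_q(theta): P[Poisson(theta) <= q - 1], the probability of a 0 bit."""
    if theta <= 0.0:
        return 1.0
    log_theta = math.log(theta)
    return min(1.0, math.fsum(
        math.exp(k * log_theta - theta - math.lgamma(k + 1)) for k in range(q)))

def bisection_threshold(bit_density, q_a, q_b, tol=0.02):
    """Search integer thresholds in [q_a, q_b] for a bit density near 0.5.

    bit_density(q) must decrease as q grows.
    Returns (threshold, number of density evaluations).
    """
    d_a, d_b = bit_density(q_a), bit_density(q_b)
    frames = 2
    if not (d_a > 0.5 > d_b):
        raise ValueError("initial thresholds do not bracket a bit density of 0.5")
    while q_b - q_a > 1:
        q_m = (q_a + q_b + 1) // 2         # integer midpoint, rounded up
        d_m = bit_density(q_m)
        frames += 1
        if abs(d_m - 0.5) <= tol:
            return q_m, frames             # close enough to 0.5: stop early
        if d_m > 0.5:
            q_a, d_a = q_m, d_m            # too many ones: raise the lower end
        else:
            q_b, d_b = q_m, d_m            # too many zeros: lower the upper end
    # Interval collapsed: keep the endpoint whose density is closer to 0.5.
    return (q_a if abs(d_a - 0.5) <= abs(d_b - 0.5) else q_b), frames

theta = 300.0 * 0.5 / 4                    # alpha * c / K from Example 2
q_hat, frames = bisection_threshold(lambda q: 1.0 - psi(q, theta), q_a=1, q_b=64)
print(q_hat, frames)
```

In a real sensor loop the lambda would be replaced by a measured density, in which case the comparison against 0.5 is noisy and the returned threshold may differ from the noise-free result by a level or two. Since the interval roughly halves at every step, the number of midpoint evaluations grows only logarithmically with the width of the initial bracket.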

In our proposed threshold update scheme, we assume that the image has been partitioned into $N$ blocks $\{\mathcal{B}_{n}\;|\;n=0,\ldots,N-1\}$ . Each $\mathcal{B}_{n}$ contains $KT$ binary bits and is used to estimate one pixel value $c_{n}$ . This setting results in $N$ different thresholds, one for every pixel. To generalize the setting, it is also possible to allow multiple pixels to share a common threshold. Figure 7 shows an example. The advantage of sharing a threshold for multiple pixels is that circuits associated with the sensor can be simplified. In terms of performance, since neighboring pixels are typically correlated, sharing the threshold causes little drop in the resulting SNR.

The price that the proposed bisection algorithm has to pay is the number of frames it requires to determine a good $q$ . For every evaluation of $\gamma_{q_{M}}$ , the sensor has to physically acquire one frame and compute the bit density in each of the $N$ blocks. Therefore, the more bisection steps we need, the more frames that the sensor has to physically acquire. The rate of convergence of the proposed method and existing methods will be compared in Section V.

IV-F Extension to High Dynamic Range

While QIS is a photon counting device, it is designed to count a few photons to keep the full-well capacity small, e.g. 20 photoelectrons as reported in [46]. Therefore, for practical imaging tasks, we need to extend the dynamic range for QIS.

There are two ways to enable dynamic range extension:

•

Bright Scenes: Reduce Duty Cycle. In the signal processing block diagram shown in Figure 2, we can replace the constant $\alpha$ by the fraction $\alpha\tau$, where $0\leq\tau\leq 1$ determines the ratio between the actual integration time and the readout scan time. It can also be referred to as the shutter duty cycle because the shutter is opened to collect photons during this proportion of time [47]. For very bright scenes, a low duty cycle will prevent QIS from saturating early.

•

Dark Scenes: Multiple Measurements. For dark scenes, multiple measurements can be taken to ensure enough photons over the measurement period. This, however, is different from conventional HDR imaging. In conventional HDR imaging, the multiple shots are taken at different shutter speeds, e.g., 1/8192, 1/2048, 1/512, 1/128, 1/32, 1/8, 1/2 seconds [48], which is redundant. QIS’s multiple-shot acquisition functions more like burst photography [49]. The acquisition time is significantly shorter than in conventional HDR imaging.

These two methods can be used with any threshold scheme, including ours and others. The benefit of using our proposed threshold scheme is that it supports a much wider dynamic range extension. In Figure 8, we illustrate the total dynamic range that can be covered using 4 measurements at duty cycles $\tau=1$, $\tau=0.2$, $\tau=0.04$, and $\tau=0.008$. The maximum threshold level is $q_{\max}=25$, and the minimum threshold level is $q_{\min}=1$. It can be seen from the figure that with the optimal threshold $q^{*}$, the dynamic range is significantly wider than with the non-optimal ones. In particular, we observe a 16dB and a 54dB improvement compared to $q_{\min}=1$ and $q_{\max}=25$, respectively. Experimental results will be shown in Section V-C.
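To see how the duty cycle interacts with the threshold range, the sketch below scales $\alpha$ by each $\tau$ and evaluates the oracle threshold $\lfloor\alpha\tau c/K\rfloor+1$ from Proposition 5, then flags whether it falls inside $[q_{\min},q_{\max}]$. Mixing the Example 2 values ($\alpha=300$, $K=4$, $c=0.5$) with the duty cycles and threshold range quoted for Figure 8 is an assumption made purely for illustration; it does not reproduce the dynamic range numbers in the figure.

```python
import math

def oracle_threshold(c, alpha, K, tau=1.0):
    """floor(theta) + 1 with theta = alpha * tau * c / K (alpha scaled by the duty cycle)."""
    return math.floor(alpha * tau * c / K) + 1

def thresholds_per_duty_cycle(c, alpha, K, taus, q_min=1, q_max=25):
    """For each duty cycle: (tau, oracle threshold, whether the sensor can realize it)."""
    rows = []
    for tau in taus:
        q_star = oracle_threshold(c, alpha, K, tau)
        rows.append((tau, q_star, q_min <= q_star <= q_max))
    return rows

rows = thresholds_per_duty_cycle(0.5, 300.0, 4, taus=(1.0, 0.2, 0.04, 0.008))
for tau, q_star, realizable in rows:
    print(f"tau = {tau:<6} q* = {q_star:<3} realizable = {realizable}")
```

In this toy setting the full duty cycle asks for a threshold above $q_{\max}$, while the shorter duty cycles bring the oracle threshold back into the realizable range, which is the mechanism the bright-scene extension relies on.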

IV-G Hardware Consideration

Concerning the hardware implementation, we anticipate that future QIS will be equipped with per-pixel FPGAs to perform the proposed threshold update scheme. On-sensor FPGA is an actively developing technology. For example, MIT Lincoln Lab’s digital focal plane array can achieve on-sensor image stabilization and edge detection [50]. For QIS threshold update, the complexity is low because we are only counting the number of ones in the bisection. More specifically, in order to perform the bisection, we only need $K$ additions to compute $\sum_{k=0}^{K-1}b_{Kn+k,t}$; one comparison $\sum_{k=0}^{K-1}b_{Kn+k,t}\geq K/2$ (i.e., a bit density of at least 0.5); and one addition and one multiplication (with a constant 0.5) to update the threshold $q_{M}=\lceil(q_{A}+q_{B})/2\rceil$. The dominating factor here is the $K$ additions, which can be implemented efficiently by shifting bits in a buffer.
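The per-frame arithmetic described above is small enough to write out. The sketch below is one possible integer-only reading of it: the half-density test is implemented by comparing twice the count of ones against the block size (an assumption about the intended normalization), and the ceiling midpoint uses an add and a right shift instead of a multiplier. The function name and argument layout are hypothetical.

```python
def bisection_step(bits, q_a, q_m, q_b):
    """One threshold update from the bits acquired at threshold q_m.

    Uses only integer additions, one comparison and bit shifts.
    Returns the new (q_a, q_b) and the next midpoint.
    """
    ones = sum(bits)                     # K additions
    if (ones << 1) >= len(bits):         # bit density >= 0.5
        q_a = q_m                        # many ones: threshold too low, raise the lower end
    else:
        q_b = q_m                        # many zeros: threshold too high, lower the upper end
    next_q_m = (q_a + q_b + 1) >> 1      # ceil((q_a + q_b) / 2)
    return q_a, q_b, next_q_m

# K = 4 bits measured at q_m = 9 with bracket [1, 16].
print(bisection_step([1, 1, 1, 0], 1, 9, 16))
print(bisection_step([0, 0, 1, 0], 1, 9, 16))
```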

We should also point out that the proposed bisection method can be flexibly adjusted spatially and temporally for different hardware configurations. For example, we can use a spatial-temporal window $4\times 4\times 1$ for low-resolution high-speed imaging, or $1\times 1\times 16$ for high-resolution low-speed imaging. This flexibility offers additional advantages of QIS over conventional CCD and CMOS cameras.

V Experimental Results

In this section we evaluate the proposed threshold update scheme by comparing it with existing methods. We consider two evaluation metrics: (1) convergence rate of the threshold update methods; (2) quality of the reconstructed images. For reconstruction evaluation, we create our own Purdue dataset comprising 77 images captured by a Canon EOS Rebel T6i camera. For HDR imaging, we use the HDR-Eye dataset by Nemoto et al. [51, 52]. In all experiments, we fix the spatial over-sampling factor as $K=4\times 4=16$ , and number of temporal frames as $T=13$ . The maximum threshold level is set as $q_{\max}=16$ to ensure that it is realistic for today’s QIS.

V-A Convergence

We compare the proposed threshold update scheme with the Markov Chain (MC) adaptation proposed by Hu and Lu [18]. The Markov Chain adaptation models the threshold as a variable with $2^{L}$ states. These $2^{L}$ states can be regarded as $2^{L}$ steps before reaching the next threshold level. The probability of changing from one state to another is controlled by a parameter $1-\beta$ with $0<\beta<1$. When a bit arrives, the state will be updated (increased or decreased) or will remain unchanged. Once the state has been increased $2^{L}$ times, the threshold will be increased by one.

When comparing Markov Chain adaptation with the proposed bisection algorithm, one should be aware of the difference between the two methods. Markov Chain adaptation is a per-jot update scheme whereas the proposed bisection algorithm is a per-pixel update scheme. For a pixel with $K\times K$ jots, Markov Chain adaptation needs $K^{2}$ iterations to update the threshold sequentially. In contrast, the proposed bisection algorithm updates a common threshold for all $K^{2}$ jots simultaneously. Thus in practice our bisection algorithm is significantly less complex to implement in hardware than the Markov Chain. In order to take the different forms of updates into account, we treat the $K^{2}$ iterations of Markov Chain adaptation as one “major iteration” and compare it with the one bisection step of the proposed algorithm.

The first comparison we make is to check the threshold at different jots. Figure 9 shows the results of three typical runs with underlying optimal thresholds $q^{*}=1,8,16$. In this experiment, we generate 100 random binary blocks of size $K\times K$ and estimate the threshold at each major iteration. We report the average of these 100 estimates to minimize the randomness of the data. The results show that one iteration of the proposed bisection algorithm works as well as the $K^{2}$ iterations of the Markov Chain adaptation. In some cases, Markov Chain tends to oscillate whereas the bisection result is stable.

The second comparison we make is to check how close the estimated threshold is to the optimal threshold. The optimal threshold $q^{*}$ is obtained using the oracle scheme. In Figure 10, we plot the mean squared error between the estimated threshold and the oracle threshold. For fairness, we show the results of the MSE averaged over the 77 images of our dataset, and 50 random samples per image. One threshold is shared by $K\times K$ jots, and each group of $K\times K$ jots corresponds to one pixel. The result is consistent with the ones shown in Figure 9.

V-B Image Reconstruction Quality

The convergence comparison in the previous subsection is only useful to compare threshold update methods that actually return a threshold. In the QIS literature, there are methods that implicitly update the threshold, e.g., the conditional reset method [21]. For comparison with these methods, we have to compare the quality of the image reconstructed from the binary raw data. The image reconstruction is done using the closed-form ML estimate in Section III-A.

We consider three classes of methods:

•

Uniform Threshold. Uniform threshold is commonly used in the device literature [5, 6, 7]. A uniform threshold is a single threshold applied to all pixels in the image. In this experiment, we consider the following choices of uniform thresholds: $q=1$ , $q=5$ , $q=10$ and $q=16$ .

•

Conditional Reset [21]. Conditional reset counts the number of photons and is reset when it is above the threshold. The threshold in conditional reset is sequentially increasing or decreasing. The reconstructed image is obtained by digitally integrating the raw binary frames.

•

Proposed Method. As we discussed in Section IV-E, the proposed method can be implemented to let multiple pixels share a common threshold. Thus, in this experiment we consider three sharing strategies: (1) Share a threshold between a neighborhood of $K\times K$ jots (i.e., one threshold for one pixel); (2) Share a threshold between a neighborhood of $K^{2}\times K^{2}$ jots (i.e., one threshold for $K\times K$ pixels); (3) Share a threshold between a neighborhood of $2K^{2}\times 2K^{2}$ jots (i.e., one threshold for $2K\times 2K$ pixels).

The result of the experiment is shown in Table II. The PSNR values reported are averaged over 77 images in our dataset. Each image generates 50 random realizations, and the PSNR of an image is averaged over these 50 random realizations to minimize the randomness. As shown in the table, while conditional reset generally performs better than a uniform threshold, it performs significantly worse than the proposed threshold update scheme.
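The averaging protocol described above (PSNR per realization, then the mean over realizations) can be written as a short helper. This sketch uses the standard definition $\mathrm{PSNR}=10\log_{10}(\mathrm{peak}^{2}/\mathrm{MSE})$ and assumes images are flattened to lists of values in $[0,1]$ with `peak = 1`; the function names are illustrative and the toy inputs are not data from the experiment.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """PSNR in dB between two equally sized, flattened images."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

def mean_psnr(reference, realizations, peak=1.0):
    """Average the per-realization PSNR values (in dB) for one image."""
    return sum(psnr(reference, est, peak) for est in realizations) / len(realizations)

# Toy check with two synthetic "reconstructions" of a two-pixel image.
reference = [0.0, 0.0]
realizations = [[0.1, 0.1], [0.01, 0.01]]
print(round(mean_psnr(reference, realizations), 2))
```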

V-C Influence of QIS Threshold on HDR Imaging

Since QIS does not have sufficient full well capacity to accumulate photons for HDR imaging, we apply the dynamic range extension method discussed in Section IV-F. When different threshold schemes are used, the reconstructed HDR images will be affected. The objective of this experiment is to evaluate the influence of the threshold in HDR imaging.

In this experiment, we consider the HDR-Eye image dataset [51, 52]. Each HDR image in this dataset contains 9 images acquired at different exposure settings ($-2.7$, $-2$, $-1.3$, $-0.7$, $0$, $0.7$, $1.3$, $2$, and $2.7$ EV). A snapshot of these images is shown in Figure 11. From each exposure, we simulate the photon counts resulting from the luminance channel. The sensor gain is set as $\alpha=K^{2}(q_{\max}-1)$ to ensure a proper number of photons, where $K=4\times 4=16$ and $q_{\max}=16$. On the reconstruction side, we reconstruct the 9 images using the MLE discussed in Section III-A. Tone mapping and exposure fusion [53] are applied to the 9 images to generate an HDR image. As a reference, we apply the same tone mapping and fusion algorithm to the 9 ground truth images. The PSNR between the reference and the estimate is then recorded.

The result of this experiment is shown in Figure 12. With the proposed threshold update scheme, the reconstructed images achieve the highest PSNR value and visual quality. When $q=1$ , which is too low, the image appears under-exposed. When $q=16$ , which is too high, the image appears over-exposed. The spatially varying property of the proposed method mitigates the issue by allowing multiple thresholds.

In practice, one would typically add image denoisers to handle the randomness in the ML estimate and potentially other types of noise. This can be done using methods such as [9]. In the HDR literature, there are also optical approaches that reduce the number of exposures, e.g., [54, 55]. These techniques are complementary to QIS, because QIS is a sensor with functionality similar to that of a CMOS sensor. Thus optical techniques can always be added.

VI Conclusion

Quanta Image Sensor is a new image sensor for high speed, high resolution and high dynamic range imaging. The sensor has a threshold which needs to be carefully adjusted so that the dynamic range can be maximized. We studied the threshold design problem by establishing several theoretical results. First, we showed that an oracle threshold can be obtained assuming that we know the underlying pixel value. Our result showed that the oracle threshold must match the pixel value in order to maximize the signal-to-noise ratio. Second, we showed that around the oracle threshold, there exists a set of thresholds that can produce asymptotically unbiased estimates of the pixel value. Within this set of thresholds, the signal-to-noise ratio stays very close to the oracle case. Third, we developed a bisection method to update the threshold. We also discussed how QIS can be used in HDR imaging, and its advantages compared to standard sensors. Experimental results showed the effectiveness of our proposed approach compared to the standard approach that uses a uniform threshold for all pixels.

Acknowledgment

The authors thank Professor Eric Fossum, Jiaju Ma and Saleh Masoodian at Dartmouth College for many insightful discussions about the physics and circuits of QIS.

Appendix A

A-A Proof of Proposition 3

The Fisher Information metric is defined as:

[TABLE]

where $\theta=\alpha c/K$ . Using the chain rule, we can derive the Fisher Information as follows

[TABLE]

The expectation can be calculated as follows

[TABLE]

Using (7) to differentiate the 1st term, we get:

[TABLE]

where $R=e^{-\theta}\theta^{q-1}$ and $R^{\prime}=\partial R/\partial\theta$ . Similarly, the second term is

[TABLE]

Substituting the derivatives of the first and second terms into the expectation yields

[TABLE]

A-B Proof of Proposition 4

The lower bound is obtained by observing that the product $\Psi_{q}(\theta)\left(1-\Psi_{q}(\theta)\right)$ attains its maximum value when ${\Psi_{q}(\theta)=1/2}$ . Substituting with the upper bound ${\Psi_{q}(\theta)\left(1-\Psi_{q}(\theta)\right)\leq 1/4}$ , we get:

[TABLE]

A-C Proof of Proposition 5

Using the definition of Gamma function $\Gamma(q)=(q-1)!$ and $\theta=\frac{\alpha c}{K}$ , we can rewrite the lower bound in Proposition 4 as follows.

[TABLE]

The only dependence on $q$ is in the second term, so we take a closer look at it. When $q-1<\lfloor\theta\rfloor$, all summands ${\log(\theta/k)}$ are positive because $k<\lfloor\theta\rfloor$. Hence, the total sum increases as $q$ increases. On the other hand, when ${q-1>\lfloor\theta\rfloor}$, we start to add negative summands ${\log(\theta/k)}$ because $k>\theta$. Therefore, the total sum decreases as $q-1$ increases beyond ${\lfloor\theta\rfloor}$. Thus, the maximum is obtained at $q=\lfloor\theta\rfloor+1=\lfloor\frac{\alpha c}{K}\rfloor+1$.
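The monotonicity argument can be checked numerically. The sketch below assumes that the $q$-dependent term is the partial sum $\sum_{k=1}^{q-1}\log(\theta/k)$, which is what the discussion of the summands suggests, and locates its maximizer by brute force. The search cap `q_max=200` is arbitrary, and non-integer $\theta$ values are used to avoid the tie that occurs when $\theta$ is an integer (where $\log(\theta/\theta)=0$).

```python
import math

def q_dependent_term(q, theta):
    """Partial sum over k = 1 .. q-1 of log(theta / k)."""
    return math.fsum(math.log(theta / k) for k in range(1, q))

def argmax_threshold(theta, q_max=200):
    """Brute-force maximizer of the partial sum over integer thresholds."""
    return max(range(1, q_max + 1), key=lambda q: q_dependent_term(q, theta))

for theta in (7.5, 37.5):
    print(theta, argmax_threshold(theta), math.floor(theta) + 1)
```

For both values of $\theta$ the brute-force maximizer agrees with $\lfloor\theta\rfloor+1$.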

A-D Proof of Proposition 6

By definition, $S\overset{\text{def}}{=}\sum_{t=0}^{T-1}\sum_{k=0}^{K-1}B_{k,t}$ is the sum of $KT$ i.i.d. Bernoulli random variables. Therefore, $S$ is a binomial random variable with parameters $n\overset{\text{def}}{=}KT$ and $p\overset{\text{def}}{=}1-\Psi_{q}(\alpha c/K)$. The mean and variance of a binomial random variable are $\mathbb{E}[S]=np$ and $\mathrm{Var}[S]=np(1-p)$. Therefore, we have

[TABLE]

A-E Proof of Proposition 7

The probability $\mathbb{P}[0<\gamma_{q}(c)<1]$ can be evaluated by checking the complement when $\gamma_{q}(c)=0$ or $\gamma_{q}(c)=1$ :

[TABLE]

where (a) follows from the fact that $S$ , which is a sum of i.i.d. Bernoulli random variables, is a binomial random variable.

Let $0<\delta<1$ . If

[TABLE]

then we have

[TABLE]

Thus, it holds that

[TABLE]

Bibliography

[1] E. R. Fossum, “What to do with sub-diffraction-limit (SDL) pixels?–A proposal for a gigapixel digital film sensor (DFS),” in Proc. IEEE Workshop CCDs Adv. Image Sensors, Sep. 2005, pp. 214–217.
[2] J. Ma, D. Hondongwa, and E. R. Fossum, “Jot devices and the quanta image sensor,” in Proc. IEEE Int. Electron Devices Meeting, Dec. 2014, pp. 10.1.1–10.1.4.
[3] J. Ma and E. R. Fossum, “A pump-gate jot device with high conversion gain for a quanta image sensor,” IEEE J. Electron Devices Soc., vol. 3, no. 2, pp. 73–77, Mar. 2015.
[4] J. Ma, L. Anzagira, and E. R. Fossum, “A 1 $\mu$m-pitch quanta image sensor jot device with shared readout,” IEEE J. Electron Devices Soc., vol. 4, no. 2, pp. 83–89, Mar. 2016.
[5] F. Yang, Y. M. Lu, L. Sbaiz, and M. Vetterli, “An optimal algorithm for reconstructing images from binary measurements,” in Proc. IS&T/SPIE Electronic Imaging, Computational Imaging VIII, Jan. 2010, vol. 7533, pp. 75330K–75330K-12.
[6] F. Yang, Y. M. Lu, L. Sbaiz, and M. Vetterli, “Bits from photons: Oversampled image acquisition using binary poisson statistics,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1421–1436, Apr. 2012.
[7] S. H. Chan and Y. M. Lu, “Efficient image reconstruction for gigapixel quantum image sensors,” in Proc. IEEE Global Conf. Signal and Information Processing (GlobalSIP’14), Dec. 2014, pp. 312–316.
[8] O. A. Elgendy and S. H. Chan, “Image reconstruction and threshold design for quanta image sensors,” in Proc. IEEE Int. Conf. Image Process. (ICIP’16), Sep. 2016, pp. 978–982.
