Signal Denoising Using the Minimum-Probability-of-Error Criterion

Jishnu Sadasivan, Subhadip Mukherjee, and Chandra Sekhar Seelamantula

arXiv:1702.07869·stat.AP·February 19, 2020


TL;DR

This paper introduces a novel signal denoising method based on the minimum probability of error (MPE) criterion, which outperforms traditional MSE-based methods, especially at low SNR levels, by minimizing the likelihood of large estimation errors.

Contribution

The paper proposes a new transform-domain shrinkage approach using the MPE criterion, extending it with subband grouping, and demonstrates superior performance over existing methods in low SNR scenarios.

Findings

01

Outperforms MSE-based denoising methods at low SNRs

02

Achieves consistent SNR gains for input SNRs below 5 dB

03

Effective across various noise distributions using Gaussian mixture models


Tables

Table I. Comparison of the MPE, the SURE-based shrinkage estimator, and the Wiener filter (WF) for different input SNRs. The output SNR values (in dB) are averaged over 100 noise realizations.

Input SNR (dB)	MPE, $\epsilon=3.5\sigma$	MPE, $\epsilon=2.5\sigma$	MPE, $\epsilon=1.5\sigma$	SURE	WF
-5.0	11.67	5.99	-0.18	-0.27	1.44
-2.5	14.42	8.62	2.34	2.23	3.96
0	17.02	10.96	4.80	4.71	6.35
2.5	19.08	13.36	7.31	7.21	8.79
5.0	21.25	15.52	9.72	9.64	11.09
7.5	22.93	18.26	12.32	12.23	13.60
10.0	25.34	20.57	14.77	14.69	15.92
12.5	26.91	22.79	17.26	17.17	18.20
15.0	28.77	25.05	19.66	19.59	20.33
17.5	30.74	27.44	22.20	22.12	22.57
20.0	32.65	29.61	24.61	24.54	24.60

Equations

Symbols: $\mathcal{R}$, $\mathcal{R}(s;a)$, $f(w)$, $F(w)$, $G_{1}(a,b;c;z)$, $(q)_{k}$, $a^{\prime}_{\text{opt}}(s)^{2}$.

$$\widehat{\mathcal{R}}=1-F\left(\frac{\epsilon-(a-1)\tilde{s}}{a}\right)+F\left(-\frac{\epsilon+(a-1)\tilde{s}}{a}\right),$$

$$P_{e}^{\text{MPE}}=\mathbb{P}\left(\left|a_{\text{opt}}(s)-a_{\text{opt}}(x)\right|\geq\delta\right),$$

$$P_{e}^{\text{MPE}}\leq 2\exp\left(-\frac{\delta^{2}}{2\sigma^{2}a^{\prime}_{\text{opt}}(s)^{2}}\right).$$

$$\frac{s^{6}}{8\sigma^{6}}\left(\delta-\frac{\sigma^{4}}{(s^{2}+\sigma^{2})s^{2}}\right)^{2}\geq\log\left(\frac{2}{\alpha}\right).$$

$$f(w)=\sum_{m=1}^{M}\frac{\alpha_{m}}{\sigma_{m}\sqrt{2\pi}}\exp\left(-\frac{(w-\theta_{m})^{2}}{2\sigma_{m}^{2}}\right),$$

$$\widehat{\mathcal{R}}=\sum_{m=1}^{M}\alpha_{m}\left[Q\left(\frac{\epsilon-(a-1)\tilde{s}-a\theta_{m}}{a\sigma_{m}}\right)+Q\left(\frac{\epsilon+(a-1)\tilde{s}+a\theta_{m}}{a\sigma_{m}}\right)\right],$$

$$F(\theta|k,\lambda)=\sum_{m=0}^{\infty}\frac{\lambda^{m}e^{-\frac{\lambda}{2}}}{2^{m}m!}\,\mathbb{P}\left[\chi^{2}_{k+2m}\leq\theta\right],$$

$$s_{n}=\cos\left(\frac{5\pi n}{2048}\right)+2\sin\left(\frac{10\pi n}{2048}\right),\quad 0\leq n\leq 2047,$$

$$\mathcal{E}\left\{\left|\widehat{s}-s\right|\right\}=\int_{0}^{\infty}\mathbb{P}\left(\left|\widehat{s}-s\right|>\epsilon\right)\mathrm{d}\epsilon.$$

$$\mathcal{R}_{\ell_{1}}(a,s)=\mathcal{E}\left\{\left|\widehat{s}-s\right|\right\}=\int_{0}^{\infty}Q\left(\frac{\epsilon-(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon+\int_{0}^{\infty}Q\left(\frac{\epsilon+(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon.$$

$$\begin{aligned}&=a\sigma\left(\int_{0}^{\infty}Q(u)\,\mathrm{d}u-\int_{0}^{\mu}Q(u)\,\mathrm{d}u\right)\\&=a\sigma\left(\frac{1}{\sqrt{2\pi}}-\mu Q(\mu)-\frac{1}{\sqrt{2\pi}}\left(1-e^{-\frac{\mu^{2}}{2}}\right)\right)\\&=a\sigma\left(\frac{e^{-\frac{\mu^{2}}{2}}}{\sqrt{2\pi}}-\mu Q(\mu)\right).\end{aligned}$$

$$\mathcal{R}_{\ell_{1}}(a,s)=a\sigma\left[\sqrt{\frac{2}{\pi}}\exp\left(-\frac{(a-1)^{2}s^{2}}{2a^{2}\sigma^{2}}\right)+2\,\frac{(a-1)s}{a\sigma}\,Q\left(-\frac{(a-1)s}{a\sigma}\right)-\frac{(a-1)s}{a\sigma}\right].$$

$$\mathcal{R}_{\ell_{1}}=\sum_{m=1}^{M}a\,\alpha_{m}\sigma_{m}\left(\sqrt{\frac{2}{\pi}}\,e^{-\frac{\mu_{m}^{2}}{2}}-2\mu_{m}Q(\mu_{m})+\mu_{m}\right),$$

$$P_{e}^{\text{SURE}}=\mathbb{P}\left\{\left|a_{\text{opt}}(s)-a_{\text{opt}}(x)\right|\geq\delta\right\},$$

$$h(x)=a_{\text{opt}}(s)-a_{\text{opt}}(x)=\left(\frac{s^{2}}{s^{2}+\sigma^{2}}-1+\frac{\sigma^{2}}{x^{2}}\right).$$

$$h(x)=\frac{\sigma^{4}}{s^{2}(s^{2}+\sigma^{2})}-2\frac{w\sigma^{2}}{s^{3}}+\sum_{n=2}^{\infty}\frac{d^{(n)}(s)}{n!}w^{n},$$

$$h(x)\approx\frac{\sigma^{4}}{s^{2}(s^{2}+\sigma^{2})}-2\frac{w\sigma^{2}}{s^{3}},$$

$$P_{e}^{\text{SURE}}=\mathbb{P}\left\{\left|h(x)\right|\geq\delta\right\}\approx\mathbb{P}\left\{\frac{\sigma^{4}}{s^{2}(s^{2}+\sigma^{2})}-2\frac{w\sigma^{2}}{s^{3}}\geq\delta\right\}.$$


Full text

Signal Denoising Using the Minimum-Probability-of-Error Criterion

Jishnu Sadasivan, Subhadip Mukherjee, and Chandra Sekhar Seelamantula, Senior Member, IEEE

J. Sadasivan is with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India. Phone: +91 80 2293 2276. Fax: +91 80 2360 0563. E-mail: [email protected]. S. Mukherjee and C. S. Seelamantula are with the Department of Electrical Engineering, Indian Institute of Science, Bangalore, India. Phone: +91 80 2293 2695. Fax: +91 80 2360 0444. E-mails: {subhadip, chandra.sekhar}@ee.iisc.ernet.in.

Abstract

We address the problem of signal denoising via transform-domain shrinkage based on a novel risk criterion called the minimum probability of error (MPE), which measures the probability that the estimated parameter lies outside an $\epsilon$ -neighborhood of the actual value. However, the MPE, similar to the mean-squared error (MSE), depends on the ground-truth parameter, and has to be estimated from the noisy observations. We consider linear shrinkage-based denoising functions, wherein the optimum shrinkage parameter is obtained by minimizing an estimate of the MPE. When the probability of error is integrated over $\epsilon$ , it leads to the expected $\ell_{1}$ distortion. The proposed MPE and $\ell_{1}$ distortion formulations are applicable to various noise distributions by invoking a Gaussian mixture model approximation. Within the realm of MPE, we also develop an extension of the transform-domain shrinkage by grouping transform coefficients, resulting in subband shrinkage. The denoising performance obtained within the proposed framework is shown to be better than that obtained using the minimum MSE-based approaches formulated within Stein’s unbiased risk estimation (SURE) framework, especially in the low measurement signal-to-noise ratio (SNR) regime. Performance comparison with three state-of-the-art denoising algorithms, carried out on electrocardiogram signals and two test signals taken from the Wavelab toolbox, exhibits that the MPE framework results in consistent SNR gains for input SNRs below $5$ dB.

Index Terms:

Minimum probability of error, shrinkage estimator, risk estimation, transform-domain shrinkage, subband shrinkage, expected $\ell_{1}$ distortion, Gaussian mixture model.

I Introduction

Signal denoising algorithms are often developed with the objective of minimizing the mean-squared error (MSE) between an estimate and the ground-truth, which may be deterministic or stochastic with a known prior. The latter formalism leads to Bayesian estimators. Within the deterministic signal estimation paradigm, which is also the formalism considered in this paper, one typically desires that the estimator has minimum variance and is unbiased (MVU) [1, 2]. An MVU estimator may not always exist, and if it does, it can be obtained using the theory of sufficient statistics. Eldar and Kay [2] showed that, when it comes to minimizing the MSE, biased estimates may outperform the MVU estimate. For example, one could shrink the MVU estimate and optimize for the shrinkage parameter so that the MSE is minimum.

In this paper, we consider the problem of estimating a deterministic signal corrupted by additive white noise. The noise distribution is assumed to be known, but not restricted to be Gaussian. We propose a new distortion metric based on the probability of error and develop estimators using a transform-domain shrinkage approach. Before proceeding with the developments, we review some important literature related to the problem at hand.

I-A Prior Art

The MSE is by far the most widely used metric for obtaining the optimum shrinkage parameter. Since the MSE is a function of the parameter to be estimated, direct minimization might result in an unrealizable estimate, in the sense that it might depend on the unknown parameter. However, in some cases, it is possible to find the optimum shrinkage parameter, for example, using a min-max approach [2], where the parameter is constrained to a known set. An optimum shrinkage estimator, when the variance of the unbiased estimate (or MVU) is a scaled version of the square of the parameter, with a known scaling, is proposed in [2].

Optimum shrinkage estimators have also been computed based on risk estimation, where an unbiased estimate of the MSE that depends only on the noisy observations is obtained and subsequently minimized over the shrinkage parameter. Under the assumption of Gaussian noise, an unbiased estimate of the MSE, namely Stein’s unbiased risk estimator (SURE), was developed based on Stein’s lemma [3], and has been successfully employed in numerous denoising applications. In his seminal work [3], Stein proved that the shrinkage estimator of the mean of a multivariate Gaussian distribution, obtained from its independent and identically distributed (i.i.d.) samples by minimizing SURE, dominates the classical least-squares estimate when the number of samples exceeds three [4].

A risk minimization approach for denoising using a linear expansion of elementary thresholding functions has been addressed in [5, 6, 7, 8, 9], wherein the combining weights are chosen optimally to minimize the SURE objective. SURE-optimized wavelet-domain thresholding techniques have been developed in [10, 11, 12]. Atto et al. [13, 14] have investigated the problem of signal denoising based on optimally selecting the parameters of a wavelet-domain smooth sigmoidal shrinkage function by minimizing the SURE criterion. The use of SURE objective is not restricted to denoising; it has found applications in image deconvolution as well [15].

Ramani et al. [16] developed a Monte-Carlo technique to select the parameters of a generic denoising operator based on SURE. An image denoising algorithm based on non-local means (NLM) is proposed in [17], where the parameters of NLM are optimized using SURE. Notable denoising algorithms that aim to optimize the SURE objective include wavelet-domain multivariate shrinkage [18], local affine transform for image denoising [19], optimal basis selection for denoising [20], raised-cosine-based fast bilateral filtering for image denoising [21], and the SURE-optimized Savitzky-Golay filter [22].

The original formulation of SURE, which assumed independent Gaussian noise, was extended to certain distributions in continuous and discrete exponential families in [23] and [24], respectively, with the assumption of independence left unchanged. Eldar generalized SURE (GSURE) for distributions belonging to the non-i.i.d. multivariate exponential family [25]. Giryes et al. [26] used a projected version of GSURE for selecting parameters in the context of solving inverse problems. An unbiased estimate of the Itakura-Saito (IS) distortion and the corresponding pointwise shrinkage were developed in [27] and [28], and successfully applied to speech denoising. A detailed discussion of Gaussian parameter estimation using shrinkage estimators, together with a performance comparison of SURE with the maximum-likelihood (ML) and soft-thresholding-based estimators, can be found in [29] (Chapter 2). It is shown in [29] that the soft-thresholding-based estimator dominates the James-Stein shrinkage estimator in terms of MSE if the parameter vector to be estimated is sparse. On the other hand, the shrinkage estimator dominates if all coordinates of the parameter to be estimated are nearly equal.

I-B This Paper

We address the problem of signal denoising based on the minimum probability of error (MPE), which we first considered in [30]. The MPE quantifies the probability of the estimate lying outside an $\epsilon$-neighborhood of the true value. Since the MPE risk depends on the ground truth, we consider a surrogate, which may be biased, and optimize it to obtain the shrinkage parameter (Section II). The optimization is carried out in the discrete cosine transform (DCT) domain, either in a pointwise fashion or on a subband basis. We derive the MPE risk for Gaussian, Laplacian, and Student’s-$t$ noise distributions (Sections II-A and II-B). In practical applications, where the noise distribution may be multimodal and not known explicitly, we propose to use a Gaussian mixture model (GMM) approximation [31, 32] (Section II-A3). We show the performance of the MPE-based denoising technique on the Piece-Regular signal taken from the Wavelab toolbox under Gaussian, Student’s-$t$, and Laplacian noise contaminations (Section III). Proceeding further, we also consider the probability of error accumulated over $0<\epsilon<\infty$ (Section IV), which results in the expected $\ell_{1}$ distortion between the parameter and its estimate. The estimators for the expected $\ell_{1}$ distortion are also derived by invoking the GMM approximation (Section IV-A). We also assess the denoising performance of the shrinkage estimator obtained by minimizing the $\ell_{1}$ distortion for different input SNRs and for different numbers of noisy realizations (Section V).

To further boost the denoising performance of the $\ell_{1}$ distortion-based estimator, we develop an iterative algorithm to successively refine the cost function and the resulting estimate, starting with the noisy signal as the initialization (Section V). The iterations lead to an improvement of $2$ – $3$ dB in output signal-to-noise ratio (SNR) (Section V-A).

Performance comparison of the MPE and $\ell_{1}$ distortion-based estimators is carried out on the Piece-Regular and the HeaviSine signals from the Wavelab toolbox [33], and on electrocardiogram (ECG) signals from the PhysioBank database [42], against three benchmark techniques: (i) wavelet-domain soft-thresholding [34], (ii) SURE-based orthonormal wavelet thresholding using a linear expansion of thresholds (SURE-LET) [5], and (iii) SURE-based smooth sigmoid shrinkage (SS) in the wavelet domain [13]. For a fair comparison, all of them assume Gaussian noise contamination (Section VI).

II The MPE Risk

Consider the observation model $\mathbf{x}=\mathbf{s}+\mathbf{w}$ in $\mathbb{R}^{n}$, where $\mathbf{x}$ and $\mathbf{s}$ denote the noisy and clean signals, respectively. The noise vector $\mathbf{w}$ is assumed to have i.i.d. entries with zero mean and variance $\sigma^{2}$. The goal is to estimate $\mathbf{s}$ from $\mathbf{x}$ by minimizing a suitable risk function. The signal model is considered in an appropriate transform domain, where the signal admits a parsimonious representation, but the noise does not. We consider two types of shrinkage estimators: (i) pointwise, where a shrinkage factor $a_{i}\in[0,1]$ is applied to $x_{i}$ to obtain an estimate $\widehat{s}_{i}=a_{i}x_{i}$; and (ii) subband-based, wherein a single shrinkage factor $a_{J}$ is applied to a group of coefficients $\{x_{i},\,i\in J\}$ in subband $J\subset\{1,2,\cdots,n\}$. Shrinkage estimators may also be interpreted as premultiplication of $\mathbf{x}$ with a diagonal matrix.

II-A MPE Risk for Pointwise Shrinkage

Assuming that the estimate of $s_{i}$ does not depend on $x_{j}$ , for $j\neq i$ , we drop the index $i$ for brevity of notation. The MPE risk is defined as

$$\mathcal{R}=\mathbb{P}\left(\left|\widehat{s}-s\right|>\epsilon\right),\tag{1}$$

where $\epsilon>0$ is a predefined tolerance parameter. The risk $\mathcal{R}$ quantifies the estimation error using the probability measure and takes into account the noise distribution in its entirety. On the contrary, the MSE relies only on the first- and second-order statistics of noise for linear shrinkage estimators. Substituting $\widehat{s}=ax=a(s+w)$ , the risk $\mathcal{R}$ evaluates to

$$\mathcal{R}=1-F\left(\frac{\epsilon-(a-1)s}{a}\right)+F\left(-\frac{\epsilon+(a-1)s}{a}\right),\tag{2}$$

where $F\left(\cdot\right)$ is the cumulative distribution function (c.d.f.) of the additive noise. Since $\mathcal{R}$ depends on $s$ , which is the parameter to be estimated, it is impractical to optimize it directly over $a$ . To circumvent the problem, we minimize an estimate of $\mathcal{R}$ , which is obtained by replacing $s$ with an estimate $\tilde{s}$ , which, for example, may be obtained using any baseline denoising algorithm, or can even be taken as $\tilde{s}=x$ (which is also the ML estimate of $s$ ). In the first instance, the proposed technique becomes an add-on to an existing denoising algorithm, and in the second, it is a denoising scheme in itself. Such an estimate $\widehat{\mathcal{R}}=\mathcal{R}\left(\tilde{s};a\right)$ takes the form

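$$\widehat{\mathcal{R}}=1-F\left(\frac{\epsilon-(a-1)\tilde{s}}{a}\right)+F\left(-\frac{\epsilon+(a-1)\tilde{s}}{a}\right),\tag{3}$$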

and correspondingly, the optimal shrinkage parameter is obtained as $a_{\text{opt}}=\text{arg\,}\underset{0\leq a\leq 1}{\min\,}\widehat{\mathcal{R}}$. A grid search is performed to optimize $\widehat{\mathcal{R}}$ over $a\in[0,1]$, and the estimate of the clean signal is obtained as $\widehat{s}=a_{\text{opt}}x$. We next derive explicit formulae for the risk function for Gaussian, Laplacian, and Student’s-$t$ noise distributions.

(i) Gaussian distribution: In this case, the noisy observation $x$ also follows a Gaussian distribution, and therefore, $\widehat{s}-s=ax-s$ is distributed as $\mathcal{N}\left((a-1)s,a^{2}\sigma^{2}\right)$ . The MPE risk estimate is given as

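$$\widehat{\mathcal{R}}=Q\left(\frac{\epsilon-(a-1)\tilde{s}}{a\sigma}\right)+Q\left(\frac{\epsilon+(a-1)\tilde{s}}{a\sigma}\right),\tag{4}$$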

where $Q(u)=\displaystyle\frac{1}{\sqrt{2\pi}}\int_{u}^{\infty}e^{-\frac{t^{2}}{2}}\mathrm{d}t$ .
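The Gaussian case and the grid search over $a\in[0,1]$ can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the grid resolution of 101 points, and the handling of the endpoint $a=0$ (where $\widehat{s}=0$, so the error $|\tilde{s}|$ is deterministic) are assumptions made here.

```python
import math


def q_function(u):
    """Gaussian tail probability Q(u) = P(N(0, 1) > u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def mpe_risk_gaussian(a, s_tilde, eps, sigma):
    """MPE risk estimate for shrinkage factor a under N(0, sigma^2) noise."""
    if a == 0.0:
        # Assumed convention: with a = 0 the estimate is 0, so the error
        # |s_tilde| is deterministic and the risk is either 0 or 1.
        return 1.0 if abs(s_tilde) > eps else 0.0
    bias = (a - 1.0) * s_tilde
    return (q_function((eps - bias) / (a * sigma))
            + q_function((eps + bias) / (a * sigma)))


def optimal_shrinkage(x, eps, sigma, grid_size=101):
    """Grid search for the minimizer over a in [0, 1], taking s_tilde = x."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda a: mpe_risk_gaussian(a, x, eps, sigma))


# A small observation is shrunk to zero; a large one is almost untouched.
a_small = optimal_shrinkage(0.5, eps=3.0, sigma=1.0)
a_large = optimal_shrinkage(10.0, eps=3.0, sigma=1.0)
```

With $\epsilon=3\sigma$, an observation well inside the $\epsilon$-neighborhood of zero is set to zero, whereas a large observation receives a shrinkage factor close to one, so the rule behaves much like a thresholding operation.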

(ii) Student’s-$t$ distribution: Consider the case where the noise follows a Student’s-$t$ distribution with parameter $\lambda>2$, and the probability density function (p.d.f.) of the noise is given by

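$$f(w)=\frac{\Gamma\left(\frac{\lambda+1}{2}\right)}{\sqrt{\lambda\pi}\,\Gamma\left(\frac{\lambda}{2}\right)}\left(1+\frac{w^{2}}{\lambda}\right)^{-\frac{\lambda+1}{2}}.$$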

The variance of $w$ is $\sigma^{2}=\frac{\lambda}{\lambda-2}$ . The expression for $\widehat{\mathcal{R}}$ is the one given in (3) with

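$$F(w)=\frac{1}{2}+w\,\Gamma\left(\frac{\lambda+1}{2}\right)\frac{G_{1}\left(\frac{1}{2},\frac{\lambda+1}{2};\frac{3}{2};-\frac{w^{2}}{\lambda}\right)}{\sqrt{\pi\lambda}\,\Gamma\left(\frac{\lambda}{2}\right)},$$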

where $G_{1}$ is the hypergeometric function defined as

$$G_{1}(a,b;c;z)=\sum_{k=0}^{\infty}\frac{(a)_{k}(b)_{k}}{(c)_{k}}\frac{z^{k}}{k!},$$

and $(q)_{k}$ denotes the Pochhammer symbol:

$$(q)_{k}=\begin{cases}1,&k=0,\\ q(q+1)\cdots(q+k-1),&k>0.\end{cases}$$

(iii) Laplacian distribution: Considering the noise to be i.i.d. Laplacian with zero-mean and parameter $b$ (variance $\sigma^{2}=2b^{2}$ ), with the p.d.f. $f(w)=\frac{1}{2b}\exp\left(-\frac{|w|}{b}\right)$ , the MPE risk can be obtained by using the following expression for $F(w)$ in (3):

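$$F(w)=\begin{cases}\frac{1}{2}\exp\left(\frac{w}{b}\right),&w<0,\\ 1-\frac{1}{2}\exp\left(-\frac{w}{b}\right),&w\geq 0.\end{cases}$$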
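For noise models other than the Gaussian, the risk estimate only needs the noise c.d.f. The sketch below evaluates the general form $1-F\left(\frac{\epsilon-(a-1)\tilde{s}}{a}\right)+F\left(-\frac{\epsilon+(a-1)\tilde{s}}{a}\right)$ for zero-mean Laplacian noise, using the standard Laplacian c.d.f. The function names, the grid, and the choice to exclude $a=0$ are assumptions made for illustration; the values $\tilde{s}=4$ and $\epsilon=\sigma=1$ are borrowed from the scalar example of Section II-A1.

```python
import math


def laplacian_cdf(w, b):
    """C.d.f. of zero-mean Laplacian noise with scale b (variance 2 * b**2)."""
    if w < 0.0:
        return 0.5 * math.exp(w / b)
    return 1.0 - 0.5 * math.exp(-w / b)


def mpe_risk_from_cdf(a, s_tilde, eps, cdf):
    """General MPE risk estimate for a > 0, given the noise c.d.f."""
    bias = (a - 1.0) * s_tilde
    return 1.0 - cdf((eps - bias) / a) + cdf(-(eps + bias) / a)


sigma = 1.0
b = sigma / math.sqrt(2.0)            # scale that gives unit variance


def unit_laplacian_cdf(w):
    return laplacian_cdf(w, b)


# a = 0 is left out of the grid to avoid a division by zero.
grid = [i / 100 for i in range(1, 101)]
a_opt = min(grid, key=lambda a: mpe_risk_from_cdf(a, 4.0, sigma, unit_laplacian_cdf))
```

At $a=1$ the risk reduces to $\mathbb{P}(|w|>\epsilon)=\exp(-\epsilon/b)$, which gives a quick sanity check on the implementation.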

II-A1 Closeness of $\widehat{\mathcal{R}}$ to $\mathcal{R}$

To measure the closeness of $\widehat{\mathcal{R}}$ to $\mathcal{R}$, consider the example of estimating a scalar $s=4$ from a noisy observation $x$. The MPE risk estimate $\widehat{\mathcal{R}}$ is obtained by setting $\tilde{s}=x$. In Figures 1(a), 1(b), and 1(c), we show the variation of the actual risk $\mathcal{R}$ and its estimate $\widehat{\mathcal{R}}$ with $a$, averaged over $100$ independent trials, for Gaussian, Student’s-$t$, and Laplacian noise distributions, respectively. The noise has zero mean, and the variance is taken as $\sigma^{2}=1$ for the Gaussian and Laplacian models, whereas for the Student’s-$t$ model, the variance is $\sigma^{2}=2$. The value of $\epsilon$ is set equal to $\sigma$ while computing the MPE risk. We observe that $\widehat{\mathcal{R}}$ is a good approximation to $\mathcal{R}$, particularly in the vicinity of the minima. The deviation of the shrinkage parameter $a_{\text{opt}}(x)$, obtained by minimizing $\widehat{\mathcal{R}}$, from its true value $a_{\text{opt}}(s)$, which results from the minimization of $\mathcal{R}$, is shown in Figure 1(d) for the three noise models under consideration. The central red lines in Figure 1(d) indicate the medians, whereas the black lines on the bottom and top denote the $25$th and the $75$th percentile points, respectively. We observe that $a_{\text{opt}}(x)$ is well concentrated around $a_{\text{opt}}(s)$, especially for Gaussian and Laplacian noise, barring a small number of outliers.

II-A2 Perturbation probability of the location of the minimum

The location of the minimum of the MPE risk determines the shrinkage parameter. Therefore, one must ensure that it does not deviate too much from its actual value, with high probability, when $s$ is replaced by $x$ in the original risk $\mathcal{R}$ . Let $a_{\text{opt}}\left(s\right)=\arg\underset{0\leq a\leq 1}{\min}\mathcal{R}\left(s;a\right)$ denote the argument that minimizes the true risk $\mathcal{R}$ . Consider the probability of deviation, given by

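$$P_{e}^{\text{MPE}}=\mathbb{P}\left(\left|a_{\text{opt}}(s)-a_{\text{opt}}(x)\right|\geq\delta\right),\tag{7}$$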

for some $\delta>0$. Using a first-order Taylor-series approximation of $a_{\text{opt}}\left(x\right)$ about $s$, and substituting $x=s+w$, we obtain $a_{\text{opt}}\left(x\right)\approx a_{\text{opt}}\left(s\right)+w\,a^{\prime}_{\text{opt}}\left(s\right)$, where the prime denotes the derivative. Under this approximation, the deviation probability $P_{e}^{\text{MPE}}$ in (7) simplifies to $P_{e}^{\text{MPE}}=\mathbb{P}\left(\left|w\right|\geq\frac{\delta}{\left|a^{\prime}_{\text{opt}}\left(s\right)\right|}\right).$ For additive Gaussian noise $w$ with zero mean and variance $\sigma^{2}$, applying the Chernoff bound to $P_{e}^{\text{MPE}}$ leads to

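$$P_{e}^{\text{MPE}}\leq 2\exp\left(-\frac{\delta^{2}}{2\sigma^{2}a^{\prime}_{\text{opt}}(s)^{2}}\right).$$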

To ensure that $P_{e}^{\text{MPE}}$ is less than $\alpha$ , for a given $\alpha\in(0,1)$ , it suffices to have

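$$a^{\prime}_{\text{opt}}(s)^{2}\leq\frac{\delta^{2}}{2\sigma^{2}\log\left(\frac{2}{\alpha}\right)},\tag{8}$$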

which translates to a lower-bound on the input SNR. Since there is no closed-form expression available for $a^{\prime}_{\text{opt}}\left(s\right)$ in the context of MPE risk, we empirically obtain the range of input SNR values $\displaystyle{\frac{s^{2}}{\sigma^{2}}}$ , for which (8) is satisfied.
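The step from the Chernoff bound to the sufficient condition on $a^{\prime}_{\text{opt}}(s)$ can be spelled out as follows. It uses only the quantities defined above and the fact that $\log(2/\alpha)>0$ for $\alpha\in(0,1)$; requiring the upper bound on $P_{e}^{\text{MPE}}$ to be at most $\alpha$ is enough to guarantee $P_{e}^{\text{MPE}}\leq\alpha$.

```latex
\begin{align*}
2\exp\left(-\frac{\delta^{2}}{2\sigma^{2}a^{\prime}_{\text{opt}}(s)^{2}}\right)\leq\alpha
&\iff \frac{\delta^{2}}{2\sigma^{2}a^{\prime}_{\text{opt}}(s)^{2}}\geq\log\left(\frac{2}{\alpha}\right)\\
&\iff a^{\prime}_{\text{opt}}(s)^{2}\leq\frac{\delta^{2}}{2\sigma^{2}\log\left(\frac{2}{\alpha}\right)}.
\end{align*}
```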

Analogously, to satisfy an upper bound on the deviation probability $P_{e}^{\text{SURE}}$ of the minimum in the case of SURE, for a given deviation $\delta>0$ , one must ensure that

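$$\frac{s^{6}}{8\sigma^{6}}\left(\delta-\frac{\sigma^{4}}{(s^{2}+\sigma^{2})s^{2}}\right)^{2}\geq\log\left(\frac{2}{\alpha}\right).\tag{9}$$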

The proof of (9) is given in Appendix A.

The minimum input SNR required to ensure $P_{e}\leq\alpha$ for both SURE- and MPE-based shrinkage estimators is shown in Figure 2, for different values of $\alpha$ and $\delta$ . The MPE-risk estimate is obtained by replacing $s$ with $x$ and setting $\epsilon=\sigma$ . We observe that reducing the amount of deviation $\delta$ for a given probability $\alpha$ , or vice versa, leads to a higher input SNR requirement for both SURE and MPE. We also observe from Figure 2 that, for given $\delta$ and $\alpha$ , SURE requires a higher input SNR than MPE to keep the $\delta$ -deviation probability under $\alpha$ . Also, for a given input SNR, the $\delta$ -deviation probability of the estimated shrinkage parameter $a_{\text{opt}}\left(x\right)$ from the optimum $a_{\text{opt}}\left(s\right)$ is smaller for MPE than SURE, thereby indicating that the MPE-based shrinkage is comparatively more reliable than the SURE-based one at lower input SNRs.

II-A3 Unknown noise distributions

In practical applications, the distribution of noise may not be known in a parametric form and may also be multimodal. At best, one would have access to realizations of the noise, from which the distribution has to be estimated. In such cases, approximation of the noise p.d.f. using a GMM is a viable alternative [31], wherein one can estimate the parameters of GMM using the expectation-maximization algorithm [35]. Gaussian mixture modeling is attractive as it comes with certain guarantees. For example, it is known that a p.d.f. with a finite number of finite discontinuities can be approximated by a GMM to a desired accuracy except at the points of discontinuity [32, 31]. The GMM approximation can be used even for non-Gaussian, unimodal distributions. For the GMM-based noise p.d.f.

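$$f(w)=\sum_{m=1}^{M}\frac{\alpha_{m}}{\sigma_{m}\sqrt{2\pi}}\exp\left(-\frac{(w-\theta_{m})^{2}}{2\sigma_{m}^{2}}\right),\tag{10}$$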

the MPE risk turns out to be

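$$\widehat{\mathcal{R}}=\sum_{m=1}^{M}\alpha_{m}\left[Q\left(\frac{\epsilon-(a-1)\tilde{s}-a\theta_{m}}{a\sigma_{m}}\right)+Q\left(\frac{\epsilon+(a-1)\tilde{s}+a\theta_{m}}{a\sigma_{m}}\right)\right],\tag{11}$$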

using (4). For illustration, consider the estimation of a scalar $s=4$ in the transform domain from its noisy observation $x$ . The additive noise is Laplacian distributed with zero mean and variance $\sigma^{2}=1$ . The noise distribution is modeled using a GMM with $M=4$ components and the corresponding MPE risk estimate is obtained using (11) by setting $\tilde{s}=x$ . In Figure 3(a), we show a Laplacian p.d.f. and its GMM approximation. Figure 3(b) shows the GMM approximation to a multimodal distribution. Figure 4(a) shows the MPE risk based on the original Laplacian distribution as well as the GMM approximation, as a function of the shrinkage parameter $a$ . The close match between the two indicates that the GMM is a viable alternative when the noise distribution is unknown or follows a complicated model. In Figure 4(b), we plot the GMM-based MPE risk and its estimate averaged over $100$ independent trials. We observe that the locations of minima of the actual risk and its estimate match closely, thereby justifying the minimization of $\widehat{\mathcal{R}}$ . The MPE risk and its estimate are shown in Figure 4(c) for the multimodal p.d.f. of Figure 3(b).
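A GMM-based risk estimate can be sketched by plugging the mixture c.d.f. into the general c.d.f. form of the risk estimate. The four-component mixture below is a hypothetical, hand-picked example and is not the result of an EM fit to Laplacian samples; the function names and the grid are likewise assumptions made for illustration.

```python
import math


def normal_cdf(u):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def gmm_cdf(w, weights, means, sigmas):
    """C.d.f. of a Gaussian mixture with the given component parameters."""
    return sum(alpha * normal_cdf((w - theta) / sigma)
               for alpha, theta, sigma in zip(weights, means, sigmas))


def mpe_risk_gmm(a, s_tilde, eps, weights, means, sigmas):
    """MPE risk estimate for a > 0 when the noise c.d.f. is a GMM."""
    bias = (a - 1.0) * s_tilde
    upper = gmm_cdf((eps - bias) / a, weights, means, sigmas)
    lower = gmm_cdf(-(eps + bias) / a, weights, means, sigmas)
    return 1.0 - upper + lower


# Hypothetical zero-mean mixture (illustrative values, not an EM fit).
weights = [0.25, 0.25, 0.25, 0.25]
means = [0.0, 0.0, 0.0, 0.0]
sigmas = [0.3, 0.7, 1.0, 1.6]

grid = [i / 100 for i in range(1, 101)]   # a = 0 excluded (division by zero)
a_opt = min(grid, key=lambda a: mpe_risk_gmm(a, 4.0, 1.0, weights, means, sigmas))
```

With a single component of zero mean, the function reduces to the Gaussian expression in terms of $Q(\cdot)$, which is a convenient consistency check.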

II-B MPE Risk for Subband Shrinkage

Let $a_{J}$ be the shrinkage factor applied to the set of coefficients $\{x_{i},\,i\in J\}$ in subband $J$. The estimate $\widehat{\mathbf{s}}_{J}$ of the clean signal is obtained as $\widehat{\mathbf{s}}_{J}=a_{J}\mathbf{x}_{J}$, where $\mathbf{x}_{J}\in\mathbb{R}^{\left|J\right|}$ and $a_{J}\in\left[0,1\right]$. For notational brevity, we drop the subscript $J$, as we did for pointwise shrinkage, and express the estimator as $\widehat{\mathbf{s}}=a\mathbf{x}$, where boldface letters indicate vectors.

Analogous to pointwise shrinkage, the MPE risk for subband shrinkage is defined as $\mathcal{R}=\mathbb{P}\left(\left\|\widehat{\mathbf{s}}-\mathbf{s}\right\|_{2}>\epsilon\right),$ which, for $\widehat{\mathbf{s}}=a\mathbf{x}$ , becomes $\mathcal{R}=\mathbb{P}\left(\left\|a\mathbf{w}+(a-1)\mathbf{s}\right\|_{2}>\epsilon\right)$ . For $\mathbf{w}\sim\mathcal{N}\left(0,\sigma^{2}I\right)$ ,

$$\mathcal{R}=1-F\left(\theta\,|\,k,\lambda\right),\tag{12}$$

where $k=\left|J\right|$ , $\lambda=\sum_{j=1}^{k}\frac{\left(1-a\right)^{2}s_{j}^{2}}{a^{2}\sigma^{2}}$ , $\theta=\left(\frac{\epsilon}{a\sigma}\right)^{2}$ , and $F(\theta|k,\lambda)$ is the c.d.f. of the non-central $\chi^{2}$ distribution, given by

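$$F(\theta|k,\lambda)=\sum_{m=0}^{\infty}\frac{\lambda^{m}e^{-\frac{\lambda}{2}}}{2^{m}m!}\,\mathbb{P}\left[\chi^{2}_{k+2m}\leq\theta\right],$$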

wherein $\chi^{2}_{v}$ denotes the central $\chi^{2}$ random variable having $v$ degrees of freedom.
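The Poisson-weighted series for the non-central $\chi^{2}$ c.d.f. can be evaluated with the standard library when $k$ is even, because the central $\chi^{2}$ c.d.f. with an even number of degrees of freedom has a closed form. The sketch below takes the subband risk to be $1-F(\theta|k,\lambda)$, which follows from the definitions of $\theta$ and $\lambda$ above. The restriction to even $k$, the truncation at 200 terms, and the function names are assumptions made here; the truncated series is meant for moderate values of $\lambda$ and $\theta$ (that is, $a$ not too close to zero).

```python
import math


def central_chi2_cdf_even(theta, dof):
    """P[chi^2_dof <= theta] for even dof, using the closed form
    1 - exp(-theta/2) * sum_{j < dof/2} (theta/2)**j / j!."""
    half = theta / 2.0
    term, total = 1.0, 0.0
    for j in range(dof // 2):
        total += term
        term *= half / (j + 1)
    return 1.0 - math.exp(-half) * total


def noncentral_chi2_cdf_even(theta, k, lam, n_terms=200):
    """Truncated series for F(theta | k, lam), for even k."""
    weight = math.exp(-lam / 2.0)          # Poisson weight for m = 0
    cdf = 0.0
    for m in range(n_terms):
        cdf += weight * central_chi2_cdf_even(theta, k + 2 * m)
        weight *= (lam / 2.0) / (m + 1)    # Poisson weight for m + 1
    return cdf


def subband_mpe_risk(a, s, eps, sigma):
    """Subband MPE risk 1 - F(theta | k, lam) for a in (0, 1], even len(s)."""
    k = len(s)
    lam = sum(((1.0 - a) * sj) ** 2 for sj in s) / (a * sigma) ** 2
    theta = (eps / (a * sigma)) ** 2
    return 1.0 - noncentral_chi2_cdf_even(theta, k, lam)


# Subband of size k = 8 with all coefficients equal to 2 and unit noise variance.
risk = subband_mpe_risk(0.5, [2.0] * 8, eps=4.0, sigma=1.0)
```

In practice, `s` would be replaced by the noisy subband or by a baseline estimate, as described in the text, and the risk would be minimized over a grid of values of $a$.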

Similar to pointwise shrinkage, we propose to obtain an estimate $\widehat{\mathcal{R}}$ of $\mathcal{R}$ for subband shrinkage estimators either by replacing $s_{j}$ with $x_{j}$, or by an estimate $\tilde{s}_{j}$ produced by any standard denoising algorithm. The optimum subband shrinkage factor is obtained by minimizing $\widehat{\mathcal{R}}$ over $a\in[0,1]$.

Figure 5 shows the subband MPE risk and its estimate versus $a$ , where the underlying clean signal $\mathbf{s}\in\mathbb{R}^{\left|J\right|}$ is corrupted by Gaussian noise and the subband size is chosen to be $\left|J\right|=k=8$ . The clean signal $\mathbf{s}$ is generated by drawing samples from $\mathcal{N}\left(2\times\mathbb{1}_{k},I_{k}\right)$ , where $\mathbb{1}_{k}$ and $I_{k}$ denote a $k$ -length vector of all ones and a $k\times k$ identity matrix, respectively. The observation $\mathbf{x}$ is obtained by adding zero-mean i.i.d. Gaussian noise to $\mathbf{s}$ , with an input SNR of $5$ dB, where the input SNR is defined as $\text{SNR}_{\text{in}}=10\log_{10}\left(\displaystyle\frac{1}{k\sigma^{2}}\sum_{n=1}^{k}s_{n}^{2}\right)\text{\,dB}$ . The MPE risk estimate is obtained by replacing $\mathbf{s}$ with $\mathbf{x}$ in (12), which does not significantly shift the location of the minimum (cf. Figure 5).

III Experimental Results for MPE-Based Denoising

The performance of the MPE-based pointwise and subband shrinkage estimators is validated on a synthesized harmonic signal (of length $N=2048$) in Gaussian noise and the Piece-Regular signal (of length $N=4096$) in Gaussian, Student’s-$t$, and Laplacian noise. The Piece-Regular signal has both smooth and rapidly-varying regions, making it a suitable candidate for the assessment of denoising performance.

III-A Performance of Pointwise-Shrinkage Estimator

III-A1 Harmonic signal denoising

Consider the signal

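$$s_{n}=\cos\left(\frac{5\pi n}{2048}\right)+2\sin\left(\frac{10\pi n}{2048}\right),\quad 0\leq n\leq 2047,\tag{13}$$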

in additive white Gaussian noise, with zero mean and variance $\sigma^{2}$. Since the denoising is carried out in the DCT [39] domain, the Gaussian noise statistics remain unaltered. For the purpose of illustration, we assume that $\sigma^{2}$ is known. In practice, $\sigma^{2}$ may not be known a priori and could be replaced by the robust median estimate [37] or the trimmed estimate [38]. The clean signal is estimated using the inverse DCT after applying the optimum shrinkage. The denoising performance of the MPE- and SURE-based approaches is compared in Table I. In the case of the Wiener filter, the power spectrum of the clean signal is estimated using the standard spectral subtraction technique [40, 41]. We observe that MPE-based shrinkage with $\epsilon=3.5\,\sigma$ is superior to SURE and the Wiener filter (WF) by $8$–$12$ dB. The comparison also shows that the performance of the MPE depends critically on $\epsilon$: with $\epsilon=1.5\,\sigma$, the output SNR of the MPE is within about $0.1$ dB of that of SURE.
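The pointwise pipeline described above (DCT, per-coefficient grid search with $\tilde{s}=x$, inverse DCT) can be sketched with the standard library alone. This is an illustrative reimplementation under stated assumptions, not the authors' code: the harmonic signal is generated with a generic length $N$ (it coincides with the signal above for $N=2048$), the demo uses $N=256$ to keep the $O(N^{2})$ transform fast, $\epsilon=3\sigma$, a 51-point grid, a fixed random seed, and the risk at $a=0$ is taken as 0 or 1 because the error is deterministic there. All function names are hypothetical.

```python
import math
import random


def q_function(u):
    """Gaussian tail probability Q(u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def mpe_risk(a, s_tilde, eps, sigma):
    """Gaussian MPE risk estimate for a single transform coefficient."""
    if a == 0.0:
        return 1.0 if abs(s_tilde) > eps else 0.0
    bias = (a - 1.0) * s_tilde
    return (q_function((eps - bias) / (a * sigma))
            + q_function((eps + bias) / (a * sigma)))


def dct_ortho(x):
    """Orthonormal DCT-II computed directly in O(N^2)."""
    n = len(x)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    return [scale[k] * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                           for i in range(n)) for k in range(n)]


def idct_ortho(coeffs):
    """Inverse of dct_ortho (transpose of the orthonormal DCT-II matrix)."""
    n = len(coeffs)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    return [sum(scale[k] * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n)) for i in range(n)]


def mpe_denoise(x, sigma, beta=3.0, grid_size=51):
    """Pointwise MPE shrinkage in the DCT domain, with eps = beta * sigma."""
    eps = beta * sigma
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    shrunk = []
    for c in dct_ortho(x):
        a_opt = min(grid, key=lambda a: mpe_risk(a, c, eps, sigma))
        shrunk.append(a_opt * c)
    return idct_ortho(shrunk)


def snr_db(clean, estimate):
    """SNR of an estimate with respect to the clean signal, in dB."""
    signal = sum(v * v for v in clean)
    error = sum((v - e) ** 2 for v, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / error)


n = 256
clean = [math.cos(5 * math.pi * i / n) + 2 * math.sin(10 * math.pi * i / n)
         for i in range(n)]
sigma = math.sqrt(sum(v * v for v in clean) / n)    # 0 dB input SNR
rng = random.Random(0)
noisy = [v + rng.gauss(0.0, sigma) for v in clean]
denoised = mpe_denoise(noisy, sigma)
```

Comparing `snr_db(clean, noisy)` with `snr_db(clean, denoised)` gives the input and output SNRs for one noise realization; averaging over many realizations and sweeping `beta` would mimic the kind of comparison reported in Table I, although the numbers from this toy setup are not expected to match the table.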

III-A2 Piece-Regular signal denoising

We consider noisy copies of the Piece-Regular signal, taken from the Wavelab toolbox [33], under Gaussian, Student’s- $t$ , and Laplacian contaminations. The noise variance is assumed to be known. Notably, the Gaussian, GMM, and Student’s- $t$ distributions of noise are preserved by an orthonormal transform [36], unlike the Laplacian statistics. Therefore, the MPE estimate for Laplacian noise is computed based on a four-component GMM approximation in the DCT domain. The denoised output signal corresponding to Laplacian noise is shown in Figure 6 for illustration. The MPE estimates are better than SURE estimates. The SNR plots in Figure 7 indicate that the MPE outperforms SURE for the noise statistics under consideration and that the gains are particularly high in the input SNR range of $-5$ to $20$ dB, and tend to reduce beyond $20$ dB.

III-A3 Effect of $\epsilon$ on the denoising performance of MPE

Obtaining a closed-form expression for the $\epsilon$ that maximizes the output SNR is not straightforward. We determine the optimum $\epsilon$ empirically by measuring the SNR gain as a function of $\epsilon$ (cf. Figure 8), for i.i.d. Gaussian noise. We observe that the output SNR exhibits a peak approximately at $\beta=\frac{\epsilon}{\sigma}=3.5$ for the harmonic signal in (13) and at $\beta=3$ for the Piece-Regular signal. As a rule of thumb, we recommend choosing $\epsilon=3\sigma$ for pointwise shrinkage estimators.

III-B Performance of Subband MPE Shrinkage

To validate the performance of the MPE-based subband shrinkage estimator (cf. Section II-B), we consider denoising of the Piece-Regular signal in additive Gaussian noise. The clean signal and its noisy measurement are shown in Figure 9(a). Denoising is carried out by grouping $k$ adjacent DCT coefficients to form a subband. The denoised signals obtained using SURE and MPE are shown in Figures 9(b) and 9(c), respectively. The subband size $k$ is chosen to be $16$ and the parameter $\epsilon$ is set equal to $1.75\sqrt{k}\sigma$, a value that was determined experimentally and found to be nearly optimal. We observe that the MPE gives a 1 dB improvement in SNR over the SURE approach.

Variation of the output SNR is also studied as a function of $k$ (cf. Figure 10). We experimented with $\epsilon=3\sigma$ , $\epsilon=1.75\sqrt{k}\sigma$ , and $\epsilon=1.25\sqrt{k}\sigma$ corresponding to subband sizes $k=1$ , $k\in[2,16]$ , and $k>16$ , respectively. For both SURE and MPE, as $k$ increases, the output SNR also increases and eventually saturates for $k\geq 40$ . For input SNR below $15$ dB, MPE gives a comparatively higher SNR than SURE, and the margin diminishes with increase in input SNR or the subband size $k$ . The degradation in performance of SURE for low SNRs is due to the large error in estimating the MSE at such SNRs. The SURE-based estimate of MSE becomes increasingly reliable as $k$ increases, thereby leading to superior performance.
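The piecewise choice of $\epsilon$ described above, together with the grouping of adjacent coefficients into subbands, can be written down as a small helper. This is only a sketch: the function names are hypothetical, and letting the last subband be shorter when the signal length is not a multiple of $k$ is an assumption that the text does not address.

```python
import math


def epsilon_for_subband(k: int, sigma: float) -> float:
    """Error tolerance used in the experiments, as a function of subband size k."""
    if k < 1:
        raise ValueError("subband size k must be a positive integer")
    if k == 1:
        return 3.0 * sigma                     # pointwise shrinkage
    if k <= 16:
        return 1.75 * math.sqrt(k) * sigma     # 2 <= k <= 16
    return 1.25 * math.sqrt(k) * sigma         # k > 16


def group_into_subbands(coeffs: list, k: int) -> list:
    """Split transform coefficients into consecutive subbands of k entries.

    Assumption: a trailing subband with fewer than k entries is kept as is.
    """
    return [coeffs[i:i + k] for i in range(0, len(coeffs), k)]


if __name__ == "__main__":
    print(epsilon_for_subband(16, sigma=1.0))
    print(group_into_subbands(list(range(10)), 4))
```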

IV Accumulated Probability of Error: MPE Meets the Expected $\ell_{1}$ Distortion

The MPE is parametrized by $\epsilon$ , which has to be appropriately chosen in order to achieve optimal denoising performance. To suppress the direct dependence on $\epsilon$ , we consider the accumulated probability of error, namely $\int_{0}^{\infty}\mathbb{P}\left(\left|\widehat{s}-s\right|>\epsilon\right)\mathrm{d}\epsilon$ as the risk to be minimized. For a nonnegative random variable $Y$ , we know that $\mathcal{E}\{Y\}=\int_{0}^{\infty}\mathbb{P}\left(Y>\epsilon\right)\mathrm{d}\epsilon.$ Therefore, the accumulated probability of error is the expected $\ell_{1}$ distortion:

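$\displaystyle\mathcal{R}_{\ell_{1}}\left(a,s\right)=\int_{0}^{\infty}\mathbb{P}\left(\left|\widehat{s}-s\right|>\epsilon\right)\mathrm{d}\epsilon=\mathcal{E}\left\{\left|\widehat{s}-s\right|\right\}.\qquad(14)$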
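The tail-integral identity invoked above can be checked numerically in the simplest case $a=1$, where $\widehat{s}-s=w$ and the accumulated probability of error should equal $\mathcal{E}\{|w|\}=\sigma\sqrt{2/\pi}$ for zero-mean Gaussian noise. The sketch below uses a plain trapezoidal rule; the truncation point and the number of steps are arbitrary illustrative choices.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def accumulated_tail(sigma: float, upper: float = 12.0,
                     steps: int = 120_000) -> float:
    """Trapezoidal approximation of the integral of P(|w| > eps) over
    eps in [0, upper * sigma], where w ~ N(0, sigma^2)."""
    h = upper * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        eps = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 2.0 * q_function(eps / sigma)  # P(|w| > eps)
    return total * h


sigma = 1.5
numeric = accumulated_tail(sigma)
closed_form = sigma * math.sqrt(2.0 / math.pi)  # E|w| for Gaussian w

if __name__ == "__main__":
    print(f"integral of tail probability: {numeric:.8f}")
    print(f"sigma * sqrt(2/pi):           {closed_form:.8f}")
```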

For Gaussian noise distribution,

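$\displaystyle\mathcal{R}_{\ell_{1}}\left(a,s\right)=\int_{0}^{\infty}Q\left(\frac{\epsilon-(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon+\int_{0}^{\infty}Q\left(\frac{\epsilon+(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon.\qquad(15)$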

Denoting $u=\displaystyle\frac{\epsilon-(a-1)s}{a\sigma}$ and $\mu=-\displaystyle\frac{(a-1)s}{a\sigma}$ , the first integral in (15) is evaluated as

$\displaystyle\int_{0}^{\infty}Q\left(\frac{\epsilon-(a-1)s}{a\sigma}\right)\mathrm{d}\epsilon=a\sigma\int_{\mu}^{\infty}Q\left(u\right)\mathrm{d}u$

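$\displaystyle=a\sigma\left[\frac{1}{\sqrt{2\pi}}e^{-\frac{\mu^{2}}{2}}-\mu Q\left(\mu\right)\right].\qquad(16)$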

The second term in (15) can be evaluated by replacing $\mu$ with $-\mu$ in (16). Combining both integrals, we obtain the expression for the expected $\ell_{1}$ distortion:

$\mathcal{R}_{\ell_{1}}\left(a,s\right)=a\sigma\left[\sqrt{\frac{2}{\pi}}e^{-\frac{\mu^{2}}{2}}-\mu Q\left(\mu\right)+\mu Q\left(-\mu\right)\right]$

[TABLE]
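As a sanity check, the closed-form expression for $\mathcal{R}_{\ell_{1}}\left(a,s\right)$ can be compared against a direct numerical integration of the probability of error over $\epsilon$. The sketch below assumes Gaussian noise and $a>0$; the function names, the integration range, and the step count are illustrative choices.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def l1_risk(a: float, s: float, sigma: float) -> float:
    """Closed-form expected l1 distortion of s_hat = a*x, x = s + w, a > 0."""
    mu = -(a - 1.0) * s / (a * sigma)
    return a * sigma * (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * mu * mu)
                        - mu * q_function(mu) + mu * q_function(-mu))


def l1_risk_numeric(a: float, s: float, sigma: float,
                    steps: int = 50_000) -> float:
    """Trapezoidal integral of P(|a*x - s| > eps) over eps >= 0."""
    shift = (a - 1.0) * s
    upper = abs(shift) + 12.0 * a * sigma   # tail beyond this is negligible
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        eps = i * h
        prob = (q_function((eps - shift) / (a * sigma))
                + q_function((eps + shift) / (a * sigma)))
        total += (0.5 if i in (0, steps) else 1.0) * prob
    return total * h


if __name__ == "__main__":
    for a in (0.3, 0.7, 1.0):
        print(a, l1_risk(a, 4.0, 1.0), l1_risk_numeric(a, 4.0, 1.0))
```

For $a=1$ we have $\mu=0$ and the risk reduces to $\sigma\sqrt{2/\pi}$, the mean absolute value of the noise.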

An estimate of the expected $\ell_{1}$ distortion is calculated by replacing $s$ with an estimate $\tilde{s}$ , which could also be $x$ , to begin with. In Figure 11(a), we show the variation of the original $\ell_{1}$ distortion and its estimate obtained by setting $\tilde{s}=x$ , as functions of $a$ , averaged over $100$ independent realizations of $\mathcal{N}\left(0,1\right)$ noise. The actual parameter value is $s=4$ . The figure shows that the minimum of the expected $\ell_{1}$ risk is close to that of its estimate.

In principle, one could iteratively minimize the $\ell_{1}$ distortion by starting with $\widehat{s}=x$ and successively refining it. Such an approach is given in Algorithm $1$ . An illustration of the denoising performance of the iterative algorithm is deferred to Section V.
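Since Algorithm 1 is not reproduced here, the following is only a hypothetical sketch of such an iterative refinement, under two assumptions: the risk estimate is the Gaussian closed form with $s$ replaced by the current estimate $\tilde{s}$, and each iteration sets $\tilde{s}\leftarrow a_{\text{opt}}\,x$. The grid search, the iteration count, and all function names are illustrative and may differ from the authors' algorithm.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def l1_risk(a: float, s: float, sigma: float) -> float:
    """Closed-form expected l1 distortion for Gaussian noise (a > 0)."""
    mu = -(a - 1.0) * s / (a * sigma)
    return a * sigma * (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * mu * mu)
                        - mu * q_function(mu) + mu * q_function(-mu))


def argmin_shrinkage(s_tilde: float, sigma: float,
                     grid_size: int = 1000) -> float:
    """Grid search for the a in (0, 1] minimizing the estimated risk."""
    grid = [(i + 1) / grid_size for i in range(grid_size)]
    return min(grid, key=lambda a: l1_risk(a, s_tilde, sigma))


def iterative_l1_shrinkage(x: float, sigma: float, num_iters: int = 20):
    """Hypothetical refinement loop: start from s_tilde = x, then alternate
    between minimizing the risk estimate and updating s_tilde = a * x."""
    s_tilde = x
    a = 1.0
    for _ in range(num_iters):
        a = argmin_shrinkage(s_tilde, sigma)
        s_tilde = a * x
    return a, s_tilde


if __name__ == "__main__":
    for x in (0.5, 2.0, 10.0):
        a, s_hat = iterative_l1_shrinkage(x, sigma=1.0)
        print(f"x = {x:5.1f}  a = {a:.3f}  s_hat = {s_hat:.3f}")
```

In this sketch, small observations are driven progressively toward zero over the iterations, whereas large observations settle at a shrinkage factor close to one.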

IV-A Expected $\ell_{1}$ risk Using GMM Approximation

For the GMM p.d.f. in (10), the expected $\ell_{1}$ distortion evaluates to (cf. Appendix B for the derivation)

[TABLE]

where $\mu_{m}=-\displaystyle\frac{(a-1)s+\theta_{m}}{a\sigma_{m}}$ . The expected $\ell_{1}$ risk and its estimate for a multimodal (cf. Figure 3(b)) and Laplacian noise p.d.f.s are shown in Figures 11(b) and 11(c), respectively. We observe that, in both cases, the locations of the minima of the true risk and its estimate are in good agreement.

IV-B Optimum Shrinkage $a_{\text{opt}}$ Versus Posterior SNR

We next study the behavior of $a_{\text{opt}}$ for different input SNRs to compare the denoising capabilities of the MPE and the expected $\ell_{1}$-distortion-based shrinkage estimators. The optimum pointwise shrinkage parameter $a_{\text{opt}}$ for Gaussian noise statistics, obtained by minimizing SURE, the MPE risk estimate, and the estimated $\ell_{1}$ risk, is plotted in Figure 12(a) for different values of the a posteriori SNR $\displaystyle\frac{x^{2}}{\sigma^{2}}$. To illustrate the effect of $\epsilon$, the variation of $a_{\text{opt}}$ versus the a posteriori SNR for the MPE under Gaussian noise is shown in Figure 12(b) for different values of $\epsilon$. Figures 12(a) and 12(b) show that the shrinkage parameters increase as the a posteriori SNR increases, which is characteristic of a reasonable denoising algorithm. Whereas the choice of $\epsilon$ is crucial for the MPE, the expected $\ell_{1}$ distortion does not require tuning such a parameter. Moreover, the MPE attenuation profile for larger values of $\epsilon$ is reminiscent of a hard-thresholding function, whereas the expected $\ell_{1}$ distortion has an attenuation profile that resembles a soft-threshold.
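A shrinkage profile of this kind can be tabulated with a short script. The sketch below compares the SURE-based factor $1-\frac{\sigma^{2}}{x^{2}}$ with the factor that minimizes the estimated $\ell_{1}$ risk (Gaussian closed form with $s$ replaced by $x$). Clipping the SURE factor at zero and using a grid search over $a\in(0,1]$ are assumptions made for illustration; the a posteriori SNR values are arbitrary.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def l1_risk(a: float, s: float, sigma: float) -> float:
    """Closed-form expected l1 distortion for Gaussian noise (a > 0)."""
    mu = -(a - 1.0) * s / (a * sigma)
    return a * sigma * (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * mu * mu)
                        - mu * q_function(mu) + mu * q_function(-mu))


def l1_shrinkage(x: float, sigma: float, grid_size: int = 1000) -> float:
    """Minimizer over a in (0, 1] of the l1 risk estimate with s := x."""
    grid = [(i + 1) / grid_size for i in range(grid_size)]
    return min(grid, key=lambda a: l1_risk(a, x, sigma))


def sure_shrinkage(x: float, sigma: float) -> float:
    """SURE-optimal pointwise factor 1 - sigma^2/x^2.

    Assumption: negative values are clipped to zero.
    """
    return max(0.0, 1.0 - sigma ** 2 / x ** 2)


sigma = 1.0
posterior_snrs = [0.25, 1.0, 4.0, 9.0, 16.0, 25.0]   # x^2 / sigma^2
profile = [(snr,
            sure_shrinkage(math.sqrt(snr) * sigma, sigma),
            l1_shrinkage(math.sqrt(snr) * sigma, sigma))
           for snr in posterior_snrs]

if __name__ == "__main__":
    for snr, a_sure, a_l1 in profile:
        print(f"x^2/sigma^2 = {snr:5.2f}  SURE: {a_sure:.3f}  l1: {a_l1:.3f}")
```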

V Performance of the Expected $\ell_{1}$ Distortion-Based Pointwise Shrinkage Estimator

In a practical denoising application, we have only one noisy realization from which the clean signal has to be estimated. However, it is instructive to consider the case of multiple realizations as it sheds some light on the performance comparisons vis-à-vis other estimators such as the ML estimator. Consider the observation model $\mathbb{x}^{(m)}=\mathbb{s}+\mathbb{w}^{(m)}$ in $\mathbb{R}^{n}$, $1\leq m\leq M$, where one has access to $M$ noisy copies of the signal $\mathbb{s}$, and the noise vectors $\mathbb{w}^{(m)}$ are drawn independently from the $\mathcal{N}\left(\mathbb{0},\sigma^{2}I_{n}\right)$ distribution. The ML estimator of the $i^{\text{th}}$ signal coefficient $s_{i}$ is given by $\hat{s}_{\text{ML},i}=\frac{1}{M}\sum_{m=1}^{M}x_{i}^{(m)}$, where $x_{i}^{(m)}$ is the $i^{\text{th}}$ component of $\mathbb{x}^{(m)}$. Dropping the subscript $i$, as each coefficient is treated independently of the others, the shrinkage estimator takes the form $\widehat{s}=a_{\text{opt}}\hat{s}_{\text{ML}}$. To study the behavior of the estimate with respect to $M$, we consider two variants: (i) where $a_{\text{opt}}$ is obtained by minimizing $\mathcal{R}_{\ell_{1}}\left(a,s\right)$, referred to as the oracle-$\ell_{1}$; and (ii) where $a_{\text{opt}}$ is chosen to minimize $\mathcal{R}_{\ell_{1}}\left(a,\hat{s}_{\text{ML}}\right)$, referred to as ML-$\ell_{1}$. The output SNR as a function of $M$ for the Piece-Regular signal, corresponding to an input SNR of $5$ dB, is shown in Figure 13(a). For all three estimators, namely, oracle-$\ell_{1}$, ML-$\ell_{1}$, and the ML estimate, the output SNR increases with $M$. However, for the oracle-$\ell_{1}$ and the ML-$\ell_{1}$ estimators, the output SNR stagnates as $M$ increases beyond $40$. For $M\leq 60$, the oracle-$\ell_{1}$ and the ML-$\ell_{1}$ shrinkage estimators exhibit better performance compared with the ML estimator.
As one would expect, the performance of the ML- $\ell_{1}$ estimator matches with that obtained using the oracle- $\ell_{1}$ as $M$ becomes large, because the ML estimate converges in probability to the true parameter. For $M=1$ , which is often the case in practice, the ML- $\ell_{1}$ estimate significantly dominates the ML estimator as seen in Figure 13(a). The SNR gain over the ML estimator could be further improved by using the iterative minimization algorithm introduced in Section IV (cf. Algorithm 1). The performance of the ML- $\ell_{1}$ and the ML estimators, for different values of $M$ and input SNR is shown in Figure 13(b). The figures show that for small values of SNR and $M$ , the ML- $\ell_{1}$ estimate outperforms the ML estimator. This is of significant importance in a practical setting where we have only one noisy realization ( $M=1$ ).
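The ML estimate used as the starting point above is simply the coefficient-wise sample mean of the $M$ noisy copies. The following sketch illustrates how averaging reduces the error as $M$ grows; the sinusoidal test signal, the noise level, and the seed are arbitrary illustrative choices and are not the Piece-Regular setup of Figure 13.

```python
import math
import random


def ml_estimate(copies: list) -> list:
    """Coefficient-wise sample mean of M noisy copies of the signal."""
    m = len(copies)
    return [sum(column) / m for column in zip(*copies)]


def mean_squared_error(estimate: list, truth: list) -> float:
    """Average squared deviation between the estimate and the clean signal."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def simulate(truth: list, sigma: float, m: int, rng: random.Random) -> float:
    """Draw M copies x = s + w with w ~ N(0, sigma^2) and return the MSE
    of the ML estimate."""
    copies = [[s + rng.gauss(0.0, sigma) for s in truth] for _ in range(m)]
    return mean_squared_error(ml_estimate(copies), truth)


rng = random.Random(0)                                  # fixed seed (arbitrary)
truth = [math.sin(0.05 * i) for i in range(2000)]       # toy clean signal
mse_single = simulate(truth, 1.0, 1, rng)               # M = 1
mse_averaged = simulate(truth, 1.0, 16, rng)            # M = 16

if __name__ == "__main__":
    print(f"MSE with M = 1:  {mse_single:.4f}")
    print(f"MSE with M = 16: {mse_averaged:.4f}")
```

For $M=1$ the ML estimate is the noisy observation itself, which is the regime where the text reports the largest gain from shrinkage.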

V-A Iterative Minimization of the Expected $\ell_{1}$ -Risk

When $M=1$, the ML-$\ell_{1}$ estimator is obtained by minimizing $\mathcal{R}_{\ell_{1}}\left(a,x\right)$, where $x$ is the noisy version of $s$. We refer to this estimate as the non-iterative $\ell_{1}$-based shrinkage estimator. Following Algorithm 1, one could iteratively refine the estimate, starting from $x$. We compare the non-iterative $\ell_{1}$-based estimator with its iterative counterpart, and present the results in Figures 14, 15, and 16, corresponding to Gaussian noise, multimodal noise (cf. Figure 3(b)), and a GMM approximation to the Laplacian noise, respectively. The output SNR obtained using the oracle-$\ell_{1}$ estimator, calculated by minimizing $\mathcal{R}_{\ell_{1}}\left(a,s\right)$, is also shown for benchmarking the performance.

We make the following observations from Figures 14, 15, and 16: (i) the output SNR increases with iterations, albeit marginally after about 10 iterations; (ii) the iterative method consistently dominates the non-iterative one, with an overall SNR improvement of about $2$ to $3$ dB, for input SNR in the range $-5$ dB to $20$ dB; and (iii) the SNR gain of the iterative technique reduces for higher input SNR, similar to other denoising algorithms.

VI Performance Assessment of MPE and $\ell_{1}$ -Risk Minimization Algorithms Versus State-Of-The-Art Denoising Algorithms

We compare the MPE and the $\ell_{1}$-based shrinkage estimators with three state-of-the-art denoising algorithms: (i) wavelet soft-thresholding$^{1}$ [34]; (ii) the SURE-LET denoising algorithm$^{2}$ [5]; and (iii) smooth sigmoid shrinkage (SS) [13] in the wavelet domain$^{3}$. In [34], a wavelet-based soft-thresholding scheme is used for denoising, with the threshold selected as $\tau=\sigma\sqrt{2\log(N)}$ for a length-$N$ signal. The SURE-LET technique employs a linear expansion of thresholds (LET), which is a linear combination of elementary denoising functions, and optimizes for the coefficients by minimizing the SURE criterion. In [13], a smooth sigmoid shrinkage is applied on the wavelet coefficients to achieve denoising, and the parameters of the sigmoid, which control the degree of attenuation, are obtained by minimizing the SURE objective. We consider ECG signals taken from the PhysioBank database, and the HeaviSine and Piece-Regular signals taken from the Wavelab toolbox for performance evaluation.

$^{1}$ A MATLAB implementation is included in the Wavelab toolbox available at: http://statweb.stanford.edu/~wavelab/.

$^{2}$ A MATLAB implementation of the SURE-LET algorithm is available at: http://bigwww.epfl.ch/demo/suredenoising.

$^{3}$ Pastor et al. kindly provided the MATLAB implementation of their denoising technique [13], which facilitated the comparisons reported in this paper.
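For reference, the threshold rule $\tau=\sigma\sqrt{2\log(N)}$ and the soft-thresholding nonlinearity used by the first baseline can be sketched as follows. This is a generic illustration applied to a list of transform coefficients, with the logarithm taken as the natural logarithm; it is not the Wavelab implementation, and the function names are hypothetical.

```python
import math


def universal_threshold(sigma: float, n: int) -> float:
    """Threshold tau = sigma * sqrt(2 * log(N)) for a length-N signal."""
    return sigma * math.sqrt(2.0 * math.log(n))


def soft_threshold(coeffs: list, tau: float) -> list:
    """Shrink each coefficient toward zero by tau; magnitudes below tau
    are set to zero."""
    return [math.copysign(max(abs(c) - tau, 0.0), c) for c in coeffs]


if __name__ == "__main__":
    tau = universal_threshold(sigma=1.0, n=1024)
    print(f"tau = {tau:.4f}")
    print(soft_threshold([5.0, -0.5, -4.2, 1.0], tau))
```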

The noise is assumed to follow a Gaussian distribution and the output SNR values are averaged over $100$ independent realizations. The noise variance is estimated using a median-based estimator [37], which is also used by Luisier et al.$^{1}$ and Donoho$^{2}$. In the SURE-LET, SS, and wavelet thresholding techniques, denoising is performed using Symmlet-4, with three levels of decomposition, as these settings were found to be the best for the ECG signal (following [22]). In the case of the MPE and $\ell_{1}$-based shrinkage estimators, denoising is performed in the DCT domain. We use the shorthand notations MPE and MPE-subband to denote the pointwise and subband shrinkage estimators, respectively. The corresponding SURE-based subband shrinkage estimator is denoted as SURE-subband. We set $k=16$ and $\epsilon=1.75\sqrt{k}\sigma$ for computing the subband shrinkage parameters. These parameters have not been specifically optimized; however, they were found to work well in practice. The output SNR as a function of the input SNR, obtained using various algorithms, is shown in Figure 17.
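The text does not spell out the median-based estimator of [37]. A common form, assumed here, divides the median absolute value of noise-dominated detail coefficients by $0.6745$, which makes the estimate consistent for Gaussian noise. The sketch below applies it to synthetic pure-noise coefficients; in practice the input would be, for example, the finest-scale wavelet coefficients of the noisy signal. The function name and the test data are illustrative.

```python
import random
import statistics


def median_sigma_estimate(detail_coeffs) -> float:
    """Robust noise standard-deviation estimate: median(|d|) / 0.6745.

    Assumption: the coefficients are dominated by zero-mean Gaussian noise.
    """
    return statistics.median(abs(c) for c in detail_coeffs) / 0.6745


rng = random.Random(1)                       # fixed seed (arbitrary)
true_sigma = 2.0
noise_only = [rng.gauss(0.0, true_sigma) for _ in range(20_000)]
sigma_hat = median_sigma_estimate(noise_only)

if __name__ == "__main__":
    print(f"true sigma = {true_sigma}, estimated sigma = {sigma_hat:.3f}")
```

Because the median is insensitive to a small number of large coefficients, such an estimate is less affected by signal content than the sample standard deviation.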

From the ECG signal denoising performance shown in Figure 17(a), we observe that the MPE estimate consistently dominates the soft-thresholding-based denoising for input SNRs ranging from $-5$ dB to $20$ dB. The iterative $\ell_{1}$ -distortion-based shrinkage estimator (20 iterations) yields lower output SNR compared with the MPE-based estimate for input SNR values in the range $-5$ to $17.5$ dB, but surpasses it for relatively higher values of input SNR ( $17.5$ to $20$ dB). The SURE-LET and the SS algorithms dominate both MPE and the $\ell_{1}$ -based shrinkage estimators, because they use more sophisticated denoising functions in the transform domain, thereby offering greater flexibility. For input SNR range of [math] dB to $20$ dB, the expected $\ell_{1}$ -distortion-based shrinkage estimator consistently outperforms the soft-thresholding-based techniques.

We have also found that it is possible to boost the denoising performance of an algorithm in the low-SNR regime by adding the MPE denoiser in tandem, that is, by replacing $\tilde{s}$ in the expression for the MPE risk estimate in (4) with the estimate obtained using a denoising technique, for example, the SURE-LET. We refer to this tandem approach as MPE-SURE-LET in Figure 17. This approach results in $1$ to $2$ dB gain in output SNR over SURE-LET for low and medium values of input SNR. We observe in Figure 17 that the MPE-subband estimator outperforms the competing algorithms (except for MPE-SURE-LET in Figure 17(a)), at low input SNR.

VII Conclusions

We have proposed a new framework for signal denoising based on a novel criterion, namely the probability of error. Our framework is applicable to scenarios where the noise samples are independent and additively distort the signal. Denoising is performed by transform-domain shrinkage and the optimum shrinkage parameter is obtained by minimizing an estimate of the MPE risk. We have considered both pointwise and subband shrinkage estimators within the MPE paradigm. The performance of the proposed MPE estimators depends on the choice of the error-tolerance parameter $\epsilon$ . In pointwise shrinkage, to deal with the issue of selecting an appropriate $\epsilon$ , we have proposed two approaches. In the first one, we experimentally determined an $\epsilon$ value that results in maximum SNR gain for a particular signal by evaluating the output SNR for different $\epsilon$ . In the second approach, we computed the accumulated probability of error, which is the expected $\ell_{1}$ distortion, and developed an iterative algorithm for minimization. We demonstrated that the shrinkage estimator obtained using the expected $\ell_{1}$ risk outperforms the classical ML estimator, when the number of observations is small or the input SNR is low. We also showed that the shrinkage estimator obtained by iteratively minimizing the $\ell_{1}$ risk dominates the non-iterative approach in terms of the output SNR.

Extensive performance comparison of the proposed MPE and the $\ell_{1}$ distortion-based approaches with state-of-the-art denoising algorithms is carried out on real ECG signals and Wavelab signals. Experimental results demonstrate that the shrinkage estimator based on the MPE-risk estimate outperforms the SURE-based estimator in terms of SNR gain, particularly in the regime of low SNR and smaller subband size. The proposed MPE-framework could be used as an add-on over an existing denoising technique, leading to an estimator that has a higher output SNR, particularly in the low input SNR regime.

For deriving the expression and validating the performance of the MPE-based subband shrinkage estimator, we considered denoising of signals corrupted with Gaussian noise. Experimentally, we have found that an increase in the subband size leads to an increase in the output SNR, which saturates beyond a point. We have also observed that, when the subband size is small or the input SNR is low, the MPE-based estimate has superior performance compared with the SURE-based estimator.

We demonstrated that the optimum shrinkage parameter obtained by minimizing estimates of the MPE/ $\ell_{1}$ distortions increases monotonically with the increase in a posteriori SNR. Such behavior of the shrinkage parameter is essential for denoising. A theoretical characterization of this behavior is needed and may lead to interesting inferences, which could potentially lead to a rigorous convergence proof for the iterative expected $\ell_{1}$ distortion minimization technique. Another important observation is that, for lower input SNRs, the proposed denoising framework yields a higher output SNR compared with the MSE-based techniques. The improvement in performance in terms of SNR of the denoised output may be attributed to the fact that the MPE framework incorporates knowledge of the distribution of the observations, which goes beyond the second-order statistics considered in de facto MSE-based optimization. We also believe that this is the first attempt at demonstrating competitive denoising performance with probability of error chosen as the distortion metric, in a non-Bayesian estimation framework.

Appendix A Perturbation of SURE-Based Pointwise Shrinkage

To analyze the perturbation in the location of the minimum of the SURE cost function, in comparison with the true MSE, one needs to evaluate

[TABLE]

where $a_{\text{opt}}\left(s\right)=\frac{s^{2}}{s^{2}+\sigma^{2}}$ and $a_{\text{opt}}\left(x\right)=1-\frac{\sigma^{2}}{x^{2}}$ . Let

[TABLE]

The Taylor-series expansion of $h(x)$ about $s$ yields

[TABLE]

where $h^{\left(n\right)}$ is the $n^{\text{th}}$ derivative of $h$. Using the first-order Taylor series approximation $h(x)\approx h(s)+wh^{\left(1\right)}\left(s\right)$, we obtain

[TABLE]

which, in turn, leads to an approximation of the perturbation probability $P_{e}^{\text{SURE}}$ :

[TABLE]

Invoking $w\sim\mathcal{N}\left(0,\sigma^{2}\right)$, and using the Chernoff bound [43], we obtain

[TABLE]

Consequently, to satisfy an upper bound on the deviation probability of the form $P_{e}^{\text{SURE}}\leq\alpha$ , for a given $\delta>0$ , one must ensure that

[TABLE]

The condition in (19) translates to an equivalent condition on the minimum required SNR $\displaystyle{\frac{s^{2}}{\sigma^{2}}}$ to achieve a certain $P_{e}^{\text{SURE}}$ .

Appendix B Expected $\ell_{1}$ Risk for GMM

For additive noise with the p.d.f. given in (10), we have

[TABLE]

using (11) and (14). Letting $\mu_{m}=-\displaystyle\frac{(a-1)s+\theta_{m}}{a\sigma_{m}}$ and $u_{m}=\displaystyle\frac{\epsilon-(a-1)s-\theta_{m}}{a\sigma_{m}}$ , we get

[TABLE]

Subsequently, substituting (21) in (20) yields

[TABLE]

which is the expression for the expected $\ell_{1}$ distortion for noise following a GMM distribution.

Bibliography

[1] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer, 1994.
[2] S. Kay and Y. C. Eldar, “Rethinking biased estimation,” IEEE Signal Process. Mag., vol. 25, no. 3, pp. 133–136, May 2008.
[3] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” Ann. Stat., vol. 9, no. 6, pp. 1135–1151, Nov. 1981.
[4] W. James and C. Stein, “Estimation with quadratic loss,” Berkeley Symp. Math. Statist. and Prob., Berkeley, CA, vol. 1, pp. 361–379, 1961.
[5] F. Luisier, T. Blu, and M. Unser, “A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding,” IEEE Trans. Image Process., vol. 16, no. 3, pp. 593–606, Mar. 2007.
[6] F. Luisier, T. Blu, and M. Unser, “SURE-LET for orthonormal wavelet-domain video denoising,” IEEE Trans. Circ. Syst. Video Tech., vol. 20, no. 6, pp. 913–919, Jun. 2010.
[7] F. Luisier and T. Blu, “The SURE-LET multichannel image denoising: Interscale orthonormal wavelet thresholding,” IEEE Trans. Image Process., vol. 17, no. 4, pp. 482–492, Apr. 2008.
[8] T. Blu and F. Luisier, “The SURE-LET approach to image denoising,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2778–2786, Nov. 2007.
