Low-Complexity Blind Parameter Estimation in Wireless Systems with Noisy Sparse Signals

Alexandra Gallyas-Sanhueza; Christoph Studer

arXiv:2302.14089·eess.SP·March 28, 2023·IEEE Trans. Wirel. Commun.


TL;DR

This paper introduces low-complexity blind estimators for noise power, signal power, SNR, and MSE in wireless systems with sparse signals, enabling improved parameter tracking and signal recovery without additional pilot overhead.

Contribution

It proposes novel blind estimators leveraging data sparsity, with theoretical analysis and practical applications in millimeter-wave and cell-free wireless systems.

Findings

01

Estimators accurately track system parameters in noisy, sparse data environments.

02

Application examples show improved channel estimation accuracy.

03

Estimators operate with low computational complexity.

Tables

Table I: Complexity and accuracy summary. $D$ is the signal dimension, and $K^{\text{ACC}}\ll K^{\text{BL}}$ refer to the number of iterations in the accelerated EM and baseline EM algorithms, respectively.

Method	Complexity: power estimation	Complexity: denoising	Accuracy: synthetic data	Accuracy: realistic channels
Baseline EM	$\mathcal{O}(K^{\text{BL}}D)$	$\mathcal{O}(K^{\text{BL}}D+D\log(D))$	(✓✓✓)	(✓✓)
Accelerated EM	$\mathcal{O}(K^{\text{ACC}}D)$	$\mathcal{O}(K^{\text{ACC}}D+D\log(D))$	(✓✓✓)	(✓✓)
Nonparametric	$\mathcal{O}(D)$	$\mathcal{O}(D\log(D))$	(✓)	(✓✓✓)
Parametric	$\mathcal{O}(D)$	$\mathcal{O}(D\log(D))$	(✓✓)	(✓✓✓)

Equations

$$\mathbf{y}=\mathbf{s}+\mathbf{n},$$

$$\eta(\mathbf{y})=\mathbf{s}+\mathbf{e},$$

$$\overline{\mathsf{m}}(\mathbf{z})\triangleq\frac{1}{2}\Big(z^{\text{sort}}_{\lfloor(D+1)/2\rfloor}+z^{\text{sort}}_{\lceil(D+1)/2\rceil}\Big).$$

$$\widehat{N}_{0}\triangleq\frac{\overline{\mathsf{m}}(|\mathbf{y}|^{2})}{\log(2)}$$

$$\widehat{E}_{s}\triangleq\left[\frac{\|\mathbf{y}\|_{2}^{2}}{D}-\widehat{N}_{0}\right]_{+}$$

$$\widehat{\textit{SNR}}\triangleq\left[\frac{\|\mathbf{y}\|_{2}^{2}}{D\widehat{N}_{0}}-1\right]_{+}$$

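$$\widehat{\textit{MSE}}\triangleq\frac{1}{D}\|\eta(\mathbf{y})-\mathbf{y}\|_{2}^{2}-\widehat{N}_{0}+\frac{\widehat{N}_{0}}{D}\sum_{d=1}^{D}\left(\frac{\partial\Re\{\eta(y_{d})\}}{\partial\Re\{y_{d}\}}+\frac{\partial\Im\{\eta(y_{d})\}}{\partial\Im\{y_{d}\}}\right)$$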

$$\widehat{N}_{0}(\hat{p})\triangleq\frac{1}{2}\widehat{N}_{0}\left(\frac{\log(2)}{\min\left\{\log\!\left(\frac{2-2\hat{p}}{1-2\hat{p}}\right),\log(2)\big(1+\widehat{\textit{SNR}}\big)\right\}}+(1-\hat{p})+\frac{\hat{p}^{2}}{\hat{p}+\widehat{\textit{SNR}}}\right)$$

$$\hat{p}(q,r)\triangleq\frac{1}{D}\left(\frac{\|\mathbf{y}\|_{q}}{\|\mathbf{y}\|_{r}}\right)^{\frac{1}{1/q-1/r}}$$

$$F_{X}(m_{X})=\frac{1}{2}.$$

$$\lim_{D\to\infty}\Pr\big[|\overline{\mathsf{m}}(\mathbf{x})-m_{X}|\geq c\big]=0.$$

$$f_{S}(s_{d})\triangleq(1-p)\,\delta(s_{d})+p\,\frac{1}{\pi E_{s}/p}\,e^{-\frac{|s_{d}|^{2}}{E_{s}/p}},$$

$$f_{Y}(y_{d})\triangleq(1-p)\,\frac{1}{\pi N_{0}}\,e^{-\frac{|y_{d}|^{2}}{N_{0}}}+p\,\frac{1}{\pi(N_{0}+E_{s}/p)}\,e^{-\frac{|y_{d}|^{2}}{N_{0}+E_{s}/p}}.$$

$$p\leq p^{\max}\quad\text{with}\quad p^{\max}\triangleq\frac{e^{2}-2}{2e^{2}-2}\approx 0.421.$$

$$\textit{LB}\triangleq\frac{m_{Z}}{\min\left\{\log\!\left(\frac{2-2p}{1-2p}\right),\log(2)(1+\textit{SNR})\right\}}$$

$$\textit{UB}\triangleq\frac{m_{Z}}{\log(2)}\left((1-p)+\frac{p^{2}}{p+\textit{SNR}}\right).$$

$$\textit{LB}\leq N_{0}\leq\textit{UB}\leq\lim_{D\to\infty}\widehat{N}_{0}.$$

$$\frac{1}{1/\textit{SNR}+1/p+1}\leq\lim_{D\to\infty}\varepsilon\leq\min\left\{\log\!\left(\frac{1-p}{1-2p}\right),\textit{SNR}\right\}.$$

$$\frac{1}{D}\|\mathbf{y}\|_{2}^{2}-N_{0}\xrightarrow[D\to\infty]{\text{a.s.}}E_{s}.$$

$$\textit{SURE}\triangleq\frac{1}{D}\|\eta(\mathbf{y})-\mathbf{y}\|_{2}^{2}-N_{0}+\frac{N_{0}}{D}\sum_{d=1}^{D}\left(\frac{\partial\Re\{\eta(y_{d})\}}{\partial\Re\{y_{d}\}}+\frac{\partial\Im\{\eta(y_{d})\}}{\partial\Im\{y_{d}\}}\right)$$

$$\|\mathbf{s}\|_{q}\leq\|\mathbf{s}\|_{0}^{1/q-1/r}\,\|\mathbf{s}\|_{r},\quad 1\leq q<r.$$

$$\frac{1}{D}\left(\frac{\|\mathbf{s}\|_{q}}{\|\mathbf{s}\|_{r}}\right)^{\frac{1}{1/q-1/r}}\leq\frac{\|\mathbf{s}\|_{0}}{D}\approx p.$$

$$\eta(x;\tau)\triangleq\begin{cases}\frac{x}{|x|}\max\{|x|-\tau,0\}&x\neq 0\\ 0&x=0,\end{cases}$$

$$\mathbf{H}^{\text{ML}}=\mathbf{H}+\mathbf{N}^{\text{CE}},$$

$$\mathbf{H}^{\text{ML}}=\mathbf{H}+\mathbf{N}^{\text{CE}}.$$

$$\mathbf{H}^{\text{1-bit ML}}=Q(\mathbf{H}+\mathbf{N}^{\text{CE}}).$$

$$\mathbf{H}^{\text{1-bit ML}}=\mathbf{F}\,Q(\mathbf{H}+\mathbf{N}^{\text{CE}})$$

$$\mathbf{H}^{\text{ML}}=\mathbf{H}+\mathbf{N}^{\text{CE}}.$$

$$F_{Z}(z_{d})\triangleq(1-p)\left(1-e^{-\frac{z_{d}}{N_{0}}}\right)+p\left(1-e^{-\frac{z_{d}}{N_{0}+E_{s}/p}}\right).$$

Code & Models

Repositories

iip-group/blind_and_nonparametric_estimators (official)

Taxonomy

Topics: Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques · Direction-of-Arrival Estimation Techniques

Full text

Low-Complexity Blind Parameter Estimation in Wireless Systems with Noisy Sparse Signals

Alexandra Gallyas-Sanhueza and Christoph Studer

A. Gallyas-Sanhueza is with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY; email: [email protected]. C. Studer is with the Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland; email: [email protected].

The work of AGS and CS was supported in part by ComSenTer, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA, and in part by the US National Science Foundation (NSF) under grants CNS-1717559 and ECCS-1824379.

Part of this work was presented at the IEEE International Conference on Communications (ICC) 2021 [1]. This journal paper extends our work by (i) including a novel parametric noise power estimator with improved accuracy, (ii) evaluating the proposed blind estimators as an initializer for an expectation-maximization algorithm, and (iii) adding two application examples.

MATLAB code to reproduce our simulations is available on GitHub: https://github.com/IIP-Group/blind_and_nonparametric_estimators.

The authors thank Arian Maleki, Ramina Ghods, Charles Jeon, and Seyed Hadi Mirfarshbafan for discussions on signal recovery using SURE, and Haochuan Song for sharing the cell-free system simulator from [2].

Abstract

Baseband processing algorithms often require knowledge of the noise power, signal power, or signal-to-noise ratio (SNR). In practice, these parameters are typically unknown and must be estimated. Furthermore, the mean-square error (MSE) is a desirable metric to be minimized in a variety of estimation and signal recovery algorithms. However, the MSE cannot directly be used as it depends on the true signal that is generally unknown to the estimator. In this paper, we propose novel blind estimators for the average noise power, average receive signal power, SNR, and MSE. The proposed estimators can be computed at low complexity and solely rely on the large-dimensional and sparse nature of the processed data. Our estimators can be used (i) to quickly track some of the key system parameters while avoiding additional pilot overhead, (ii) to design low-complexity nonparametric algorithms that require such quantities, and (iii) to accelerate more sophisticated estimation or recovery algorithms. We conduct a theoretical analysis of the proposed estimators for a Bernoulli complex Gaussian (BCG) prior, and we demonstrate their efficacy via synthetic experiments. We also provide three application examples that deviate from the BCG prior in millimeter-wave multi-antenna and cell-free wireless systems for which we develop nonparametric denoising algorithms that improve channel-estimation accuracy with a performance comparable to denoisers that assume perfect knowledge of the system parameters.

I Introduction

Accurate knowledge of system parameters, such as the average noise power, average signal power, and/or signal-to-noise ratio (SNR), is critical in wireless communication systems, as many baseband processing tasks rely on these quantities [3]. Virtually all existing wireless systems dedicate training phases to estimate such parameters. These training phases typically consist of sending pilots: signals that are known to the receiver and enable estimation of the desired parameters. As pilots do not convey information, minimizing the pilot overhead is desirable in practice. Furthermore, parameter estimation in wireless systems operating at millimeter-wave (mmWave) frequencies must be done frequently, since the propagation conditions can change at fast rates, e.g., blockers or interferers may appear or disappear quickly [4]. Thus, it is even more important to reduce the pilot overhead. In addition, such systems are expected to support several GHz of bandwidth and basestations will consist of a large number of antenna elements. It is therefore important to develop low-complexity solutions that quickly and accurately track such parameters for high-dimensional problems that must be processed at fast rates.

From a parameter estimation perspective, it is beneficial that many modern wireless communication systems often deal with high-dimensional data. For example, all-digital massive multiple-input multiple-output (MIMO) basestations are expected to be equipped with hundreds of antennas [5] or orthogonal frequency-division multiplexing (OFDM) systems will support thousands of subcarriers [6]. Since many of these high-dimensional signals arising in such systems exhibit structure (e.g., are sparse or are taken from a discrete set), one can design statistical methods that blindly estimate critical parameters without requiring a dedicated training phase.

In this paper, we focus on noisy observations of signal vectors that are sparse, i.e., only a few entries carry most of the signals’ energy. Examples of sparse vectors in wireless systems include (i) the beamspace-domain representation of all-digital mmWave multi-antenna channel vectors [7, 8, 9], (ii) the delay-domain representation of OFDM channel vectors [10], and (iii) the antenna-domain representation of channel vectors in cell-free MIMO wireless systems [11]. We will explain how sparsity can be exploited to estimate parameters and denoise noisy observations of sparse vectors. In Sections II, III, and IV, we decouple our results from wireless communication applications and study the general setting. In Section V, we apply our estimators and algorithms to three distinct applications in wireless systems.

In what follows, we will use the term “blind” for estimators that do not use any pilots or training sequences and instead rely only on the signal statistics; blind estimators may have tuning parameters. We will use the term “nonparametric” for estimators that do not need knowledge of system parameters and do not have parameters that need to be tuned manually; nonparametric estimators may use pilots or training sequences.

I-A Prior Art in Blind and Nonparametric Estimation

Many of the existing blind noise power and SNR estimators exploit modulation-specific structure, such as the cyclic prefix redundancy in OFDM [12, 13], or the periodicity of synchronization sequences [14]. Expectation-maximization (EM) has also been used for blind noise power or SNR estimation [15], and for joint sparse signal recovery and noise power estimation [16, 17]. However, the iterative nature of Bayesian algorithms and EM, together with their relatively high per-iteration complexity, renders such methods unsuitable for real-time estimation in wireless systems that operate with high-dimensional data at gigabit-per-second sampling rates. In contrast, we propose low-complexity blind estimators whose complexity scales only as $\mathcal{O}(D)$, where $D$ is the dimension of the processed data. Our proposed low-complexity estimators can also be used as an initialization point to accelerate the convergence of existing EM algorithms.

Joint noise power estimation and sparse signal recovery was investigated in [18]; these methods require the choice of algorithm parameters, which affect the estimation accuracy and robustness. A parameter-free version of sparse signal recovery that combines approximate message passing (AMP) [19, 20] with Stein’s unbiased risk estimate (SURE) [21, 22] was proposed in [23]. Similarly, the nonparametric equalizer (NOPE) [24] combines AMP with SURE to perform linear minimum mean-square error (MSE) equalization in massive MIMO systems without knowledge of the SNR. A drawback of such algorithms is the high per-iteration complexity, which prevents their use in wireless systems supporting large bandwidths and high-dimensional problems (see, e.g., [25, 26] for hardware results of sparse signal recovery). We therefore focus on low complexity, blind, and nonparametric algorithms for the fully-determined setting (in contrast to compressive sensing where one has fewer measurements than unknowns), which finds use in many practical situations. For example, all-digital massive MIMO architectures (which can be as energy efficient as hybrid analog-digital architectures [27, 28, 29]) and cell-free wireless systems can provide measurement vectors of the same dimension as the sparse signal. In OFDM systems, even though pilots are typically transmitted only on a subset of all subcarriers, interpolation and extrapolation algorithms can be used to extract channel state information on all subcarriers [30]; this also leads to the fully-determined setting that enables the use of our methods.

In this low-complexity setting, the concept of estimating tuning parameters directly from the noisy observations has been used recently for adaptive denoising of mmWave [7, 8, 9] or OFDM [10] channel vectors. Such denoising algorithms typically require a tuning parameter: the denoising threshold. While SURE can be used to automatically determine the MSE-optimal denoising threshold, it still requires knowledge of the noise power. In contrast to such results, we propose low-complexity blind estimators, which enable the design of nonparametric (i.e., parameter free) channel-vector denoising algorithms that deliver comparable performance to methods that assume perfect knowledge of the required parameters (e.g., the noise power).

Blind nonparametric algorithms have been proposed for denoising of real-valued signals. The authors in [31] have used power estimation methods based on the median absolute deviation (MAD) of real-valued signals for wavelet denoising. The Python wavelet toolbox PyYAWT [32] includes MAD-based power estimation and adaptive wavelet denoising using SURE for real-valued signals. Our methods also build upon MAD and SURE, but are suitable for complex-valued signals. In addition, we provide a detailed derivation and a theoretical analysis, and extend the general concept to estimate other quantities that frequently arise in wireless systems. While some papers apply real-valued MAD for noise power estimation in the complex-valued setting (see, e.g., [33] for magnetic resonance imaging), there are non-negligible differences to the complex case. We therefore derive the complex-valued version, provide a theoretical accuracy analysis with a Bernoulli complex Gaussian (BCG) prior, and show application examples that deviate from this prior in order to highlight robustness and usefulness of our results.

I-B Contributions

A variety of applications in communication systems deal with sparse and complex-valued signals whose observations are contaminated with noise. For such a model, we propose novel low-complexity blind estimators for the average noise power, average signal power, and SNR. In addition, we propose a blind estimator for the MSE of an estimation function that aims to recover the sparse signal. We use this blind MSE estimate to design a novel nonparametric channel-vector denoising algorithm. We conduct a theoretical analysis of our estimators for a BCG prior, and we showcase simulation results with synthetic data in order to demonstrate the efficacy and limits of our estimators in finite dimensions. In order to demonstrate the efficacy of our results in situations that deviate from a BCG prior, we provide three application examples of channel-vector denoising in mmWave and cell-free communication systems. We also show that our low-complexity estimators can be used to accelerate the convergence (and, hence, reduce the complexity) of existing estimators with a concrete example of an EM-based algorithm.

I-C Notation

Lowercase and uppercase boldface letters denote column vectors and matrices, respectively. The $d$th entry of the vector $\mathbf{a}\in\mathbb{C}^{D}$ is $a_{d}$; the real and imaginary parts are $\Re\{\mathbf{a}\}$ and $\Im\{\mathbf{a}\}$, respectively. We use $\mathbf{b}\triangleq|\mathbf{a}|^{2}$ to refer to $b_{d}=|a_{d}|^{2}$ for $d=1,\ldots,D$. For $\mathbf{a}\in\mathbb{C}^{D}$, the vector $q$-norm is defined as $\|\mathbf{a}\|_{q}\triangleq\left(\sum_{d=1}^{D}{|a_{d}|^{q}}\right)^{1/q}$ for $q\geq 1$, with $\|\mathbf{a}\|_{\infty}\triangleq\max_{d=1,\ldots,D}|a_{d}|$, and the $\ell_{0}$-pseudo-norm $\|\mathbf{a}\|_{0}$ counts the number of nonzero entries in $\mathbf{a}$. The identity matrix is $\mathbf{I}$ and the all-zeros vector is $\mathbf{0}$. The discrete Fourier transform matrix is denoted by $\mathbf{F}$ and satisfies $\mathbf{F}^{\textnormal{H}}\mathbf{F}=\mathbf{I}$, where the superscript ${}^{\textnormal{H}}$ denotes the Hermitian (conjugate transpose) of a matrix. An i.i.d. circularly-symmetric complex Gaussian random vector $\mathbf{x}\in\mathbb{C}^{D}$ with variance $E_{x}$ per complex dimension is denoted by $\mathbf{x}\sim\mathcal{C}\mathcal{N}(\mathbf{0},E_{x}\mathbf{I})$ and its probability density function (PDF) evaluated at $\mathbf{x}$ is $f^{\mathcal{CN}}(\mathbf{x};\mathbf{0},E_{x}\mathbf{I})$. Sample estimates are denoted by a bar, e.g., the sample variance $\overline{E}_{x}\triangleq\frac{1}{D}\|\mathbf{x}\|_{2}^{2}$ of the random vector $\mathbf{x}\in\mathbb{C}^{D}$; statistical quantities are denoted by plain symbols, e.g., the variance $E_{x}\triangleq\frac{1}{D}\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{x}\|_{2}^{2}\right]$, where $\operatorname{\mathbb{E}}\mathopen{}\left[\cdot\right]$ denotes expectation; blind estimators are denoted by a hat, e.g., $\widehat{E}_{x}$.
For $x\in\mathbb{R}$, rounding towards plus and minus infinity is denoted by $\lceil x\rceil$ and $\lfloor x\rfloor$, respectively, and $[x]_{+}\triangleq\max\{x,0\}$. Convergence in probability of a random sequence $A_{n}$ to a random variable $A$ is $A_{n}\xrightarrow[n\to\infty]{\text{prob.}}A$ and almost sure convergence is $A_{n}\xrightarrow[n\to\infty]{\text{a.s.}}A$.

II Practical Guide to Low-Complexity Blind Estimators

We now introduce two system models and propose low-complexity blind estimators for the average noise and signal powers, SNR, and MSE. The derivation of the proposed estimators and an analysis of their key properties are provided in Section III.

II-A System Models

We say that a complex-valued vector $\mathbf{s}\in\mathbb{C}^{D}$ is sparse if the number of nonzero entries is smaller than the dimension $D$ . As a sparsity measure, one can use, for example, the $\ell_{0}$ -pseudo-norm $\|\mathbf{s}\|_{0}$ . This definition of sparsity allows us to derive theoretical results, but in practice, our algorithms also work for approximately sparse signals in which most entries are small compared to the noise (but not necessarily zero). We will focus on the following two system models.

System Model 1.

Let $\mathbf{s}\in\mathbb{C}^{D}$ be a sparse signal with average power $E_{s}\triangleq\frac{1}{D}\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{2}^{2}\right]$ . We model the input-output relation of a noisy observation of the sparse signal as

$$\mathbf{y}=\mathbf{s}+\mathbf{n},$$

where $\mathbf{y}\in\mathbb{C}^{D}$ is the noisy observation and $\mathbf{n}\in\mathbb{C}^{D}$ models noise with $\mathbf{n}\sim\mathcal{CN}(\mathbf{0},N_{0}\mathbf{I})$ . We assume that the sparse signal vector $\mathbf{s}$ and noise vector $\mathbf{n}$ are statistically independent.

System Model 1 finds numerous applications in wireless communication systems. Prime examples are in describing estimated channel vectors (i) in multi-antenna mmWave systems, where the beamspace-domain representation of the channel vectors is typically sparse [7, 8, 9], (ii) in OFDM systems, where the delay-domain representation of the channel vectors is typically sparse [10], or (iii) in cell-free communication systems with centralized processing, where the antenna-domain representation of the channel vectors is typically sparse [11]. In what follows, we assume the sparse vector $\mathbf{s}$ is unknown (in contrast to pilot-based estimation), which makes parameter estimation nontrivial in this blind scenario.

System Model 2.

Let $\mathbf{y}\in\mathbb{C}^{D}$ be a noisy observation as in System Model 1. Fix a weakly differentiable function ${\eta:\mathbb{C}\to\mathbb{C}}$ that operates entry-wise on vectors. We model the output after applying this function to the noisy observation as

$$\eta(\mathbf{y})=\mathbf{s}+\mathbf{e},$$

where $\mathbf{e}\in\mathbb{C}^{D}$ contains (likely non-Gaussian) residual distortion. We emphasize that the sparse signal vector $\mathbf{s}$ and the residual distortion vector $\mathbf{e}$ are not necessarily statistically independent.

Footnote 1: A weakly differentiable function may be nondifferentiable only on zero-measure sets (e.g., at particular values), and has to be differentiable everywhere else.

System Model 2 is relevant in the following scenarios: (i) Estimating a sparse signal $\mathbf{s}$ from a noisy observation $\mathbf{y}$ by applying an entry-wise denoising or estimation function, producing the signal estimate $\hat{\mathbf{s}}\triangleq\eta(\mathbf{y})$; this scenario finds use for channel-vector denoising [7, 8, 9]. (ii) Modeling nonlinearities caused by hardware impairments [34], in which case the distorted version of the noisy received signal can be expressed as $\mathbf{r}\triangleq\eta(\mathbf{y})$; this scenario finds use in signals sampled with low-resolution data converters [35, 36], for example.

II-B Low-Complexity Blind Nonparametric Estimators

In what follows, we make use of the sample median, which we define as follows.

Definition 1 (Sample Median).

Let $\mathbf{z}\in\mathbb{R}^{D}$ be a real-valued vector and $\mathbf{z}^{\text{sort}}\in\mathbb{R}^{D}$ be its sorted version (entries sorted in ascending order). Then, the sample median is defined as

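$$\overline{\mathsf{m}}(\mathbf{z})\triangleq\frac{1}{2}\Big(z^{\text{sort}}_{\lfloor(D+1)/2\rfloor}+z^{\text{sort}}_{\lceil(D+1)/2\rceil}\Big).$$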

The sample median is robust to outliers [37, 38], which makes it well suited to System Model 1, as the nonzero entries of the sparse vector $\mathbf{s}$ can be considered to be outliers for the purpose of separating the sparse signal from noise. We emphasize that the sample median can be computed at a complexity of $\mathcal{O}(D)$ average time using quickselect [39] or of $\mathcal{O}(D)$ deterministic time using the MedianOfNinthers algorithm [40].
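Definition 1 can be transcribed almost literally. The following is a minimal pure-Python sketch; the function name `sample_median` and the toy vectors are our own, and sorting is used for readability even though it costs $\mathcal{O}(D\log(D))$ rather than the $\mathcal{O}(D)$ of a selection algorithm.

```python
def sample_median(z):
    """Sample median per Definition 1: mean of the two middle order statistics."""
    z_sort = sorted(z)            # ascending order; a selection algorithm would avoid the full sort
    D = len(z_sort)
    lo = (D + 1) // 2             # floor((D+1)/2), 1-based index
    hi = -(-(D + 1) // 2)         # ceil((D+1)/2), 1-based index
    return 0.5 * (z_sort[lo - 1] + z_sort[hi - 1])


dense = [1.0, 2.0, 3.0, 4.0]
with_outlier = [1.0, 2.0, 3.0, 4000.0]   # one large entry, mimicking a nonzero signal entry

print(sample_median(dense), sample_median(with_outlier))   # the median ignores the outlier
print(sum(with_outlier) / len(with_outlier))               # the mean does not
```

For odd $D$ both indices coincide and the middle entry is returned; for even $D$ the two central entries are averaged.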

We now propose a range of low-complexity blind estimators (no pilots required) for complex-valued signals that require no parameters.

Estimator 1 (Average Noise Power).

Consider \frefsys:systemmodel1. We propose the following blind estimator

$$\widehat{N}_{0}\triangleq\frac{\overline{\mathsf{m}}(|\mathbf{y}|^{2})}{\log(2)}$$

for the average noise power defined as $N_{0}\triangleq\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{n}\|_{2}^{2}\right]/D$ .

Estimator 1 is blind as it only requires the absolute square entries of the noisy observation $\mathbf{y}$ in System Model 1. The estimate $\widehat{N}_{0}$ can be computed efficiently in $\mathcal{O}(D)$ time, since the most complex operation is computing the median of a vector of dimension $D$. Estimator 1 exploits sparsity in the signal $\mathbf{s}$, but its expression does not depend on the signal sparsity, the signal power, or the statistical sparsity model. It is, however, important to understand that the accuracy of this estimator depends on all of these factors, as it relies on the fact that the nonzero entries of the sparse vector $\mathbf{s}$ can be treated as outliers for the purpose of estimating the average noise power. We note that this noise power estimator can be seen as a complex-valued and squared version of the median absolute deviation (MAD) estimator [37, 41], where we use the assumption that the noise in System Model 1 is zero mean (see Footnote 2).
The intuition behind this estimator (and the $\log(2)$ factor) is the fact that the entries ${|n_{d}|^{2}}/{(N_{0}/2)}$, $d=1,\ldots,D$, are $\chi^{2}$ distributed with two degrees of freedom, which have a median of $2\log(2)$, and that the median of $|\mathbf{y}|^{2}$ is not significantly “contaminated” by the sparse signal. Estimator 1 is used in the estimators proposed next.

Footnote 2: The squared median absolute deviation (MAD) estimator for real-valued signals provided in [38] corresponds to $\overline{\mathsf{m}}(|\mathbf{y}|)^{2}$ whereas we propose to use $\overline{\mathsf{m}}(|\mathbf{y}|^{2})$. While $\overline{\mathsf{m}}(|\mathbf{y}|)^{2}\leq\overline{\mathsf{m}}(|\mathbf{y}|^{2})$ if $D$ is even, both estimators coincide if $D$ is odd. What is more, our scaling factor $\log(2)\approx 0.6931$ differs considerably from the widely-used scaling factor of $(\Phi^{-1}(3/4))^{2}\approx(0.6745)^{2}$ for real-valued signals [31]. We reiterate that the latter is derived for power estimation of real-valued Gaussians using the MAD estimator, while in our derivation we consider the case of complex-valued Gaussians.
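To make the $\log(2)$ factor concrete: $|n_{d}|^{2}/N_{0}$ is exponentially distributed with unit mean, so its median $m$ solves $1-e^{-m}=1/2$, i.e., $m=\log(2)$. The sketch below checks this numerically. It is only an illustration under stated assumptions: the helper names, the fixed seed, and the parameter values ($D$, $p$, $E_{s}$, $N_{0}$) are hypothetical, and the sparse signal is drawn from a Bernoulli complex Gaussian model.

```python
import math
import random
import statistics


def noise_power_estimate(y):
    """Estimator 1: sample median of |y_d|^2 divided by log(2)."""
    return statistics.median(abs(v) ** 2 for v in y) / math.log(2)


def cn(rng, power):
    """One CN(0, power) sample; real and imaginary parts each have variance power/2."""
    sigma = math.sqrt(power / 2)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(0)                  # fixed seed, for reproducibility of the illustration
D, p, Es, N0 = 20000, 0.05, 2.0, 1.0    # hypothetical test parameters

noise_only = [cn(rng, N0) for _ in range(D)]
s = [cn(rng, Es / p) if rng.random() < p else 0j for _ in range(D)]
y = [sd + cn(rng, N0) for sd in s]

print(noise_power_estimate(noise_only))  # close to N0: nothing contaminates the median
print(noise_power_estimate(y))           # slightly above N0: a few signal entries shift the median
```

The second estimate is expected to sit somewhat above $N_{0}$, since the nonzero signal entries push the median of $|\mathbf{y}|^{2}$ upwards.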

Estimator 2 (Average Signal Power).

Consider \frefsys:systemmodel1. We propose the following blind estimator

$$\widehat{E}_{s}\triangleq\left[\frac{\|\mathbf{y}\|_{2}^{2}}{D}-\widehat{N}_{0}\right]_{+}$$

for the average signal power defined as $E_{s}\triangleq\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{2}^{2}\right]/D$ .

Estimator 2 is blind as it only requires the sample estimate of the receive power $\overline{E}_{y}\triangleq\|\mathbf{y}\|_{2}^{2}/D$ and the blind noise estimate $\widehat{N}_{0}$ from Estimator 1. $\widehat{E}_{s}$ can be computed efficiently in $\mathcal{O}(D)$ time, since the most complex operation is computing $\widehat{N}_{0}$. The intuition behind this estimator comes from subtracting the estimated noise power from the total receive power, as done previously in [13] for an OFDM-specific estimator.
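As a rough illustration of Estimator 2, the following sketch subtracts the median-based noise estimate from the sample receive power on synthetic data. The helper names, the seed, and the parameter values are assumptions made for this example only; the signal follows a Bernoulli complex Gaussian model.

```python
import math
import random
import statistics


def signal_power_estimate(y):
    """Estimator 2: [ ||y||_2^2 / D - N0_hat ]_+ with N0_hat taken from Estimator 1."""
    powers = [abs(v) ** 2 for v in y]
    N0_hat = statistics.median(powers) / math.log(2)   # Estimator 1
    Ey_bar = sum(powers) / len(powers)                 # sample receive power
    return max(Ey_bar - N0_hat, 0.0)                   # [.]_+ clips negative values to zero


def cn(rng, power):
    """One CN(0, power) sample; real and imaginary parts each have variance power/2."""
    sigma = math.sqrt(power / 2)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(2)                  # fixed seed, illustrative only
D, p, Es, N0 = 50000, 0.05, 2.0, 1.0    # hypothetical test parameters

y = [(cn(rng, Es / p) if rng.random() < p else 0j) + cn(rng, N0) for _ in range(D)]
Es_hat = signal_power_estimate(y)
print(Es_hat)   # in the vicinity of Es; any overshoot of N0_hat shows up as an undershoot here
```

Because the noise estimate is subtracted directly, an overestimated $\widehat{N}_{0}$ translates one-to-one into an underestimated $\widehat{E}_{s}$.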

Estimator 3 (Signal-to-Noise Ratio).

Consider \frefsys:systemmodel1. We propose the following blind estimator

$$\widehat{\textit{SNR}}\triangleq\left[\frac{\|\mathbf{y}\|_{2}^{2}}{D\widehat{N}_{0}}-1\right]_{+}$$

for the SNR defined as $\textit{SNR}\triangleq{\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{2}^{2}\right]}/{\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{n}\|_{2}^{2}\right]}$ .

Estimator 3 is blind as it only requires the sample estimate of the receive power $\overline{E}_{y}\triangleq\|\mathbf{y}\|_{2}^{2}/D$ and the blind estimate $\widehat{N}_{0}$ from Estimator 1. $\widehat{\textit{SNR}}$ can also be computed efficiently in $\mathcal{O}(D)$ time. The intuition behind this estimator comes from dividing the estimated signal and noise powers, as done previously in [13] for an OFDM-specific estimator.
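A small deterministic example may help to see the role of the $[\cdot]_{+}$ operator in Estimator 3. The two toy vectors below are hypothetical and chosen only so that the arithmetic can be followed by hand; they are not meant to resemble realistic noise.

```python
import math
import statistics


def snr_estimate(y):
    """Estimator 3: [ ||y||_2^2 / (D * N0_hat) - 1 ]_+ with N0_hat taken from Estimator 1."""
    powers = [abs(v) ** 2 for v in y]
    N0_hat = statistics.median(powers) / math.log(2)   # assumes a nonzero median
    Ey_bar = sum(powers) / len(powers)
    return max(Ey_bar / N0_hat - 1.0, 0.0)


# Seven unit-magnitude entries plus one strong entry: median(|y|^2) = 1, mean(|y|^2) = 7.
y_sparse = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 0j, 1j, -1 + 0j, 7 + 0j]
# All entries have unit magnitude: the mean power is below N0_hat = 1 / log(2).
y_flat = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 0j, 1j, -1 + 0j, 1j]

snr_sparse = snr_estimate(y_sparse)
snr_flat = snr_estimate(y_flat)
print(snr_sparse)   # 7 * log(2) - 1, about 3.85
print(snr_flat)     # 0.0 after clipping
```

Without the clipping, the second vector would yield a negative SNR estimate, which is why both Estimator 2 and Estimator 3 apply $[\cdot]_{+}$.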

Estimator 4 (Mean-Square Error).

Consider \frefsys:systemmodel2 with a fixed function $\eta:\mathbb{C}\to\mathbb{C}$ . We propose the following blind estimator

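$$\widehat{\textit{MSE}}\triangleq\frac{1}{D}\|\eta(\mathbf{y})-\mathbf{y}\|_{2}^{2}-\widehat{N}_{0}+\frac{\widehat{N}_{0}}{D}\sum_{d=1}^{D}\left(\frac{\partial\Re\{\eta(y_{d})\}}{\partial\Re\{y_{d}\}}+\frac{\partial\Im\{\eta(y_{d})\}}{\partial\Im\{y_{d}\}}\right)$$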

for the MSE defined as $\textit{MSE}\triangleq\operatorname{\mathbb{E}}\mathopen{}\left[\|\eta(\mathbf{y})-\mathbf{s}\|_{2}^{2}\right]/D=\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{e}\|_{2}^{2}\right]/D$ .

Estimator 4 is blind as it only requires the receive signal $\mathbf{y}$, the blind estimate $\widehat{N}_{0}$ from Estimator 1, and the function $\eta$. The complexity of the proposed MSE estimator depends on the function $\eta$. For example, $\widehat{\textit{MSE}}$ can be computed efficiently in $\mathcal{O}(D)$ time for the soft-thresholding function with a given threshold. Even if the threshold is not given, searching for the best threshold and applying the soft-thresholding function can be done in $\mathcal{O}(D\log(D))$ time using the methods developed for the BEACHES algorithm in [7]. The MSE is a frequently used metric to evaluate the performance of estimation algorithms. Since our blind MSE estimate does not require knowledge of $\mathbf{s}$, it can be used to automatically tune parameters in estimators. The intuition behind this estimator relies on SURE, and we refer the interested reader to [22] for an accessible derivation in the real-valued case and to [7, 8] for a derivation in the complex-valued case. Estimator 4 is used to obtain the nonparametric channel-vector denoising algorithm described in Section V.
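As an illustration of how Estimator 4 can tune a denoiser, consider complex soft-thresholding $\eta(y_{d};\tau)=\frac{y_{d}}{|y_{d}|}\max\{|y_{d}|-\tau,0\}$. For $|y_{d}|>\tau$ the residual is $|\eta(y_{d})-y_{d}|^{2}=\tau^{2}$ and the two partial derivatives sum to $2-\tau/|y_{d}|$; otherwise the residual is $|y_{d}|^{2}$ and the derivatives vanish. The sketch below plugs these into the SURE-based estimate and picks a threshold by a plain grid search. This is a simplification and not the $\mathcal{O}(D\log(D))$ search of BEACHES; the helper names, grid, seed, and parameters are assumptions for this example.

```python
import math
import random
import statistics


def soft_threshold(v, tau):
    """Complex soft-thresholding: shrink the magnitude by tau, keep the phase."""
    mag = abs(v)
    return v * (1.0 - tau / mag) if mag > tau else 0j


def mse_estimate(y, tau, N0_hat):
    """SURE-based MSE estimate specialized to soft-thresholding with threshold tau."""
    residual = 0.0     # accumulates |eta(y_d) - y_d|^2
    divergence = 0.0   # accumulates dRe{eta}/dRe{y_d} + dIm{eta}/dIm{y_d}
    for v in y:
        mag = abs(v)
        if mag > tau:
            residual += tau ** 2
            divergence += 2.0 - tau / mag
        else:
            residual += mag ** 2
    D = len(y)
    return residual / D - N0_hat + N0_hat * divergence / D


def cn(rng, power):
    """One CN(0, power) sample; real and imaginary parts each have variance power/2."""
    sigma = math.sqrt(power / 2)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(1)                  # fixed seed, illustrative only
D, p, Es, N0 = 20000, 0.05, 2.0, 1.0    # hypothetical test parameters
s = [cn(rng, Es / p) if rng.random() < p else 0j for _ in range(D)]
y = [sd + cn(rng, N0) for sd in s]

N0_hat = statistics.median(abs(v) ** 2 for v in y) / math.log(2)   # Estimator 1
taus = [0.1 * k for k in range(31)]                                # hypothetical grid
tau_best = min(taus, key=lambda t: mse_estimate(y, t, N0_hat))


def true_mse(tau):
    """Ground-truth MSE; only available here because the test signal s is known."""
    return sum(abs(soft_threshold(v, tau) - sd) ** 2 for v, sd in zip(y, s)) / D


print(tau_best, mse_estimate(y, tau_best, N0_hat), true_mse(tau_best), true_mse(0.0))
```

The threshold is chosen without access to $\mathbf{s}$ or $N_{0}$; the true MSE is computed afterwards only to check that thresholding at `tau_best` improves on leaving $\mathbf{y}$ untouched ($\tau=0$).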

II-C Low-Complexity Blind Parametric Estimators

We now propose a low-complexity blind estimator (no pilots or training sequences required) that takes an estimate $\hat{p}$ of the activity rate as a parameter. We then propose a family of parametric estimators for the activity rate.

Estimator 5 (Average Noise Power with Estimated SNR and Activity Rate Corrections).

Consider \frefsys:systemmodel1, the low-complexity blind estimates $\widehat{\textit{SNR}}$ from \frefest:SNR, and a parameter $\hat{p}$ that is an estimate of the activity rate $p$ . We propose the following blind parametric estimator

[TABLE]

for the average noise power $N_{0}$ .

Estimator 5 is blind as it only requires the blind estimates $\widehat{N}_{0}$ and $\widehat{\textit{SNR}}$, but is parametric as it depends on the activity rate estimate $\hat{p}$. $\widehat{N}_{0}(\hat{p})$ can be computed in $\mathcal{O}(D)$ time, since the most complex operations are computing $\widehat{N}_{0}$ and $\widehat{\textit{SNR}}$ (here we consider $\hat{p}$ as a given parameter and ignore the complexity associated with obtaining it). The intuition behind this estimator will become clear after we present Theorem 1, as it is derived from averaging a lower and an upper bound on $N_{0}$. As shown in Section IV, this parametric estimate $\widehat{N}_{0}(\hat{p})$ often yields better accuracy than the nonparametric estimate $\widehat{N}_{0}$.

Since in some applications an estimate $\hat{p}$ of the activity rate may be unavailable, we next propose a family of estimators that attempt to extract the activity rate $p$ directly from the noisy observation vector $\mathbf{y}$. Such estimators can, for example, supply the value of $\hat{p}$ required by Estimator 5.

Estimator 6 (Activity Rate).

Consider System Model 1 and integers $1\leq q<r$. We propose the following family of blind parametric estimators (see footnote 3)

$$\hat{p}(q,r)\triangleq\frac{1}{D}\left(\frac{\|\mathbf{y}\|_{q}}{\|\mathbf{y}\|_{r}}\right)^{\frac{qr}{r-q}}$$

for the activity rate (see footnote 4) defined as $p\triangleq\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{0}\right]/D$. For $r=\infty$, the exponent $qr/(r-q)$ is understood as its limit $q$.

Footnote 3: In practice, we use $\min\{0.499,\hat{p}(q,r)\}$ in place of $\hat{p}(q,r)$, so that $\log\!\Big{(}\frac{2-2\hat{p}(q,r)}{1-2\hat{p}(q,r)}\Big{)}$ is always well-defined, as required by the activity-rate condition of Theorem 1.

Footnote 4: We define the activity rate as the fraction of nonzero entries of the vector $\mathbf{s}$. Values of $p$ close to $0$ indicate the vector is sparse and $p=1$ indicates that all entries are nonzero.

Estimator 6 is blind as it only requires the receive vector $\mathbf{y}$, but is parametric as it requires a choice for $q$ and $r$. $\hat{p}(q,r)$ can be computed in $\mathcal{O}(D)$ time and, among others, the following choices for $q$ and $r$ require low complexity: $\hat{p}{(1,2)}\triangleq\frac{1}{D}\left(\frac{\|\mathbf{y}\|_{1}}{\|\mathbf{y}\|_{2}}\right)^{2}$, $\hat{p}{(1,\infty)}\triangleq\frac{1}{D}\left(\frac{\|\mathbf{y}\|_{1}}{\|\mathbf{y}\|_{\infty}}\right)$, $\hat{p}{(2,4)}\triangleq\frac{1}{D}\left(\frac{\|\mathbf{y}\|_{2}}{\|\mathbf{y}\|_{4}}\right)^{4}$, and $\hat{p}{(2,\infty)}\triangleq\frac{1}{D}\left(\frac{\|\mathbf{y}\|_{2}}{\|\mathbf{y}\|_{\infty}}\right)^{2}$. The parameters $q$ and $r$ must be chosen according to simulations, as we are unaware of a principled and reliable way to determine them. In our simulations, the choice $\hat{p}(1,\infty)$ performed best.
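As a rough illustration, the norm-ratio estimates listed above can be computed in a few lines of Python. The helper name `activity_rate_estimate` is hypothetical, and the general exponent $qr/(r-q)$ is an assumption inferred from the four special cases above (it reduces to $q$ for $r=\infty$). The optional clipping at 0.499 follows the footnote attached to Estimator 6.

```python
import math

def activity_rate_estimate(y, q, r, clip=True):
    """Sketch of p_hat(q, r) = (1/D) * (||y||_q / ||y||_r) ** (q*r / (r - q)).

    y: sequence of complex (or real) samples with at least one nonzero entry.
    r may be math.inf, in which case the exponent reduces to q.
    """
    if not 1 <= q < r:
        raise ValueError("requires 1 <= q < r")
    mags = [abs(v) for v in y]
    norm_q = sum(m ** q for m in mags) ** (1.0 / q)
    if math.isinf(r):
        norm_r = max(mags)
        exponent = q
    else:
        norm_r = sum(m ** r for m in mags) ** (1.0 / r)
        exponent = q * r / (r - q)
    p_hat = (norm_q / norm_r) ** exponent / len(mags)
    # Clipping as in the footnote of Estimator 6 (keeps p_hat below 0.5).
    return min(0.499, p_hat) if clip else p_hat

# Noiseless constant-modulus example: 2 of 8 entries are active, so p = 0.25.
y = [1, 0, 0, 1j, 0, 0, 0, 0]
estimates = {(q, r): activity_rate_estimate(y, q, r)
             for (q, r) in [(1, 2), (1, math.inf), (2, 4), (2, math.inf)]}
```

In this noiseless constant-modulus example all four choices of $(q,r)$ recover the fraction of active entries; with noise, the estimates are only rough, as discussed in the text.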

II-D Blind Parametric Estimator Based on Expectation-Maximization (EM)

As a baseline, we also consider a blind EM estimator (no pilots required) that requires initialization values and algorithm parameters that determine the convergence criterion.

Estimator 7 (Noise Power, Signal Power, and Activity Rate).

Consider System Model 1. The baseline EM algorithm, initialized with ${N_{0}^{\text{init}}<\|\mathbf{y}\|_{2}^{2}/D}$ and $p^{\text{init}}<0.5$, simultaneously estimates the noise power $N_{0}$, the signal power $E_{s}$, and the activity rate $p$.

Estimator 7 is blind as it only requires the noisy observation $\mathbf{y}$, but is parametric as it needs a choice for the maximum number of iterations $K^{\text{max}}$, the tolerance $\xi$, and initialization values for the noise power $N_{0}^{\text{init}}$ and activity rate $p^{\text{init}}$. The total number of EM iterations $K$ is not fixed but depends on $K^{\text{max}}$, $\xi$, $N_{0}^{\text{init}}$, $p^{\text{init}}$, and on the input $\mathbf{y}$ itself. The complexity of Estimator 7 is $\mathcal{O}(KD)$. We note that this estimator is a variant of a classical EM algorithm for a two-component Gaussian mixture [42], where we use the assumption that the signal and the noise in System Model 1 are zero mean and complex valued. The intuition behind this estimator is that each entry $y_{d}$, $d=1,\ldots,D$, of the vector $\mathbf{y}$ contains either noise or signal-plus-noise, and these two cases are Gaussian distributed with different variances.

We note that this baseline EM algorithm is only a minor variation of the method in [42, Alg. 8.1]. The iterative nature of such methods, however, results in (often significantly) higher complexity than our estimators. With this in mind, we propose an improved version that we call “accelerated EM,” which simply consists of initializing the baseline EM algorithm using our blind nonparametric noise variance estimator. As we will see in Section IV-B, this accelerated EM variant drastically reduces the number of iterations needed for convergence without degrading accuracy.

II-E Summary of Proposed Power Estimation and Denoising Algorithms

Table I summarizes the complexity and accuracy of the different estimators. “Baseline EM” refers to Estimator 7, “accelerated EM” to Estimator 7 initialized using Estimator 1, “nonparametric” to Estimator 1, and “parametric” to Estimator 5. The complexity for blind noise power estimation is stated after the definition of each of these algorithms. The complexity for denoising is the complexity of estimating the noise power plus the complexity of the BEACHES algorithm from [8]. Since BEACHES already sorts the magnitudes of the noisy signal, the nonparametric and parametric estimators that use the median require no additional complexity for estimating the noise power. Anticipating the results shown in Sections IV and V, we illustrate (qualitatively) the accuracy of the estimators with synthetic data that perfectly matches the BCG prior, and with practical examples that deviate from this prior.

III Theory

We first show that the sample median approaches the median for $D\to\infty$ and introduce our statistical model for sparse vectors. We then derive and analyze Estimators 1 to 7. The observations made in this section are valid in the large-dimension limit and for the noisy BCG model to be introduced in Definition 4. We use simulations to demonstrate the accuracy of our estimators for finite (and small) dimensions $D$ with the noisy BCG model in Section IV. To demonstrate the efficacy of our methods in practical scenarios with signals that deviate from the BCG model, we evaluate our denoising algorithms in three distinct scenarios in Section V.

III-A Convergence of the Sample Median for $D\to\infty$

We will use the following definition of the median.

Definition 2 (Median).

Let $X$ be an absolutely continuous random variable (RV) with cumulative distribution function (CDF) $F_{X}(x)$ . Then, the median $\mathsf{m}_{X}$ of $X$ is defined as

[TABLE]

While, analogously to the central limit theorem, the sample median is approximately Gaussian if $D$ is large (see, e.g., [43]), we will only use the following result.

Lemma 1 (Lem. C.1 from [43]).

Let $X$ be an RV whose PDF is differentiable in some neighborhood of the median $\mathsf{m}_{X}$, and let the vector $\mathbf{x}$ contain i.i.d. samples of $X$. Then, for any $c>0$, the sample median $\overline{\mathsf{m}}(\mathbf{x})$ satisfies

[TABLE]

This result implies that in the large-dimension limit ( ${D\to\infty}$ ), the sample median $\overline{\mathsf{m}}(\mathbf{x})$ converges in probability to the median $\mathsf{m}_{X}$ . Hence, by observing a sufficiently large number of samples, which is possible in modern multi-antenna mmWave or OFDM systems, we can accurately estimate the median $\mathsf{m}_{X}$ .

III-B Statistical Model for Complex-Valued Sparse Vectors

To derive and analyze the blind estimators proposed in Section II, we need a statistical model for the sparse signal $\mathbf{s}$. This model should (i) have as few parameters as possible while being able to model a large class of complex-valued sparse vectors typically arising in communication systems and (ii) facilitate a theoretical analysis. In what follows, we consider BCG random vectors [44, 20], which allow control over the signal sparsity and the signal power. We reiterate that the BCG model is instrumental only for our analysis. The simulation results in Section V will show that the proposed estimators are robust to model mismatch, e.g., for signals that are not necessarily i.i.d. Gaussian or circularly symmetric.

Definition 3 (BCG Random Vector).

A sparse vector $\mathbf{s}\in\mathbb{C}^{D}$ is BCG if each entry is nonzero with probability $p\in(0,1]$ , and the nonzero entries are i.i.d. circularly-symmetric complex Gaussian with variance $E_{s}/p$ . The PDF of each entry $s_{d}$ , $d=1,\ldots,D$ , is therefore given by

$$f(s_{d})=(1-p)\,\delta(s_{d})+p\,f^{\mathcal{CN}}(s_{d};0,E_{s}/p),$$

where $\delta(\cdot)$ is the Dirac delta distribution.

With this model, the activity rate is $p=\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{0}\right]/D$ (meaning the expected number of nonzero entries is ${\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{0}\right]=pD}$ ), and the average power of the sparse signal vector $\mathbf{s}$ is $E_{s}=\frac{1}{D}\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{2}^{2}\right]$ .

In System Model 1, we assumed that the noise vector $\mathbf{n}$ is i.i.d. circularly-symmetric complex Gaussian with variance $N_{0}$ per complex entry. Hence, the PDF of each entry $n_{d}$, $d=1,\ldots,D$, is given by $f^{\mathcal{CN}}(n_{d};0,N_{0})\triangleq\frac{1}{\pi N_{0}}e^{-|n_{d}|^{2}/N_{0}}$. Consequently, if $\mathbf{s}$ is a BCG random vector, then the PDF of the noisy observation vector $\mathbf{y}=\mathbf{s}+\mathbf{n}$ is as follows.

Definition 4 (Noisy BCG Random Vector).

The PDF of the entries $y_{d}$, $d=1,\ldots,D$, of a BCG random vector per Definition 3 observed as in System Model 1 is given by

$$f(y_{d})=(1-p)\,f^{\mathcal{CN}}(y_{d};0,N_{0})+p\,f^{\mathcal{CN}}(y_{d};0,N_{0}+E_{s}/p).$$

For this signal and observation model, we are now able to derive and analyze Estimators 1 to 7. We will make frequent use of the entry-wise squared magnitude of the vector $\mathbf{y}$, which we call $\mathbf{z}\triangleq|\mathbf{y}|^{2}$. We also define a random variable (RV) $Z$ with the same distribution as any of the i.i.d. entries of $\mathbf{z}$, and let $\mathsf{m}_{Z}$ be the median of $Z$.
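To make the model concrete, the following is a minimal sketch of how a noisy BCG vector could be sampled using only the Python standard library. The function names `complex_gaussian` and `sample_noisy_bcg`, the seed, and the parameter values are illustrative assumptions and not part of the original text.

```python
import math
import random

def complex_gaussian(rng, variance):
    """Draw one CN(0, variance) sample (variance split evenly over real/imag parts)."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def sample_noisy_bcg(D, p, Es, N0, rng):
    """Return (s, n, y) with s BCG(p, Es), n i.i.d. CN(0, N0), and y = s + n."""
    s = [complex_gaussian(rng, Es / p) if rng.random() < p else 0j
         for _ in range(D)]
    n = [complex_gaussian(rng, N0) for _ in range(D)]
    y = [sd + nd for sd, nd in zip(s, n)]
    return s, n, y

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
s, n, y = sample_noisy_bcg(D=20000, p=0.1, Es=2.0, N0=1.0, rng=rng)
activity = sum(1 for sd in s if sd != 0) / len(s)   # should be close to p
z = [abs(yd) ** 2 for yd in y]                      # entry-wise squared magnitudes
avg_power = sum(z) / len(z)                         # should be close to Es + N0
```

Because the nonzero entries have variance $E_{s}/p$, the average power of $\mathbf{s}$ stays at $E_{s}$ regardless of the activity rate, which is why `avg_power` concentrates around $E_{s}+N_{0}$.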

III-C Analysis of Estimator 1

We start with the blind noise power estimator defined in Estimator 1. We have the following key result, whose proof is given in the appendix.

Theorem 1.

Let $\mathbf{y}$ be a noisy BCG random vector with PDF as in Definition 4 and with activity rate satisfying

[TABLE]

Let a lower bound LB and an upper bound UB be defined as follows:

[TABLE]

Then, the average noise power $N_{0}$ satisfies (footnote 5: here we simplify the notation; for $D\to\infty$, $\widehat{N}_{0}$ converges in probability to $\mathsf{m}_{Z}/\log(2)$, and strictly speaking this latter expression is the upper bound)

[TABLE]

Theorem 1 has the following key implications: (i) In the large-dimension limit, the proposed blind estimate $\widehat{N}_{0}$ bounds the average noise power $N_{0}$ from above, i.e., we have developed a pessimistic estimator. (ii) If $\textit{SNR}\to 0$ or $p\to 0$, then ${\textit{LB}=\textit{UB}=\mathsf{m}_{Z}/\log(2)}$ in the bound of Theorem 1, and therefore $N_{0}=\mathsf{m}_{Z}/\log(2)$. Thus, either for $p\to 0$ or $\textit{SNR}\to 0$, the proposed estimate is exact, i.e., $\widehat{N}_{0}\xrightarrow[D\to\infty]{\text{prob.}}\mathsf{m}_{Z}/\log(2)=N_{0}$. We summarize this important insight in the following remark.

Remark 1.

In the large-dimension limit ( $D\to\infty$ ), the proposed blind nonparametric estimate $\widehat{N}_{0}$ is pessimistic (i.e., overestimates the average noise power $N_{0}$ ), and becomes exact at low SNR or low activity rate $p$ (i.e., for sparse vectors).
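Remark 1 can be checked numerically with a short sketch. It assumes that Estimator 1 (defined outside this section) is the sample median of $|y_{d}|^{2}$ divided by $\log(2)$, which is consistent with the limit $\mathsf{m}_{Z}/\log(2)$ stated in Theorem 1. The helper names, seed, and parameter values are hypothetical.

```python
import math
import random
import statistics

def noise_power_estimate(y):
    """Median-based estimate: sample median of |y_d|^2 divided by log(2)."""
    return statistics.median(abs(v) ** 2 for v in y) / math.log(2.0)

def cn(rng, variance):
    """One CN(0, variance) sample."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def noisy_bcg(D, p, Es, N0, rng):
    """Noisy BCG observation y = s + n, returned as a list of D complex samples."""
    return [(cn(rng, Es / p) if rng.random() < p else 0j) + cn(rng, N0)
            for _ in range(D)]

rng = random.Random(1)
N0 = 1.0
# Es = 0 mimics the SNR -> 0 regime, where the estimate should be close to N0.
est_noise_only = noise_power_estimate(noisy_bcg(20000, 0.2, 0.0, N0, rng))
# A strong signal with p = 0.2 pushes the median up, so N0 is overestimated.
est_high_snr = noise_power_estimate(noisy_bcg(20000, 0.2, 10.0, N0, rng))
```

The second estimate exceeds the first because active entries inflate the median of $|y_{d}|^{2}$, which illustrates the pessimistic behavior described in the remark.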

Next, we present bounds on the relative error of Estimator 1. These bounds depend on the activity rate $p$ and the SNR. The proof is given in the appendix.

Corollary 1.

For $p\leq p^{\text{max}}$ as in the activity-rate condition of Theorem 1, the relative error ${\varepsilon\triangleq|\widehat{N}_{0}-N_{0}|/N_{0}}$ of Estimator 1 in the large-dimension limit is bounded as follows:

[TABLE]

An upper bound for the relative error $\varepsilon$ can be obtained if (i) an upper bound on the SNR is known, or (ii) an upper bound on $p$ is known, since $\log\left(\frac{1-p}{1-2p}\right)$ is nondecreasing for $p\in(0,0.5)$. In addition, we confirm the second implication discussed below Theorem 1: Corollary 1 implies that if $p\to 0$ (irrespective of the SNR) or $\textit{SNR}\to 0$ (irrespective of the sparsity), then the proposed estimator becomes exact, i.e., $\varepsilon=0$ and therefore $\widehat{N}_{0}\xrightarrow[D\to\infty]{\text{prob.}}N_{0}$.

III-D Analysis of Estimator 2

For the blind estimate $\widehat{E}_{s}$ of the average signal power $E_{s}$ , we use the following lemma, which is derived from the fact that the entries of the vector $\mathbf{z}\triangleq|\mathbf{y}|^{2}$ are i.i.d. with expected value of $\operatorname{\mathbb{E}}\mathopen{}\left[z_{d}\right]=\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{y}\|_{2}^{2}\right]/D=E_{s}+N_{0}$ , $d=1,\ldots,D$ .

Lemma 2.

Let $\mathbf{y}$ be a noisy BCG random vector with PDF as in Definition 4. Then, according to the strong law of large numbers, we have

$$\lim_{D\to\infty}\frac{\|\mathbf{y}\|_{2}^{2}}{D}-N_{0}=E_{s}\quad\text{almost surely.}$$

To obtain Estimator 2, we construct a blind estimator of $E_{s}$ by taking the left side of the limit in Lemma 2 and replacing the average noise power $N_{0}$ with the blind estimate $\widehat{N}_{0}$ from Estimator 1. To avoid negative values of $E_{s}$ that have no physical meaning, we assign a value of zero to our estimate if $\|\mathbf{y}\|_{2}^{2}/D-\widehat{N}_{0}$ is negative. Since the estimate $\widehat{N}_{0}$ overestimates the true average noise power $N_{0}$, the blind estimate $\widehat{E}_{s}$ of Estimator 2 tends to underestimate the signal power. From Theorem 1 it follows that for $p\to 0$ or $\textit{SNR}\to 0$, the blind signal power estimate $\widehat{E}_{s}$ is exact.

III-E Analysis of Estimator 3

The blind SNR estimator is obtained by simply taking the ratio of $\widehat{E}_{s}$ from Estimator 2 and $\widehat{N}_{0}$ from Estimator 1. For $D\to\infty$, the blind signal power estimate underestimates the average signal power and the noise power estimate overestimates the average noise power, which means that the blind SNR estimate of Estimator 3 underestimates the SNR. From Theorem 1 it follows that for $D\to\infty$ with either $p\to 0$ or $\textit{SNR}\to 0$, the blind SNR estimate is exact.
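The chain from noise power to signal power to SNR described above can be sketched as follows. The code assumes, as before, that the noise power estimate is the sample median of $|y_{d}|^{2}$ divided by $\log(2)$; the function name `blind_power_estimates` and the toy input are illustrative only.

```python
import math
import statistics

def blind_power_estimates(y):
    """Return (N0_hat, Es_hat, SNR_hat) computed from the observation y alone."""
    z = [abs(v) ** 2 for v in y]
    # Assumed form of the median-based noise power estimate.
    n0_hat = statistics.median(z) / math.log(2.0)
    # Received power minus noise estimate, clipped at zero to stay physical.
    es_hat = max(sum(z) / len(z) - n0_hat, 0.0)
    # Ratio of the two blind estimates (requires a nonzero median).
    snr_hat = es_hat / n0_hat
    return n0_hat, es_hat, snr_hat

# Toy input: three small entries and one strong entry.
n0_hat, es_hat, snr_hat = blind_power_estimates([1, 1j, -1, 3 + 4j])
```

Since the noise power estimate enters the signal power with a negative sign and the SNR in the denominator, any overestimation of $N_{0}$ lowers both $\widehat{E}_{s}$ and $\widehat{\textit{SNR}}$, in line with the discussion above.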

III-F Analysis of Estimator 4

In order to analyze Estimator 4, we first assume that the average noise power $N_{0}$ is known. For this scenario, we can borrow the following two theorems from [8].

Theorem 2 (Thm. 1 of [8]).

Consider System Model 2. Then, Stein’s unbiased risk estimate given by

[TABLE]

is an unbiased estimate of the MSE so that $\operatorname{\mathbb{E}}\mathopen{}\left[\textit{SURE}\right]=\textit{MSE}.$

Theorem 3 (Thm. 3 of [8]).

If $\eta$ is pseudo-Lipschitz, then SURE as defined in Theorem 2 converges to the MSE in the large-dimension limit, i.e., we have $\lim_{D\to\infty}\textit{SURE}=\textit{MSE}.$

Theorem 3 implies that if $N_{0}$ were known perfectly, then one could perfectly estimate the MSE in the large-dimension limit without knowledge of the sparse signal vector $\mathbf{s}$. For smaller values of the dimension $D$, Theorem 2 only ensures equality in expectation. Equality in expectation means that some realizations will underestimate and some realizations will overestimate the true MSE (see footnote 6).

Footnote 6: We have to keep in mind that we use the estimated MSE to determine parameters in the estimation function $\eta$ that minimize the MSE for each given realization of $\mathbf{y}$. Therefore, offsets that depend on the realization of the noisy observation $\mathbf{y}$ can be treated as a constant and thus be ignored, even if these offsets cause the MSE estimate to take on negative values. In other words, we are not interested in the true value of the MSE, but rather in the shape of the MSE function with respect to the parameters in $\eta$.

Estimator 4 is a blind version of SURE, in which we have replaced the true average noise power $N_{0}$ by its estimate $\widehat{N}_{0}$. Consequently, for $D\to\infty$ and either $p\to 0$ or $\textit{SNR}\to 0$, we have that: (i) Remark 1 states that $\widehat{N}_{0}$ will be exact, from which it follows that $\widehat{\textit{MSE}}=\textit{SURE}$, (ii) Theorem 3 ensures $\textit{SURE}=\textit{MSE}$, and therefore (iii) Estimator 4 will be exact ($\widehat{\textit{MSE}}=\textit{MSE}$) in this scenario. For higher values of $p$ or SNR, we know that $\widehat{N}_{0}$ tends to overestimate $N_{0}$, but since this estimated quantity appears twice in the expression of Estimator 4 with different signs, we cannot derive a simple rule that states whether Estimator 4 tends to underestimate or overestimate the MSE.

III-G Analysis of Estimator 5

Estimator 5 is derived as the mean of the lower and upper bounds in Theorem 1, utilizing the SNR estimate from Estimator 3 and an activity rate estimate $\hat{p}$ of the user’s choice. Estimator 5 often improves the performance (achieves lower bias) compared to Estimator 1, especially at high SNR. In contrast to Estimator 1, we no longer know whether the noise power from Estimator 5 is being overestimated or underestimated. As this estimator takes $\hat{p}$ as a parameter, it is especially useful in applications where $p$ is known a priori or bounded (e.g., in OFDM systems the number of nonzero delay taps of the channel’s impulse response should not exceed the cyclic prefix length).

III-H Analysis of Estimator 6

To estimate the activity rate, we can use the equivalence of vector norms [45] that states $\|\mathbf{x}\|_{q}\leq L^{1/q-1/r}\|\mathbf{x}\|_{r}$ holds for any vector $\mathbf{x}\in\mathbb{C}^{L}$ if $1\leq q<r$ . In particular, it holds for a vector $\mathbf{s}^{\text{nz}}\in\mathbb{C}^{L}$ of length $L\triangleq\|\mathbf{s}\|_{0}$ that contains only the nonzero entries of the sparse vector $\mathbf{s}$ . For such vector, we have that $\|\mathbf{s}^{\text{nz}}\|_{q}\leq L^{1/q-1/r}\|\mathbf{s}^{\text{nz}}\|_{r}$ . Since the entries of $\mathbf{s}$ that are zero do not contribute to these norms, we note that $\|\mathbf{s}^{\text{nz}}\|_{q}=\|\mathbf{s}\|_{q}$ and $\|\mathbf{s}^{\text{nz}}\|_{r}=\|\mathbf{s}\|_{r}$ , and therefore

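$$\|\mathbf{s}\|_{q}\leq\|\mathbf{s}\|_{0}^{1/q-1/r}\,\|\mathbf{s}\|_{r}.$$

Rearranging this inequality, we can obtain a lower bound for the activity rate (footnote 7: the activity rate is $p\triangleq\operatorname{\mathbb{E}}\mathopen{}\left[\|\mathbf{s}\|_{0}\right]/D=\lim_{D\to\infty}{\|\mathbf{s}\|_{0}}/D$; when $D$ is finite, we have ${\|\mathbf{s}\|_{0}}/D\approx p$):

$$\frac{1}{D}\left(\frac{\|\mathbf{s}\|_{q}}{\|\mathbf{s}\|_{r}}\right)^{\frac{qr}{r-q}}\leq\frac{\|\mathbf{s}\|_{0}}{D}\approx p.$$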

The activity-rate bound above holds with equality if the nonzero entries of the signal are constant-modulus, i.e., if $|s^{\text{nz}}_{d}|=|s^{\text{nz}}_{d^{\prime}}|$, $\forall\,d,d^{\prime}\in\{1,\ldots,L\}$. We obtain the blind estimator $\hat{p}(q,r)$ from the left side of this bound by replacing $\mathbf{s}$ with its noisy version $\mathbf{y}$. With this substitution the inequality is not preserved (except if $N_{0}=0$), but we use that definition of $\hat{p}(q,r)$ as a rough activity rate estimate instead of picking an arbitrary value.

III-I Analysis of Estimator 7

Estimator 7 is a specialized variant of a classical EM algorithm for a two-component Gaussian mixture [42], adapted to complex-valued and zero-mean variables. We consider signal and noise power estimation from a noisy BCG signal as in Definition 4. To understand it as a Gaussian source-separation problem, we consider that each entry of $\mathbf{y}$ is a realization of either (i) just noise with distribution $\mathcal{CN}(0,N_{0})$, or (ii) signal plus noise with distribution $\mathcal{CN}(0,N_{0}+E_{s}/p)$. Just-noise realizations occur with probability $1-p$, while signal-plus-noise realizations occur with probability $p$. Using EM, we estimate the variances $N_{0}$ and $(N_{0}+E_{s}/p)$ of the two circularly-symmetric complex Gaussians and the mixture weights $1-p$ and $p$. We use our prior knowledge to set the means of the two distributions to zero, unlike classical EM algorithms that also estimate the means. We make the following observations: (i) This model allows any signal sparsity, as opposed to Estimator 1, which assumes a maximum activity rate $p^{\text{max}}$. (ii) In the low SNR regime, EM may not be able to separate the noise and signal components, as $N_{0}+E_{s}/p\approx N_{0}$. (iii) The accuracy and the complexity of the algorithm will depend on the maximum number of iterations $K^{\text{max}}$, the tolerance $\xi$, the variance and weight initializations, and the noisy realization $\mathbf{y}$.

To avoid EM converging to pathological solutions with arbitrary initialization, we initialize the algorithm with the following two minimum assumptions: (i) The signal is sparse, or equivalently $p^{\text{init}}\in(0,0.5)$ , and (ii) the power of the entries of $\mathbf{y}$ that contain only noise is smaller than the power of the entries of $\mathbf{y}$ that contain signal plus noise, or equivalently $N_{0}^{\text{init}}\leq\|\mathbf{y}\|_{2}^{2}/D$ . This translates to initializing the Gaussian mixture variances $v$ and weights $w$ with $v_{a}\leq\|\mathbf{y}\|_{2}^{2}/D$ , $w_{b}\in(0,0.5)$ , $w_{a}=1-w_{b}$ , and $v_{b}=v_{a}+(\|\mathbf{y}\|_{2}^{2}/D-v_{a})/w_{b}$ . We verify that for this initialization, the average power of the mixture is $w_{a}v_{a}+w_{b}v_{b}=\|\mathbf{y}\|_{2}^{2}/D$ , as expected.
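The following is a minimal sketch of a textbook EM iteration for this zero-mean, two-component complex Gaussian mixture, using the initialization described above. It is not necessarily identical to the paper's EM algorithm: the function name `em_noise_signal`, the stopping rule based on summed relative parameter changes, and all numerical values are assumptions made for illustration.

```python
import math
import random

def _sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def em_noise_signal(y, n0_init, p_init, k_max=30, tol=1e-3):
    """Textbook EM for a zero-mean, two-component complex Gaussian mixture.

    Component a (weight w_a, variance v_a) models noise-only entries;
    component b (weight w_b, variance v_b) models signal-plus-noise entries.
    Returns (N0_hat, Es_hat, p_hat, iterations_used).
    """
    z = [abs(v) ** 2 for v in y]
    D = len(z)
    power = sum(z) / D
    # Initialization from the text: w_a*v_a + w_b*v_b equals the received power.
    v_a, w_b = n0_init, p_init
    v_b = v_a + (power - v_a) / w_b
    k = 0
    for k in range(1, k_max + 1):
        # E-step: posterior probability that entry d is signal-plus-noise.
        offset = math.log(w_b / (1.0 - w_b)) + math.log(v_a / v_b)
        slope = 1.0 / v_a - 1.0 / v_b
        gamma = [_sigmoid(offset + slope * zd) for zd in z]
        # M-step: re-estimate weight and variances (means stay fixed at zero).
        g_sum = sum(gamma)
        new_w_b = g_sum / D
        new_v_b = sum(g * zd for g, zd in zip(gamma, z)) / g_sum
        new_v_a = sum((1.0 - g) * zd for g, zd in zip(gamma, z)) / (D - g_sum)
        change = (abs(new_v_a - v_a) / v_a + abs(new_v_b - v_b) / v_b
                  + abs(new_w_b - w_b) / w_b)
        v_a, v_b, w_b = new_v_a, new_v_b, new_w_b
        if change < tol:  # assumed stopping rule
            break
    return v_a, w_b * (v_b - v_a), w_b, k

# Illustrative run on synthetic noisy BCG data (p = 0.1, Es = 10, N0 = 1).
rng = random.Random(2)

def cn(variance):
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

y = [(cn(100.0) if rng.random() < 0.1 else 0j) + cn(1.0) for _ in range(5000)]
power = sum(abs(v) ** 2 for v in y) / len(y)
n0_hat, es_hat, p_hat, iters = em_noise_signal(y, 0.4 * power, 0.25, k_max=100)
```

Passing a median-based noise power estimate as `n0_init` instead of a fixed fraction of the received power would correspond to the accelerated EM variant described in the text.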

IV Synthetic Results

We now characterize the accuracy of the estimators proposed in Section II. We use the sparse signal model in Definition 4. Without loss of generality, we fix the noise power to $N_{0}=1$, while varying the signal power $E_{s}$, the activity rate $p$, and the dimension $D$ of the vectors. For different sets of parameters, we perform Monte Carlo simulations with $10{,}000$ trials. In the plots, the thicker line with markers shows the average performance of an estimator, while the shaded area shows the region within one standard deviation of the mean performance, which is a measure of the precision of the estimator.

IV-A Evaluation of the Noise Power, Signal Power, SNR, and Activity Rate Estimators

Fig. 1 shows the effect of the SNR on the performance of the proposed blind nonparametric estimator $\widehat{N}_{0}$ from Estimator 1 and the proposed blind parametric estimate $\widehat{N}_{0}(\hat{p})$ from Estimator 5. For the latter, we only include results using $\hat{p}(q,r)$ with $q=1$ and $r=\infty$ for the activity rate estimate, as these parameters showed the best performance in our simulations, outperforming other values of $q$ and $r$ as well as a fixed value of $\hat{p}=0.25$, which is the center of the simulated range $p\in(0,0.5)$. We also simulate the baseline EM estimate $\widehat{N}_{0}^{\text{EM}}$ described in Estimator 7, initialized with $N_{0}^{\text{init}}=0.4\|\mathbf{y}\|_{2}^{2}/D$ and $p^{\text{init}}=\hat{p}(1,\infty)$, with a maximum of $K^{\text{max}}=30$ iterations and early stopping if the total parameter change is below $\xi=0.1\%$. We also plot the genie-aided estimator $\overline{N}_{0}\triangleq\frac{1}{D}\|\mathbf{n}\|_{2}^{2}$, which has separate knowledge of $\mathbf{n}$, and the reference parameter $N_{0}$.

Fig. 2 shows the effect of the SNR on the performance of the proposed signal power and SNR estimators for an activity rate of $p=0.1$ and a dimension of $D=64$. In this case, the EM-based SNR estimate is $\widehat{\textit{SNR}}^{\text{EM}}\triangleq\widehat{E}_{s}^{\text{EM}}/\widehat{N}_{0}^{\text{EM}}$, the genie-aided estimators that have separate knowledge of $\mathbf{s}$ and $\mathbf{n}$ are $\overline{E}_{s}\triangleq\frac{1}{D}\|\mathbf{s}\|_{2}^{2}$ and ${\overline{\textit{SNR}}\triangleq{\overline{E}_{s}}/{\overline{N}_{0}}}$, and the reference parameters are $E_{s}$ and $\textit{SNR}\triangleq E_{s}/N_{0}$.

From Figures 1 and 2, we observe the following facts about the blind nonparametric estimators: (i) For sparse vectors (${p=0.1}$), our estimators have a precision comparable to that of the genie-aided estimators even for a small sample size of $D=64$. (ii) The precision of all considered estimators increases as $D$ increases. (iii) As predicted by our theory, the average noise power is overestimated while the signal power and SNR are underestimated. (iv) At low SNR, the median-based estimators for these three quantities become exact. We also observe that the proposed blind parametric estimate $\widehat{N}_{0}(\hat{p})$ with $\hat{p}=\hat{p}(1,\infty)$ is more accurate than the blind nonparametric estimate $\widehat{N}_{0}$ at high SNR. However, $\widehat{N}_{0}(\hat{p})$ has fewer theoretical guarantees and is not an upper bound on $N_{0}$.

Fig. 3 shows the accuracy of the blind parametric activity rate estimator, Estimator 6. We see that at low and high SNR, $\hat{p}(1,2)$ tends to overestimate $p$ while $\hat{p}(2,\infty)$ tends to underestimate it. Overall, $\hat{p}(1,\infty)$ results in the best performance when combined with Estimator 5. Admittedly, this is only a rough estimator and we include it as an example of what could be plugged into Estimator 5 or Estimator 7. Nonetheless, we emphasize that side information about the signal’s sparsity should be utilized whenever available.

In comparison with EM (cf. Figures 1 and 2), our methods provide a less-accurate estimate at higher SNR, but require significantly lower complexity. The complexity of the baseline EM algorithm (in terms of the number of operations such as real-valued additions, real-valued multiplications, and exponentials) is more than $K(16D+12)+3D$ operations—with early stopping, the average number of iterations observed in our simulations ranges from $K=8$ to $K=28$ depending on the SNR. In contrast, our proposed median-based noise estimator has an average complexity of no more than $7.7D+9$ operations if the median is computed using quickselect [39], and avoids the evaluation of operations such as exponentials and divisions. Hence, our proposed blind estimator is more than $17\times$ less complex than the baseline EM algorithm, which renders our method suitable (i) for low-complexity parameter estimation and (ii) as a potential initializer for EM-based estimators.
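The complexity comparison above can be reproduced with a short calculation. The sketch below simply evaluates the two operation counts quoted in the text; the function names are hypothetical. For $K=8$, the ratio approaches $131/7.7\approx 17$ as $D$ grows, and it is larger for larger $K$.

```python
def em_operations(K, D):
    """Operation-count lower bound quoted for the baseline EM algorithm."""
    return K * (16 * D + 12) + 3 * D

def median_operations(D):
    """Average operation-count upper bound quoted for the median-based estimator."""
    return 7.7 * D + 9

# Evaluate the ratio for the smallest and largest observed iteration counts.
for D in (64, 1024):
    for K in (8, 28):
        ratio = em_operations(K, D) / median_operations(D)
        print(f"D={D}, K={K}: EM needs at least {ratio:.1f}x the operations")
```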

IV-B Accelerated Convergence of EM Using Median-Based Initialization

Fig. 4 shows the effect of initialization on the EM algorithm. To study the rate of convergence, we disable early stopping by setting $\xi=0$ so that the number of iterations is always $K=K^{\text{max}}$, and plot the relative error $\varepsilon^{\text{EM}}\triangleq|\widehat{N}_{0}^{\text{EM}}-N_{0}|/N_{0}$ as we vary $K^{\text{max}}$. We compare the convergence of (i) the accelerated EM algorithm (diamond markers), which is initialized with the blind nonparametric estimate $\widehat{N}_{0}$, and (ii) the baseline EM algorithm (circular markers), which uses a fixed initialization of $\textit{SNR}=5$, corresponding to setting the noise power to $1/6$ of the received power $\|\mathbf{y}\|_{2}^{2}/D$. We simulated various values of $p$ and SNR, and picked four representative examples. At low SNR or high sparsity (low $p$), the accelerated EM algorithm already converges in the first iteration. In contrast, the baseline EM algorithm needs more than 16 iterations to converge in some cases. The only case in which we observe the baseline outperforming the accelerated EM algorithm is the panel of Fig. 4 with $p=0.4$ and $\textit{SNR}=5$, in which (i) the baseline has an advantage since $\frac{1}{6}\|\mathbf{y}\|_{2}^{2}/D$ coincides exactly with the true value of $N_{0}$, and (ii) the SNR is high and the sparsity is low, making it the worst case for the $\widehat{N}_{0}$ estimate used by the accelerated EM. We also examine the effect of initializing the activity rate with (i) a fixed value of 0.25 versus (ii) the blind parametric estimate $\hat{p}(1,\infty)$, and we observe no significant difference, especially for the preferred accelerated EM algorithm. However, as $\hat{p}(1,\infty)$ performed better than a fixed value when used in the parametric noise power estimator $\widehat{N}_{0}(\hat{p})$, we prefer $\hat{p}(q,r)$ when no side information about the signal’s sparsity is available.

IV-C Evaluation of the MSE Estimator

To evaluate the performance of the MSE estimator, we consider System Model 2 with $\eta$ being the soft-thresholding function defined as

$$\eta(y_{d};\tau)\triangleq\frac{y_{d}}{|y_{d}|}\max\{|y_{d}|-\tau,0\},\quad d=1,\ldots,D,$$

where the denoising threshold is a real number $\tau\geq 0$ .
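For reference, a minimal sketch of entry-wise complex soft-thresholding is given below. It assumes the usual rule $\eta(y_{d};\tau)=\frac{y_{d}}{|y_{d}|}\max\{|y_{d}|-\tau,0\}$, which shrinks the magnitude by $\tau$ and keeps the phase; the function name `soft_threshold` and the toy input are illustrative.

```python
def soft_threshold(y, tau):
    """Apply complex soft-thresholding entry-wise to the sequence y (tau >= 0)."""
    out = []
    for yd in y:
        mag = abs(yd)
        # Entries with magnitude at or below tau are set to zero;
        # larger entries keep their phase and lose tau in magnitude.
        out.append(yd * (mag - tau) / mag if mag > tau else 0j)
    return out

# Toy input: one strong entry, one weak entry, an exact zero, and a purely
# imaginary entry.
denoised = soft_threshold([3 + 4j, 0.5, 0j, -2j], tau=1.0)
```

The comparison `mag > tau` also guards against division by zero for entries that are exactly zero.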

Fig. 5 shows two realizations of the estimated MSE as a function of the tuning parameter $\tau$. The only reference in this case is the genie-aided estimator $\overline{\textit{MSE}}\triangleq\frac{1}{D}\|\eta(\mathbf{y};\tau)-\mathbf{s}\|_{2}^{2}$. We picked two examples that are representative of what we have observed through multiple experiments with different system parameters to illustrate the following observations: (i) If the MSE function has a pronounced minimum, as in the higher-SNR panel of Fig. 5, then the value of $\tau$ that minimizes the blind estimate tends to be very close to the value that minimizes the genie-aided MSE function. (ii) If the MSE function has a less pronounced minimum, as in the lower-SNR panel of Fig. 5, then the value of $\tau$ that minimizes the blind estimate may be far from the value that minimizes the genie-aided MSE function. In spite of that, because the MSE function is flat near the minimum, the genie-aided MSE function evaluated at these two values of $\tau$ returns values that are similar. In other words, (i) and (ii) summarize our observations that our algorithm finds a near-optimal (sub-optimal) denoising threshold $\tau$ when the MSE of the denoised channel is (not) sensitive to $\tau$. Note that here we have only picked two representative realizations; in Section V, we validate our estimator with quantitative results by showing the denoising performance averaged over many realizations.

V Applications to Nonparametric Channel-Vector Denoising

We show three applications in wireless systems, in which the quality of channel estimates is essential for data detection. Concretely, we show that our algorithms can be applied to adaptively denoise pilot-based channel estimates, resulting in a reduced (improved) bit-error-rate (BER).

V-A Infinite-Resolution Massive Multiuser MIMO System

We start with an application of Estimator 4 for beamspace channel estimation. As in [7], we simulate an uplink massive multiuser (MU) MIMO system in which $U=8$ single-antenna user equipments (UEs) transmit channel-estimation pilots and data to a basestation (BS) equipped with a uniform linear array of $D=128$ antenna elements. The UEs are randomly placed with a uniform distribution in a [math] circular sector around the BS, with a minimum distance of $10$ m and maximum distance of $110$ m from the BS. A minimum angular separation of [math] between UEs is enforced. We assume UE-side perfect power control (UEs adjust their transmit power so that the received power at the BS is equal for all UEs), and we ignore quantization at transmitter and receiver sides, assuming infinite-resolution signals.

We simulate a noiseless channel matrix $\mathbf{H}\in\mathbb{C}^{D\times U}$ using line-of-sight (LoS) realizations from the mmMAGIC QuaDRiGa model [46] with a carrier frequency of $f_{c}=60$ GHz. Each complex-valued entry $\mathbf{H}_{d,u}$ of the channel matrix contains the attenuation and phase between the $u$ th UE and the $d$ th BS antenna. For the channel estimation step, the UEs transmit orthogonal pilots. The maximum likelihood (ML) estimate of the channel matrix is obtained by right-multiplying the (noisy) received pilot sequence with the inverse of the orthogonal pilot matrix, resulting in

$$\mathbf{H}^{\text{ML}}=\mathbf{H}+\mathbf{N}^{\text{CE}},$$

where $\mathbf{H}\in\mathbb{C}^{D\times U}$ is the antenna-domain channel matrix, $\mathbf{N}^{\text{CE}}\in\mathbb{C}^{D\times U}$ is complex Gaussian channel estimation noise with power $N_{0}^{\text{CE}}$ per complex entry, and $\mathbf{H}^{\text{ML}}\in\mathbb{C}^{D\times U}$ is the ML channel estimate, which is a noisy observation of $\mathbf{H}$ . The beamspace representation of the ML estimate is obtained by taking a spatial Fourier transform across the antenna array resulting in

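$$\widetilde{\mathbf{H}}^{\text{ML}}=\mathbf{F}\mathbf{H}^{\text{ML}}=\widetilde{\mathbf{H}}+\widetilde{\mathbf{N}}^{\text{CE}}.$$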

Here, beamspace-domain quantities are designated by a tilde. Then, $\widetilde{\mathbf{H}}=\mathbf{F}\mathbf{H}$ is the beamspace channel matrix, ${\widetilde{\mathbf{N}}^{\text{CE}}=\mathbf{F}\mathbf{N}^{\text{CE}}}$ has the same distribution as $\mathbf{N}^{\text{CE}}$ as the discrete Fourier transform matrix $\mathbf{F}$ is unitary, and $\widetilde{\mathbf{H}}^{\text{ML}}$ is the beamspace ML channel estimate, which is a noisy observation of $\widetilde{\mathbf{H}}$ . Column indices of $\widetilde{\mathbf{H}}$ correspond to UEs, while row indices correspond to different angles-of-arrival to the BS. Since electromagnetic waves at high carrier frequencies experience strong attenuation, typical mmWave channels consist only of a small number of dominant propagation paths arriving at the BS. Thus, each column of $\widetilde{\mathbf{H}}$ (which is the beamspace channel vector of one UE) will be approximately sparse, with many entries being close to zero.
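To make the beamspace argument concrete, the following sketch builds a unitary DFT matrix and applies it to a single-path LoS channel. It is an illustration under stated assumptions: the half-wavelength-spaced uniform linear array response and the $20^{\circ}$ incidence angle are hypothetical choices, whereas the simulations in this section use QuaDRiGa channel realizations.

```python
import numpy as np

D = 128  # number of BS antennas, as in this section

# Unitary DFT matrix F, so that F^H F = I and noise statistics are preserved.
F = np.fft.fft(np.eye(D)) / np.sqrt(D)
is_unitary = np.allclose(F.conj().T @ F, np.eye(D))

# Hypothetical single-path LoS channel of a half-wavelength-spaced ULA.
theta = np.deg2rad(20.0)
h = np.exp(1j * np.pi * np.arange(D) * np.sin(theta))

# Beamspace channel vector and its energy concentration.
h_beam = F @ h
energy = np.sort(np.abs(h_beam) ** 2)[::-1]
frac_top4 = energy[:4].sum() / energy.sum()
print(f"unitary: {is_unitary}, energy in 4 strongest beams: {frac_top4:.3f}")
```

Most of the energy falls into a handful of beams, which is the approximate sparsity that the denoisers in this section rely on.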

By writing each column of \frefeq:HMLbeamspace as an independent equation, we can express the channel estimation problem in the form of \frefsys:systemmodel1, that is, each beamspace channel vector (that contains only few nonzero entries) corresponds to a sparse signal $\mathbf{s}$ . The sparsity property implies that we can perform denoising to improve the ML channel estimate. After channel estimation, all UEs transmit data simultaneously using uncoded 16-QAM symbols and the BS performs data detection using the estimated channel vectors and linear minimum MSE equalization.

\fref{fig:inf_res} shows simulation results for $10{,}000$ Monte–Carlo trials. For different channel estimation methods, we compute the MSE of the channel estimates and the resulting BER. We simulate beamspace channel estimation (BEACHES) as in [7], which denoises the columns of $\widetilde{\mathbf{H}}^{\text{ML}}$ in \fref{eq:HMLbeamspace} by applying the soft-thresholding function in \fref{eq:soft-thresholding}; the thresholding parameter $\tau$ is adaptively selected for each noisy observation by minimizing SURE using an $\mathcal{O}(D\log(D))$ algorithm that assumes perfect knowledge of the average noise power $N_{0}^{\text{CE}}$. We compare this to NP BEACHES, a new nonparametric BEACHES variant which also applies soft-thresholding to the columns of $\widetilde{\mathbf{H}}^{\text{ML}}$, but uses the (nonparametric) threshold $\tau$ that minimizes $\widehat{\textit{MSE}}$ as in \fref{est:MSE}; since $\widehat{\textit{MSE}}$ is a nonparametric version of SURE, NP BEACHES does not require knowledge of $N_{0}^{\text{CE}}$. In addition, we include a variant that we call EM BEACHES, which uses a version of $\widehat{\textit{MSE}}$ in which $\widehat{N}_{0}$ in \fref{eq:MSEnonparametricexplicitform} is replaced by $\widehat{N}_{0}^{\text{EM}}$ from \fref{est:EM}; for $\widehat{N}_{0}^{\text{EM}}$, we use $N_{0}^{\text{init}}=0.4\|\mathbf{y}\|_{2}^{2}/D$ and $p^{\text{init}}=\hat{p}(1,\infty)$, a maximum of $K^{\text{max}}=30$ iterations, and early stopping if the total parameter change is below $\xi=0.1\%$. The three versions of BEACHES described above, after denoising the beamspace channel vectors, use the inverse Fourier transform to obtain an antenna-domain channel estimate to be used for data detection. As a reference, we show the performance of perfect channel state information (CSI), which uses the ground truth (noiseless) channel matrix $\mathbf{H}$, and of ML estimation, which simply takes the noisy observation $\mathbf{H}^{\text{ML}}$ in \fref{eq:HML} as the estimate.

From \freffig:inf_res, we observe that NP BEACHES achieves virtually the same performance as the original BEACHES algorithm (which requires knowledge of $N_{0}^{\text{CE}}$ ), except at high SNR where \frefest:noisevariance tends to overestimate $N_{0}^{\text{CE}}$ . We reiterate that NP BEACHES requires no parameters and exhibits the same low complexity of $\mathcal{O}(D\log(D))$ as the original BEACHES algorithm, because the latter already sorts the entries of $|\mathbf{y}|^{2}$ , which we can reuse to compute the median in \frefest:noisevariance. We observe that EM BEACHES achieves higher (worse) MSE at low SNR and does not outperform NP BEACHES at higher SNR.
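The median-based noise power estimate that NP BEACHES relies on can be sketched as follows. The form $\operatorname{median}(|\mathbf{y}|^{2})/\log(2)$ is taken from the limit $\widehat{N}_{0}\to\mathsf{m}_{Z}/\log(2)$ stated in Appendix A; the sparse test signal and its parameters (`D`, `p`, `N0`, `E_active`) are hypothetical, and `np.median` stands in for whichever sample-median convention the estimator definition uses.

```python
import numpy as np

def noise_power_median(y):
    """Blind noise-power estimate: sample median of |y|^2 divided by log(2)."""
    return np.median(np.abs(y) ** 2) / np.log(2)

def crandn(rng, n, power):
    """n i.i.d. circularly-symmetric complex Gaussian samples with given power."""
    return np.sqrt(power / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

rng = np.random.default_rng(1)
D, p, N0, E_active = 4096, 0.05, 0.2, 5.0  # hypothetical example values

# Noisy sparse observation: few strong entries plus complex Gaussian noise.
active = rng.random(D) < p
s = np.where(active, crandn(rng, D, E_active), 0.0)
y = s + crandn(rng, D, N0)

N0_hat = noise_power_median(y)
rel_err = abs(N0_hat - N0) / N0
print(f"true N0 = {N0:.3f}, estimate = {N0_hat:.3f}, relative error = {rel_err:.3f}")
```

If the entries of $|\mathbf{y}|^{2}$ are already sorted, as in BEACHES, the median is simply read from the middle of the sorted array, so no additional sorting cost is incurred.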

In summary, denoising methods can significantly improve the ML channel estimate. All three BEACHES variants achieve similar BER performance. However, BEACHES needs knowledge of the noise power and EM BEACHES exhibits higher complexity than our nonparametric estimate, which renders NP BEACHES the preferable denoising method in this application scenario.

V-B Low-Resolution Massive Multiuser MIMO System

Next, we consider the same uplink massive MU-MIMO system as in \fref{sec:infinite-res}, but in this case each radio-frequency (RF) chain at the BS is equipped with a pair of 1-bit analog-to-digital converters (ADCs) to quantize the in-phase and quadrature baseband signals. Each RF chain applies a quantization function $Q(x)\triangleq\operatorname{sign}\left(\Re\{x\}\right)+j\operatorname{sign}\left(\Im\{x\}\right)$ to the baseband signal, where $j^{2}=-1$. For simplicity, we assume that the pilot matrix is an identity, i.e., each UE has a dedicated time slot to transmit one pilot while all other UEs are silent. The received pilots then correspond to the 1-bit version of the ML channel estimate, which we call 1-bit ML (here, $\mathbf{H}^{\text{1-bit ML}}$ is simply the 1-bit version of $\mathbf{H}^{\text{ML}}$, not to be confused with the maximum likelihood channel estimate given a one-bit observation):

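$$\mathbf{H}^{\text{1-bit ML}}=Q\bigl(\mathbf{H}^{\text{ML}}\bigr)=Q\bigl(\mathbf{H}+\mathbf{N}^{\text{CE}}\bigr).$$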

Here, quantization happens in the antenna domain and yet, when the quantized noisy channel is converted to beamspace, the sparse structure that is present in infinite-resolution beamspace channel vectors is also present in the coarsely quantized beamspace channel vectors. Thus,

$$\widetilde{\mathbf{H}}^{\text{1-bit ML}}=\mathbf{F}\mathbf{H}^{\text{1-bit ML}}$$

has sparse columns that can be denoised. For more details on the validity of this statement, see [9], where $\widetilde{\mathbf{H}}^{\text{1-bit ML}}$ was decomposed into a linear combination of $\widetilde{\mathbf{H}}$ plus a residual.
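The claim that coarse quantization in the antenna domain leaves the beamspace structure largely intact can be illustrated with a small sketch. It implements the quantizer $Q(\cdot)$ defined above and applies it to a noisy single-path channel; the array response, the incidence angle, and the noise power `N0` are hypothetical choices for illustration and do not reproduce the QuaDRiGa setup.

```python
import numpy as np

def one_bit_quantize(x):
    """Q(x) = sign(Re{x}) + j*sign(Im{x}), applied entrywise."""
    return np.sign(np.real(x)) + 1j * np.sign(np.imag(x))

rng = np.random.default_rng(2)
D, N0 = 128, 0.5  # hypothetical array size and noise power
F = np.fft.fft(np.eye(D)) / np.sqrt(D)  # unitary DFT matrix

# Hypothetical single-path LoS channel (half-wavelength ULA) plus noise.
h = np.exp(1j * np.pi * np.arange(D) * np.sin(np.deg2rad(20.0)))
n = np.sqrt(N0 / 2) * (rng.standard_normal(D) + 1j * rng.standard_normal(D))

# Quantize in the antenna domain, then convert to beamspace.
h_1bit = one_bit_quantize(h + n)
beam_true = np.abs(F @ h) ** 2
beam_1bit = np.abs(F @ h_1bit) ** 2

peak_true = int(np.argmax(beam_true))
peak_1bit = int(np.argmax(beam_1bit))
peak_share = beam_1bit[peak_1bit] / beam_1bit.sum()
print(f"peak beam: {peak_true} (unquantized) vs {peak_1bit} (1-bit), share = {peak_share:.2f}")
```

Note that `np.sign` returns zero for an exactly zero input; with continuous-valued noise this occurs with probability zero, so every quantized entry has squared magnitude 2.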

\fref{fig:one_bit_res} shows simulation results for $10{,}000$ Monte–Carlo trials. For different channel estimation methods, we compute the MSE and BER. All UEs simultaneously transmit uncoded QPSK symbols, and the BS uses the estimated channels to perform 1-bit Bussgang linear minimum MSE equalization as described in [47].

We simulate $1$ -BEACHES as in [9]. This denoising algorithm decomposes \frefeq:HML1bitbeamspace as $\widetilde{\mathbf{H}}^{\text{1-bit ML}}=\widetilde{\mathbf{H}}+\widetilde{\mathbf{Q}}$ , where $\widetilde{\mathbf{Q}}$ represents the equivalent noise-plus-quantization error with average power $Q_{0}=2+E_{s}-4E_{s}/\sqrt{\pi(E_{s}+N_{0})}$ per entry [9]. The $1$ -BEACHES algorithm denoises the columns of $\widetilde{\mathbf{H}}^{\text{1-bit ML}}$ with the threshold $\tau$ that minimizes SURE, assuming perfect knowledge of $Q_{0}$ . We also use the nonparametric algorithms NP BEACHES and EM BEACHES (described in \frefsec:infinite-res) to denoise the columns of $\widetilde{\mathbf{H}}^{\text{1-bit ML}}$ . After denoising the beamspace channel vectors, these three BEACHES variants use the inverse Fourier transform to obtain an antenna-domain channel estimate. We compare these estimators with $\mathbf{H}^{\text{1-bit ML}}$ from \frefeq:HML1b, and with the perfect CSI estimate that uses the ground truth $\mathbf{H}$ as the channel estimate.
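As a sanity check of the expression for $Q_{0}$, the following Monte Carlo sketch compares the closed form with the empirical power of $Q(h+n)-h$. It assumes scalar, circularly-symmetric complex Gaussian $h$ and $n$ with powers $E_{s}$ and $N_{0}$, which is a simplification (realistic channel entries are not Gaussian), and the numerical values are hypothetical. Since $\mathbf{F}$ is unitary, the average power per entry is the same in the antenna domain and in beamspace.

```python
import numpy as np

def q0_closed_form(Es, N0):
    """Q0 = 2 + Es - 4*Es / sqrt(pi * (Es + N0)), as stated in the text."""
    return 2.0 + Es - 4.0 * Es / np.sqrt(np.pi * (Es + N0))

def crandn(rng, n, power):
    """n i.i.d. circularly-symmetric complex Gaussian samples with given power."""
    return np.sqrt(power / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

rng = np.random.default_rng(3)
Es, N0, n_samples = 1.0, 0.5, 400_000  # hypothetical example values

h = crandn(rng, n_samples, Es)   # assumed Gaussian channel entries
x = h + crandn(rng, n_samples, N0)
q = np.sign(x.real) + 1j * np.sign(x.imag)  # 1-bit quantizer Q(x)

q0_mc = np.mean(np.abs(q - h) ** 2)  # empirical noise-plus-quantization power
q0_cf = q0_closed_form(Es, N0)
print(f"Monte Carlo Q0 = {q0_mc:.4f}, closed form Q0 = {q0_cf:.4f}")
```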

Since NP BEACHES uses the median-based noise estimate (which in this case estimates the effective “noise” floor that includes quantization errors), it is robust to outliers and is able to achieve MSE and BER performance very close to $1$ -BEACHES that has perfect knowledge of the noise-plus-quantization power. The EM estimator, however, strongly relies on the distribution of the noise and signal being Gaussian. Here, the signal is a realistic channel vector which is not Gaussian; more importantly, $\widetilde{\mathbf{Q}}$ contains the effect of noise but also quantization error, which means the equivalent noise also deviates from a Gaussian distribution. We attribute the higher (worse) BER of EM BEACHES to these two factors. We note that $1$ -BEACHES is designed specifically for 1-bit quantization and that the expression for $Q_{0}$ (which requires knowledge of the noise power and the signal power) would be different if the ADCs use a different number of bits. In contrast, our nonparametric denoiser is agnostic to the quantizer’s resolution and automatically determines the power of the noise plus quantization, as long as the signal is approximately sparse and the noise is approximately Gaussian.

V-C Cell-Free Communication System

We simulate an uplink cell-free communication system with $U=16$ single-antenna UEs and $D=256$ single-antenna BSs. The UEs and BSs are randomly placed with a uniform distribution in a square with $1\,\mathrm{km}^{2}$ area. The UEs transmit orthogonal pilots followed by QPSK data. All of the UEs transmit simultaneously and the received signal at all the BSs is processed at a central processing unit (CPU) that performs channel estimation and linear minimum MSE detection.

We simulate a cell-free channel matrix $\mathbf{H}$ using the model proposed by [48], with parameters as in [2] but without power control and with a transmit power of $12.5$ mW per UE. As in \frefeq:HML, the ML estimate of the channel matrix is obtained by right-multiplying the pilot sequence received at the CPU with the inverse of the orthogonal pilot matrix (we used a Hadamard pilot matrix), resulting in

$$\mathbf{H}^{\text{ML}}=\mathbf{H}+\mathbf{N}^{\text{CE}}.$$

The columns of $\mathbf{H}$ (or channel vectors) contain the attenuations and phases between one UE and all BSs. For each UE, the BSs that have LoS or are closer to this UE will receive significantly higher power than the other BSs that are not nearby. This means that in the cell-free system, the channel vectors are approximately sparse [11] and the ML estimate can be denoised. Although the thermal noise variance at different basestations may differ, we assume i.i.d. noise in this paper.

\fref{fig:cell_free} shows the results of $10{,}000$ Monte–Carlo trials. On the left, we plot the CDF of the MSE of the channel estimates, and on the right, the CDF of the root-mean-squared-symbol-error (RMSSE). The RMSSE is a measure of how far the expected QPSK symbol is from the received data symbol after equalization with the channel estimates, and can be seen as equivalent to the error-vector-magnitude (EVM) for one UE.

In \fref{fig:cell_free}, we observe a clear MSE improvement of the three denoising algorithms over the ML estimate: For a given value $x$, there are more realizations of channel estimates whose MSE is smaller than $x$ for denoised channels than for ML. The fact that denoising improves the channel estimates is reflected in the RMSSE, since equalization is more effective and the obtained symbols are closer to the expected constellation points. We consider the RMSSE requirement of $17.5\%$ for QPSK from [49, Table 6.5.2.2-1]. The probability that a UE meets the requirement grows from $0.43$ with ML channel estimation to $0.59$ with NP BEACHES or EM BEACHES denoising, an increase of $0.16$. BEACHES with perfect knowledge of the noise power has a slight additional advantage, with a probability of meeting the requirement of $0.66$.

VI Conclusions

We have proposed blind estimators for the average noise power, signal power, SNR, and MSE. Our estimators can be calculated at low complexity and only require the noisy observation vector, avoiding the need for additional pilot signals entirely. We have analyzed our estimators for a Bernoulli complex Gaussian sparsity model and evaluated their accuracy via simulations. Using three channel-vector denoising tasks in (i) a multi-antenna mmWave system, (ii) a 1-bit quantized multi-antenna mmWave system, and (iii) a cell-free system, we have demonstrated that our blind estimators can be used to develop a novel nonparametric denoiser that achieves comparable performance to, and the same complexity as, BEACHES in [7, 8], which requires knowledge of the average noise power. We believe that the proposed blind estimators find potential use in a large number of other wireless communication applications that contain sparse complex-valued signals.

There are many avenues for future work. For signals that are less sparse (i.e., $p>0.421$), one may want to replace the median by a higher quantile, in which case the scaling factor $\log(2)$ needs to be adjusted accordingly; a derivation of such estimators would follow immediately from our results in \fref{sec:noise_estimator_analysis}. Huber M-estimators [50] combine the idea of mean and median, and they may also prove useful for blind noise power estimation in the presence of sparse signals. In the case of non-Gaussian, non-circularly-symmetric, or non-i.i.d. sparse signals, new estimators can be tailored to exploit specific statistical properties (e.g., structured sparsity). Extending the statistical model, e.g., to signals with correlation or structured sparsity, can lead to improved estimators and is left for future work. In the case of colored noise (e.g., stemming from interference or large variations in radio-frequency circuitry), noise whitening techniques could be considered.

Appendix A Proof of \frefthm:mainresult

A-A Prerequisites

In what follows, we will need the distribution of $\mathbf{z}\triangleq|\mathbf{y}|^{2}$ , where we assume $\mathbf{y}$ is distributed according to \frefdef:noisyBCG. Given a circularly-symmetric complex Gaussian RV $A$ with variance $E_{a}$ , the RV $B=|A|^{2}$ is exponentially distributed with CDF $F_{B}(b)\triangleq 1-e^{-\frac{b}{{E_{a}}}}$ , $b\geq 0$ . Then, the CDF of each entry of the absolute-square noisy observation is as follows.

Definition 5 (Noisy BCG Power RV).

Let $\mathbf{y}$ be as in \frefdef:noisyBCG and let $\mathbf{z}\triangleq|\mathbf{y}|^{2}$ . Then, for $z_{d}\geq 0$ , the CDF of each entry of $\mathbf{z}$ is given by

[TABLE]
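The exponential law used in the prerequisites, and the resulting $\log(2)$ scaling of the median, can be checked empirically. The sketch below draws circularly-symmetric complex Gaussian samples with a hypothetical variance `Ea` and compares the empirical CDF of $B=|A|^{2}$ with $1-e^{-b/E_{a}}$.

```python
import numpy as np

rng = np.random.default_rng(4)
Ea, n = 0.7, 200_000  # hypothetical variance and sample count

# Circularly-symmetric complex Gaussian samples with variance Ea.
A = np.sqrt(Ea / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
B = np.abs(A) ** 2

# Compare the empirical CDF of B with the exponential CDF 1 - exp(-b/Ea).
b_grid = np.linspace(0.0, 4.0 * Ea, 9)
ecdf = np.array([(B <= b).mean() for b in b_grid])
cdf = 1.0 - np.exp(-b_grid / Ea)
max_gap = np.max(np.abs(ecdf - cdf))

# The median of an exponential RV with mean Ea is Ea * log(2).
median_ratio = np.median(B) / (Ea * np.log(2))
print(f"max CDF gap = {max_gap:.4f}, median / (Ea*log 2) = {median_ratio:.4f}")
```

For a noise-only observation, dividing the median of $|\mathbf{y}|^{2}$ by $\log(2)$ therefore recovers the noise power, which is the starting point for the bounds in this appendix.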

A-B Upper Bounds on the Median

We start with the following two upper bounds on the median $\mathsf{m}_{Z}$ of a noisy BCG power RV $Z$ with CDF given in \frefeq:noisyBCGpower.

Lemma 3.

For a noisy BCG power RV in \frefdef:noisyBCGsquared with $p<0.5$ , the median is bounded from above by

[TABLE]

Proof.

We start from the definition of the median in \frefeq:median for the RV $Z$ with CDF as in \frefeq:noisyBCGpower:

[TABLE]

Since the second term is nonnegative, we can omit it to obtain the following inequality:

[TABLE]

Note that this bound will be useful for vectors $\mathbf{s}$ that are sparse, i.e., where $p$ is small. We can simplify \frefeq:expression0 as

[TABLE]

which leads to the upper bound on the median $\mathsf{m}_{Z}$ . In order to take the logarithm in \frefeq:logarithmcondition, we require $p\in(0,0.5)$ .∎
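The steps of this proof can be written out as the following sketch. It assumes only that the CDF $F_{Z}$ is a two-component mixture whose noise-only component has weight $1-p$ and mean $N_{0}$; the symbol $T(\mathsf{m}_{Z})\geq 0$ is a placeholder introduced here for the omitted signal-plus-noise term and is not notation from the paper.

```latex
% Sketch under the stated mixture assumption; T(m_Z) >= 0 is a placeholder
% for the nonnegative second term that is dropped in the proof.
\begin{align}
\tfrac{1}{2} = F_{Z}(\mathsf{m}_{Z})
  &= (1-p)\bigl(1-e^{-\mathsf{m}_{Z}/N_{0}}\bigr) + T(\mathsf{m}_{Z}) \\
\Rightarrow\quad (1-p)\bigl(1-e^{-\mathsf{m}_{Z}/N_{0}}\bigr) &\leq \tfrac{1}{2} \\
\Rightarrow\quad e^{-\mathsf{m}_{Z}/N_{0}} &\geq \frac{1-2p}{2(1-p)} \\
\Rightarrow\quad \mathsf{m}_{Z} &\leq N_{0}\log\!\left(\frac{2(1-p)}{1-2p}\right),
  \qquad p\in(0,0.5).
\end{align}
```

The restriction $p<0.5$ keeps the argument of the logarithm positive, and the bound approaches $N_{0}\log(2)$ as $p\to 0$.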

Lemma 4.

For a noisy BCG power RV $Z$ in \frefdef:noisyBCGsquared with $p\leq\frac{1/2-e^{-2}}{1-e^{-2}}\approx 0.421$ , the median is bounded from above by

[TABLE]

Proof.

We start from the definition of the median as in \frefeq:niceform. Let us define the function $g(r)\triangleq e^{-1/r}$ with $r>0$ . We can now rewrite \frefeq:niceform as follows:

[TABLE]

The function $g(r)$ is concave for $r\geq 1/2$ . Therefore, to ensure concavity of $g(r)$ in \frefeq:nicejensenexpression, we need

[TABLE]

The two conditions in \frefeq:twoconcavityconditions are guaranteed as long as ${2N_{0}\geq\mathsf{m}_{Z}}$ . Because CDFs are nondecreasing functions, requiring ${2N_{0}\geq\mathsf{m}_{Z}}$ is equivalent to requiring ${F_{Z}(2N_{0})\geq F_{Z}(\mathsf{m}_{Z})=1/2}$ , which we can simplify as

[TABLE]

Finally, to ensure \frefeq:pconditionSNR holds for all values of $E_{s}$ and $N_{0}$ , we require

[TABLE]

which implies that the condition $p\leq p^{\text{max}}$ in \frefeq:probabilitycondition ensures concavity of $g(r)$ . Then, assuming $p\leq p^{\text{max}}$ , we can now use Jensen’s inequality on the expression in \frefeq:nicejensenexpression to get

[TABLE]

We can now simplify this expression to

[TABLE]

which is the inequality in \freflem:smollemma2. ∎
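Two ingredients of this proof are easy to verify numerically: the value of $p^{\text{max}}$ and the concavity region of $g(r)=e^{-1/r}$. The sketch below uses the closed-form second derivative $g''(r)=e^{-1/r}(1-2r)/r^{4}$, which follows from differentiating $g$ twice; the grid ranges are arbitrary choices for illustration.

```python
import numpy as np

# Threshold on the activity rate from Lemma 4.
p_max = (0.5 - np.exp(-2.0)) / (1.0 - np.exp(-2.0))

def g_second(r):
    """Second derivative of g(r) = exp(-1/r): exp(-1/r) * (1 - 2r) / r^4."""
    return np.exp(-1.0 / r) * (1.0 - 2.0 * r) / r**4

# g is concave where g'' <= 0, i.e., for r >= 1/2, and convex below that.
r_concave = np.linspace(0.5, 10.0, 200)
r_convex = np.linspace(0.05, 0.49, 50)
concave_ok = bool(np.all(g_second(r_concave) <= 0.0))
convex_ok = bool(np.all(g_second(r_convex) > 0.0))

# At p = p_max, the noise-only part of F_Z(2*N0) equals exactly 1/2.
noise_only_cdf = (1.0 - p_max) * (1.0 - np.exp(-2.0))
print(f"p_max = {p_max:.4f}, concave for r >= 1/2: {concave_ok}, "
      f"(1 - p_max)(1 - e^-2) = {noise_only_cdf:.4f}")
```

The last check reflects the worst case over $E_{s}$ and $N_{0}$, in which the signal-plus-noise term of the CDF contributes nothing at $2N_{0}$.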

A-C Lower Bound on the Median

We now establish the following lower bound on the median.

Lemma 5.

For a noisy BCG power RV $Z$ in \frefdef:noisyBCGsquared with $p\in(0,1]$ , the median is bounded from below by

[TABLE]

Proof.

We start from the definition of the median as in \frefeq:niceform. Since the exponential CDF $F_{B}(b)\triangleq 1-e^{-\frac{b}{{E_{a}}}}$ for ${E_{a}}\geq 0$ is concave in $b$ , Jensen’s inequality leads to

[TABLE]

We can simplify this expression to obtain the following bound

[TABLE]

which leads to the inequality in \freflem:smollemma3 we wanted to prove. ∎

A-D Combining the Results

For all values of $p\in(0,1]$ and $\textit{SNR}\geq 0$ , we have that

[TABLE]

and we defined $\widehat{N}_{0}$ such that $\widehat{N}_{0}\xrightarrow[D\to\infty]{\text{prob.}}\mathsf{m}_{Z}/\log(2)$ according to \fref{lem:convergenceofmedian}.

Finally, we can combine \frefeq:uperuperbound with \freflem:smollemma1, \freflem:smollemma2 and \freflem:smollemma3 to obtain \frefeq:yummysandwichbound.

Appendix B Proof of \frefcor:errorbound

Proof.

Let the relative error of \frefest:noisevariance be ${\varepsilon\triangleq{|\widehat{N}_{0}-N_{0}|}/{N_{0}}}$ . Using the inequalities from \frefthm:mainresult and the quantities LB and UB defined there, we can bound $\varepsilon$ as follows:

[TABLE]

By using $\widehat{N}_{0}\xrightarrow[D\to\infty]{\text{prob.}}\mathsf{m}_{Z}/\log(2)$ and substituting LB from \fref{eq:LB} and UB from \fref{eq:UB} into \fref{eq:relativeerrorbound}, after some simplifications, we obtain \fref{eq:errorbound}. ∎

Bibliography

[1] A. Gallyas-Sanhueza and C. Studer, “Blind SNR estimation and nonparametric channel denoising in multi-antenna mmWave systems,” in IEEE Int. Conf. Commun. (ICC), Jun. 2021, pp. 1–7.
[2] H. Song, X. You, C. Zhang, O. Tirkkonen, and C. Studer, “Minimizing pilot overhead in cell-free massive MIMO systems via joint estimation and detection,” in Proc. IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), May 2020, pp. 1–5.
[3] T. Schenk, RF imperfections in high-rate wireless systems: impact and digital compensation. Springer Science & Business Media, 2008.
[4] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access, vol. 1, pp. 335–349, May 2013.
[5] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.
[6] 3GPP, “5G; NR; user equipment (UE) radio transmission and reception; part 1: Range 1 standalone,” Nov. 2020, TS 38.101-1 version 16.5.0 Rel. 16.
[7] R. Ghods, A. Gallyas-Sanhueza, S. H. Mirfarshbafan, and C. Studer, “BEACHES: Beamspace channel estimation for multi-antenna mmWave systems and beyond,” in Proc. IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), Jul. 2019, pp. 1–5.
[8] S. H. Mirfarshbafan, A. Gallyas-Sanhueza, R. Ghods, and C. Studer, “Beamspace channel estimation for massive MIMO mmWave systems: Algorithm and VLSI design,” IEEE Trans. Circuits Sys. I (TCAS-I), vol. 67, no. 12, pp. 5482–5495, Sep. 2020.
