Generalized Approximate Message Passing for Massive MIMO mmWave Channel   Estimation with Laplacian Prior

Faouzi Bellili; Foad Sohrabi; Wei Yu

arXiv:1903.02077·cs.IT·March 7, 2019

Generalized Approximate Message Passing for Massive MIMO mmWave Channel Estimation with Laplacian Prior

Faouzi Bellili, Foad Sohrabi, Wei Yu

PDF

Open Access

TL;DR

This paper introduces a novel Bayesian channel estimation method for massive MIMO mmWave systems using GAMP with a Laplacian prior, improving accuracy, speed, and complexity over existing approaches.

Contribution

It develops a Laplacian prior-based GAMP algorithm with an EM procedure for automated parameter learning, enhancing mmWave MIMO channel estimation.

Findings

01

Improved channel estimation accuracy and achievable rate.

02

Faster convergence of GAMP with Laplacian prior.

03

Reduced computational complexity compared to Gaussian prior methods.

Abstract

This paper tackles the problem of millimeter-Wave (mmWave) channel estimation in massive MIMO communication systems. A new Bayes-optimal channel estimator is derived using recent advances in the approximate belief propagation (BP) Bayesian inference paradigm. By leveraging the inherent sparsity of the mmWave MIMO channel in the angular domain, we recast the underlying channel estimation problem into that of reconstructing a compressible signal from a set of noisy linear measurements. Then, the generalized approximate message passing (GAMP) algorithm is used to find the entries of the unknown mmWave MIMO channel matrix. Unlike all the existing works on the same topic, we model the angular-domain channel coefficients by Laplacian distributed random variables. Further, we establish the closed-form expressions for the various statistical quantities that need to be updated iteratively by…

Equations200

y (k)

y (k)

H

H

a_{t} (Ω)

a_{t} (Ω)

a_{r} (Ω)

H

H

H

H

v_{p, q}^{r} [m]

v_{p, q}^{r} [m]

\displaystyle\leavevmode\nobreak\ \Big{|}\mathbf{v}_{p,q}^{r}[m]\Big{|}\leavevmode\nobreak\ =\leavevmode\nobreak\ \frac{1}{\sqrt{M_{r}}}\left|\frac{\sin\Big{(}\pi\big{[}m-1-\frac{M_{r}}{2}\Omega^{r}_{p,q}\big{]}\Big{)}}{\sin\Big{(}\frac{\pi}{M_{r}}\big{[}m-1-\frac{M_{r}}{2}\Omega^{r}_{p,q}\big{]}\Big{)}}\right|.

\displaystyle\leavevmode\nobreak\ \Big{|}\mathbf{v}_{p,q}^{r}[m]\Big{|}\leavevmode\nobreak\ =\leavevmode\nobreak\ \frac{1}{\sqrt{M_{r}}}\left|\frac{\sin\Big{(}\pi\big{[}m-1-\frac{M_{r}}{2}\Omega^{r}_{p,q}\big{]}\Big{)}}{\sin\Big{(}\frac{\pi}{M_{r}}\big{[}m-1-\frac{M_{r}}{2}\Omega^{r}_{p,q}\big{]}\Big{)}}\right|.

\displaystyle\Big{|}\Omega^{r}_{p,q}-\frac{2(m_{0}-1)}{M_{r}}\Big{|}<\frac{2}{M_{r}}.

\displaystyle\Big{|}\Omega^{r}_{p,q}-\frac{2(m_{0}-1)}{M_{r}}\Big{|}<\frac{2}{M_{r}}.

\displaystyle\Big{|}\Omega^{t}_{p,q}-\frac{2(n_{0}-1)}{M_{t}}\Big{|}<\frac{2}{M_{t}}.

\displaystyle\Big{|}\Omega^{t}_{p,q}-\frac{2(n_{0}-1)}{M_{t}}\Big{|}<\frac{2}{M_{t}}.

y (k)

y (k)

=

Y

Y

\displaystyle\textrm{vec}\big{(}\widetilde{\mathbf{Y}}\big{)}

\displaystyle\textrm{vec}\big{(}\widetilde{\mathbf{Y}}\big{)}

y

y

≜ y [ℜ {y} ℑ {y}]

≜ y [ℜ {y} ℑ {y}]

y

y

H \leavevmode = \leavevmode U_{r} H U_{t}^{H},

H \leavevmode = \leavevmode U_{r} H U_{t}^{H},

p_{X} (x; β) = N (x; 0, β),

p_{X} (x; β) = N (x; 0, β),

p_{X} (x; α)

p_{X} (x; α)

p_{X} (x; b)

p_{X} (x; b)

X

X

p_{X} (x)

p_{X} (x)

=

x_{n}^{\textsc M A P}

x_{n}^{\textsc M A P}

x_{n}^{\textsc M M S E}

=

p_{Y ∣ Z} (y ∣ z; σ_{w}^{2})

p_{Y ∣ Z} (y ∣ z; σ_{w}^{2})

\displaystyle\!\!\!\!\!\!\!\!p_{\mathcal{Z}_{m}|\bm{\mathcal{Y}}}\big{(}z_{m}|\mathbf{y};\widehat{p}_{m},\mu^{p}_{m},\bm{\theta}\big{)}

\displaystyle\!\!\!\!\!\!\!\!p_{\mathcal{Z}_{m}|\bm{\mathcal{Y}}}\big{(}z_{m}|\mathbf{y};\widehat{p}_{m},\mu^{p}_{m},\bm{\theta}\big{)}

z_{m} (t)

z_{m} (t)

μ_{m}^{z} (t)

p_{X_{n} ∣ Y} (x_{n} ∣ y; r_{n}, μ_{n}^{r}, θ)

p_{X_{n} ∣ Y} (x_{n} ∣ y; r_{n}, μ_{n}^{r}, θ)

p_{X} (x_{n}; b) N (x_{n}; r_{n}, μ_{n}^{r})

p_{X} (x_{n}; b) N (x_{n}; r_{n}, μ_{n}^{r})

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMillimeter-Wave Propagation and Modeling · Advanced MIMO Systems Optimization · Advanced Wireless Communication Techniques

Full text

Generalized Approximate Message Passing for Massive MIMO mmWave Channel Estimation with Laplacian Prior

Faouzi Bellili, , Foad Sohrabi, and Wei Yu This paper has been presented in part at IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Kalamata, Greece, June 2018 [1]. This work is supported in part by an NSERC Postdoctoral Fellowship grant and in part by the Canada Research Chairs program. F. Bellili, F. Sohrabi, and W. Yu are with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mails: [email protected]; [email protected]; [email protected]).

Abstract

This paper tackles the problem of millimeter-Wave (mmWave) channel estimation in massive MIMO communication systems. A new Bayes-optimal channel estimator is derived using recent advances in the approximate belief propagation (BP) Bayesian inference paradigm. By leveraging the inherent sparsity of the mmWave MIMO channel in the angular domain, we recast the underlying channel estimation problem into that of reconstructing a compressible signal from a set of noisy linear measurements. Then, the generalized approximate message passing (GAMP) algorithm is used to find the entries of the unknown mmWave MIMO channel matrix. Unlike all the existing works on the same topic, we model the angular-domain channel coefficients by Laplacian distributed random variables. Further, we establish the closed-form expressions for the various statistical quantities that need to be updated iteratively by GAMP. To render the proposed algorithm fully automated, we also develop an expectation-maximization (EM) based procedure that can be easily embedded within GAMP’s iteration loop in order to learn all the unknown parameters of the underlying Bayesian inference problem. Computer simulations show that the proposed combined EM-GAMP algorithm under a Laplacian prior exhibits improvements both in terms of channel estimation accuracy, achievable rate, and computational complexity as compared to the Gaussian mixture prior that has been advocated in the recent literature. In addition, it is found that the Laplacian prior speeds up the convergence time of GAMP over the entire signal-to-noise ratio (SNR) range.

Index Terms:

Massive MIMO, mmWave, channel estimation, minimum mean-squared error (MMSE) estimator, generalized approximate message passing (GAMP).

I Introduction

Massive multiple-input multiple-output (MIMO) technology in which transceivers are equipped with a large number (tens to hundreds) of antennas has recently attracted considerable research interest both in academia and industry. In fact, owing to the unprecedented spectral and energy efficiency gains it promises, massive MIMO is foreseen to be a key component and to play a major role in future fifth-generation (5G) wireless networks [2, 3, 4]. Moreover, the combination of massive MIMO, mmWave, and small-cell geometries is a symbiotic convergence of technologies, recognized to be the next wireless revolution [5] which enables achieving the approximate thousand-fold increase in capacity that will be needed in the coming decades. A common and crucial assumption behind many of the promised benefits of massive MIMO mmWave technology, however, is that the receiver and/or the transmitter are provided with good-quality channel state information (CSI), which in practice has to be estimated using short-length pilot sequences.

As a matter of fact, the task of acquiring acceptable-quality CSI is much more challenging for massive MIMO systems than it is for traditional MIMO configurations [6]. Indeed, the direct application of conventional per-antenna channel estimation schemes leads to a prohibitive overhead. This stems from the need to transmit long training sequences to be able to accurately estimate all the entries of the large-size channel matrix.

Fortunately, unlike the ultra-high frequency (UHF) or sub-3GHz band, many recent channel measurement/sounding campaigns have confirmed that, due to the lack of scattering in mmWave bands, the signal propagates from the transmitter to the receiver through a small number of path clusters [7, 8, 9]. This has triggered a surge of interest — fueled by recent progresses in compressed sensing (CS) theory — in harnessing the sparsity of mmWave channels to devise accurate estimators that require acceptable-size training sequences [10]. In this context, a plethora of massive MIMO mmWave channel estimators have been introduced, over the recent few years, by capitalizing mainly on the angular-domain sparsity. However, the vast majority of the proposed works capture such sparsity by simply quantizing the beam domain. More specifically, a fixed sampling grid is selected to serve as a possible set for all candidate values of the angles of departure (AoDs) and angles of arrival (AoAs) pertaining to the different paths (see [11, 12] and references therein). Then, by constraining all true (unknown) AoDs/AoAs to lie exactly on the selected grid, the channel estimation problem is recast as a sparse reconstruction problem. Its sensing matrix is an overcomplete dictionary that can be easily constructed by evaluating the transmit/receive array response vectors at all the points of the postulated sampling grid. The sparsity of the unknown vector, which contains the path gain coefficients, follows from the observation that most of the sampled AoD/AoA pairs do not correspond to any physical path111A sampled AoD/AoA pair that does not correspond to any physical path is associated with a zero coefficient with unknown location in $\mathbf{x}$ . The idea of building an overcomplete dictionary via angular-domain quantization was pioneered in [13] within the framework of multisource DOA estimation under flat-fading channels..

Unfortunately, the above techniques suffer from the inevitable off-grid problem which arises in practical situations where some of the true AoDs and/or AoAs do not lie on the sampling grid. For accurate estimation, it is therefore compulsory to use densely sampled grid since it reduces the gap between the true parameters and their nearest points on the grid. However, the cost of a dense grid sampling is the excessive increase in computational complexity.

From an algorithmic point of view, another common theme underlying almost all the existing methods is the use of sparsity-inducing mixed-norm optimization criteria, such as the basis pursuit [14] (and its variants) or the LASSO [15], for reconstruction at the receiver. Although the existing convex optimization-based reconstruction algorithms perform well in practice, their complexity does not scale appropriately with the problem dimensions. Specifically, it increases very rapidly with the size of the transmit/receive antenna arrays as we operate in the massive MIMO regime. To alleviate the computational burden, popular iterative hard/soft thresholding (IHT/IST) algorithms [16, 17] can been envisaged here at the cost, however, of worse reconstruction performance.

In this paper, we capitalize on the generalized approximate message-passing (GAMP) algorithm222Note here that the Bayesian MMSE version of the original AMP algorithm [18] could also be used here and we expect both algorithms to lead to similar results. [19] to estimate the massive MIMO mmWave channel. Unlike the original AMP algorithm, pioneered in [20], GAMP is able to accommodate any priori distribution on the entries of the unknown vector and applies to both linear and nonlinear observation models. GAMP has the reconstruction power of mixed-norm optimization approaches and entails almost the same complexity of IST/IHT algorithms making it very attractive for large-dimensional estimation problems encountered in massive MIMO systems. However, it is known that GAMP diverges with even mildly ill-conditioned sensing matrices [21, 22, 23], like the overcomplete dictionaries resulting from beam-domain quantization. Similar to [24, 25, 26], to circumvent this problem, we rely on the so-called virtual or canonical channel model [27] which amounts to expanding the physical multipath channel in terms of multidimensional Fourier basis functions that are easily implemented by discrete Fourier transform (DFT) matrices. This idea has been recently used together with the sparse message passing paradigm in [28, 29] within the context of massive MIMO mmWave channel estimation.

The use of approximate message-passing algorithms has already found application in many different fields. But its application to the problem of estimating massive MIMO multipath sparse channels has appeared only recently in [24] and [25] for the case of single- and multi-user communications333We also refer the reader to [30, 31] for other interesting recent works that apply the idea of Gaussian message passing to the detection problem in MIMO communications., respectively. In both works, however, GAMP is used in conjunction with a Gaussian mixture (GM) prior on the channel coefficients. This paper makes an observation that the use of a Laplacian prior on the beam-domain coefficients of massive MIMO mmWave channels can lead to better reconstruction performance while speeding up the convergence of GAMP at the same time. The use of a Laplacian prior in our work is, in part, motivated by its wide adoption in Bayesian image processing. There, it has been shown that the approximately sparse discrete cosine transform (DCT) coefficients of natural images are better modeled by a Laplace distribution [32]. Owing to the apparent analogy between the sparsity of DCT coefficients in image processing and the sparsity of mmWave MIMO channels in the DFT basis, it is expected that a Laplacian distribution will lead to enhanced reconstruction performance. In this paper, we establish in closed-from expressions all the statistical quantities required by GAMP under a Laplacian prior. Moreover, we devise a simple approach that learns all the necessary parameters using the expectation-maximization (EM) principle. The proposed EM-based approach comes at almost no additional cost since all the statistical quantities it requires are provided as by-products of GAMP while trying to reconstruct the unknown channel itself. Exhaustive Monte-Carlo simulations show that a Laplacian prior leads indeed to large performance gains especially in the harsh conditions of low SNR and/or reduced number of observations.

We structure the rest of this paper as follows. In section II, we present the system model. In section III, we recast the massive MIMO mmWave channel estimation problem into a sparse reconstruction problem and derive the GAM-based algorithm under a Laplacian prior. In Section IV, we devise the EM-based approach that learns the parameters of the underlying estimation problem. In Section V, we discuss the simulation results of the algorithm. Finally, we draw out some concluding remarks in Section VI.

The common notations used in this paper are as follows. Lower- and upper-case bold fonts, $\mathbf{x}$ and $\mathbf{X}$ , are used denote vectors and matrices, respectively. Upper-case calligraphic font, $\mathcal{X}$ and $\bm{\mathcal{X}}$ , is used to denote single and multivariate random variables, respectively. The $n$ th column of $\mathbf{X}$ is denoted as $[\mathbf{X}]_{:,n}$ , its $(m,n)$ th entry is denoted as $\mathbf{X}_{mn}$ , and the $n$ th element of $\mathbf{x}$ is denoted as $x_{n}$ . $\mathbf{I}_{N}$ stands for the $N\times N$ identity matrix, $\textrm{vec}(\mathbf{X})$ stacks the columns of $\mathbf{X}$ one below the other, and unvec $(.)$ is the associated inverse operator. The shorthand notation $\mathbf{x}\sim\mathcal{N}(\mathbf{m},\mathbf{R})$ means that the vector $\mathbf{x}$ follows a Gaussian distribution with mean $\mathbf{m}$ and auto-covariance matrix $\mathbf{R}$ . Moreover, $\{.\}^{\textsf{T}}$ and $\{.\}^{\textsf{H}}$ stand for the transpose and Hermitian (transpose conjugate) operators, respectively. In addition, $|.|$ and $\|.\|$ stand for the modulus and Euclidean norm, respectively. Given any complex number, $\Re\{.\}$ , $\Im\{.\}$ , and $\{.\}^{*}$ return its real part, imaginary part, and complex conjugate, respectively. The Kronecker function and product are denoted as $\delta_{m,n}$ and $\otimes$ , respectively. We also denote the probability distribution function (pdf) of single and multivariate random variables (RVs) by $p_{\mathcal{X}}(x)$ and $p_{\bm{\mathcal{X}}}(\mathbf{x})$ , respectively. The statistical expectation is denoted as $\mathbb{E}\{.\}$ , $j$ is the imaginary unit (i.e., $j^{2}=-1$ ), and the notation $\triangleq$ is used for definitions.

II System Model

Consider a massive MIMO mmWave communication system wherein the transmitter and the receiver are equipped with $M_{t}$ and $M_{r}$ antenna branches, respectively. At successive discrete time instants $k$ , $k=0,1,\ldots,K-1$ , the $i$ th transmit antenna element sends a training (i.e., pilot) symbol $b_{i}(k)$ . The entire $M_{t}-$ dimensional signal broadcasted by the transmit antenna array is denoted as $\mathbf{b}(k)\triangleq[b_{1}(k),b_{2}(k),\ldots,b_{M_{t}}(k)]^{\textsf{T}}$ . This paper assumes that the elements of the pilot sequence are generated randomly from independent and identically distributed (i.i.d.) complex Gaussian distribution with zero mean and variance $1/\sqrt{M_{t}}$ . Under perfect carrier and timing recovery and assuming block fading, the $M_{r}-$ dimensional received signals, denoted hereafter as $\mathbf{y}(k)$ for $k=0,1,\ldots,K-1$ , can be modeled as follows:

[TABLE]

where $\mathbf{H}\in\mathbb{C}^{M_{r}\times M_{t}}$ is the baseband channel impulse response, assumed to remain constant over the entire observation window, and $\mathbf{w}(k)\triangleq[w_{1}(k),w_{2}(k),\ldots,w_{M_{r}}(k)]^{\textsf{T}}$ is the additive noise vector at discrete time instant $k$ . The noise components, $w_{i}(k)$ , are modeled by circular complex Gaussian random variables with zero mean and variance $2\sigma_{w}^{2}$ and they are assumed to be temporally and spatially white, i.e., $\mathbb{E}\{w_{i}(k)w_{i^{\prime}}(k^{\prime})^{*}\}=2\sigma_{w}^{2}\delta_{i,i^{\prime}}\delta_{k,k^{\prime}}$ .

It is worth mentioning here that the linear model in (1) is valid only when a high precision ADC is used for each antenna. In practice, the ADCs have finite precision especially in mmWave massive MIMO communications where the received data is coarsely quantized [24]. Quantization effects are not considered in this paper since our main goal is to investigate the impact of the prior distribution on the estimation of the sparse beam-domain channel coefficients. We show that the Laplacian prior is the right model for beam-domain mmWave channel coefficients. Extending the results of this work to more practical architectures that use a dedicated RF chain with a low-precision ADC for each antenna is a promising future research direction. Indeed, even in the presence of finite-resolution ADCs, it is still expected that the Laplacian prior will be the right model for the beam-domain coefficients of the mmWave massive MIMO channels. Another attractive hardware architecture for massive MIMO mmWave systems suggests to use a reduced number of RF chains (as compared to the number of antennas) but with full-precision ADCs, known as the hybrid architecture [33, 34]. In this respect, we refer the reader to [12, 35, 36] and references therein for recent works on the problem of channel estimation for massive MIMO mmWave systems with the hybrid architecture. In particular, by evaluating the transmit and receive array steering vectors on a uniform grid of possible AoAs/AoDs, the authors of [12] recast the problem of channel estimation in massive MIMO mmWave systems with hybrid architectures as a compressed sensing problem. The unknown non-zero entries in the exactly sparse vector, $\mathbf{x}$ , of the resulting linear model are the gain coefficients of the individual physical paths. A key step to extending the results of our work to the hybrid architectures amounts to finding the appropriate linear model that has the beam-domain coefficients as the entries of the unknown sparse vector $\mathbf{x}$ . Moreover, since GAMP applies to both linear and non-linear models, our results can potentially be further extended to hybrid mmWave architectures with finite-precision ADCs (i.e., with coarsely quantized observations) by relying on similar arguments in [37].

The goal is to estimate the channel matrix, $\mathbf{H}$ , in (1) given the set of observation vectors $\{\mathbf{y}(k)\}_{k=0}^{K-1}$ . In conventional sub-3GHz MIMO communications, the passing wave tends to hit the receiving antenna array from almost all directions. In this case, every single entry of $\mathbf{H}$ is the aggregate contribution of a large number of propagation paths and, hence, reasonably modeled by a complex Gaussian RV. In higher-frequency mmWave bands, however, electromagnetic waves behave quite differently and radio signals propagate along very few path clusters each containing a small number of sub-paths with small angular spreads. As a result, the mmWave channel is inherently sparse in the angular domain if expressed in suitable DFT bases [26].

To see this, assume that the antenna elements form a uniform linear array (ULA) configuration both at the transmitter and receiver sides. Assume also that there are $P$ clusters and that within each $p$ th cluster ( $p=1,2,\ldots,P$ ) there are $Q_{p}$ sub-paths. Moreover, each $q$ th sub-path within the $p$ th cluster ( $q=1,2,\ldots,Q_{p}$ ) has an angle of departure $\phi_{p,q}$ and impinges on the receive antenna array from an angle of arrival $\theta_{p,q}$ . By the superposition principle, the resulting baseband MIMO channel matrix is given by (see [29] and references therein):

[TABLE]

where $\alpha_{p,q}$ is the gain of the $q^{th}$ sub-path within the $p^{th}$ cluster, $\Omega^{t}_{p,q}=\cos(\phi_{p,q})$ and $\Omega^{r}_{p,q}=\cos(\theta_{p,q})$ are its directional cosine with respect to the transmit and receive antenna arrays, respectively. Moreover, $\mathbf{a}_{t}(\Omega)$ and $\mathbf{a}_{r}(\Omega)$ are, respectively, the transmit and receive array response vectors which are explicitly given by [38]:

[TABLE]

where it is implicitly assumed that the antenna elements are separated by half the wavelength.

The actual channel matrix as expressed in (2) is not visibly sparse. Expressing it in the angular domain, however, reveals that it has indeed very few dominant entries. To see this, consider the following angular-domain representation of $\mathbf{H}$ :

[TABLE]

where $\mathbf{U}_{r}$ and $\mathbf{U}_{t}$ are the $M_{r}\!\times\!M_{r}$ and $M_{t}\!\times\!M_{t}$ spatial unitary DFT matrices. Plugging (5) back into (2) and rearranging the terms, it follows that:

[TABLE]

Clearly, in (6), the columns of $\mathbf{U}_{r}$ and $\mathbf{U}_{t}$ act as receive and transmit beamforming vectors, respectively, capturing how much energy is present along their associated transmit/receive beams. In fact, by considering the $m^{th}$ component of the vector $\mathbf{v}_{p,q}^{r}\leavevmode\nobreak\ \!\triangleq\leavevmode\nobreak\ \!\mathbf{U}_{r}^{\textsf{H}}\mathbf{a}_{r}(\Omega^{r}_{p,q})$ :

[TABLE]

it can be shown that its magnitude is explicitly expressed as follows:

[TABLE]

It is clear that $\big{|}\mathbf{v}_{p,q}^{r}[m]\big{|}$ is maximal for $m_{0}$ verifying:

[TABLE]

This is in line with a standard result in array signal processing which states that each of the receive beamforming vectors $[\mathbf{U}_{r}]_{:,m}$ , for $m=1,2,\ldots,M_{r}$ , has a main lobe centered around $(m-1)/M_{r}\Delta_{r}$ with beamwidth $1/(M_{r}\Delta_{r})$ where $\Delta_{r}$ is the inter-antenna separation normalized by the wavelength (in our case $\Delta_{r}=1/2$ ). Therefore, a given subpath with receive directional cosine $\Omega^{r}_{p,q}$ has almost all of its energy along one particular vector $[\mathbf{U}_{r}]_{:,m_{0}}$ (see (9)) and very little along all the others. If the angular spread, $\sigma_{\theta}^{r}$ , is small (typically $\sigma_{\theta}^{r}<1/M_{r}\Delta_{r}$ ), then all the sub-paths belonging to the same cluster have most of their energy along the same beamforming vector. By the same virtue, they also have most of their energy concentrated along one particular transmit beamforming vector $[\mathbf{U}_{t}]_{:,n_{0}}$ for some $n_{0}\in\{1,2,\ldots,M_{t}-1\}$ that satisfies:

[TABLE]

In this way, the only few dominant entries, $\widetilde{\mathbf{H}}_{m_{0}n_{0}}$ , of $\widetilde{\mathbf{H}}$ are those for which there is a cluster with mean AoA and mean AoD that verify (9) and (10) at the same time. All the remaining entries have relatively small magnitude; they are not identically zero as they capture small contributions from all the clusters due to spectral leakage phenomena.

III Proposed mmWave channel estimation algorithm

III-A Problem formulation

To harness the angular domain sparsity of the channel, we propose to apply the DFT precoder, $\mathbf{U}_{t}^{\textsf{H}}$ , to the training sequences before transmission, i.e., the transmit array sends $\mathbf{U}_{t}\mathbf{b}(k)$ at discrete time instants $k$ , $k=0,1,\ldots,K-1$ . We also combine the corresponding received noisy vector using the DFT combiner, $\mathbf{U}_{r}$ , such that:

[TABLE]

where $\widetilde{\mathbf{w}}(k)\triangleq\mathbf{U}_{r}^{\textsf{H}}\mathbf{w}(k)$ is the resulting combined noise which has exactly the same statistics as $\mathbf{w}(k)$ since the matrix $\mathbf{U}_{r}$ is unitary. By stacking all the received vectors in a single matrix $\widetilde{\mathbf{Y}}=[\widetilde{\mathbf{y}}(0),\widetilde{\mathbf{y}}(1),\ldots,\widetilde{\mathbf{y}}(K-1)]$ , we obtain:

[TABLE]

with the matrices $\mathbf{B}$ and $\widetilde{\mathbf{W}}$ being constructed in the same way as $\mathbf{Y}$ , i.e., $\mathbf{B}=[\mathbf{b}(0),\mathbf{b}(1),\ldots,\mathbf{b}(K-1)]$ and $\widetilde{\mathbf{W}}=[\widetilde{\mathbf{w}}(0),\widetilde{\mathbf{w}}(1),\ldots,\widetilde{\mathbf{w}}(K-1)]$ . Now, vectorizing (12) yields:

[TABLE]

By defining $\widetilde{\mathbf{y}}\triangleq\textrm{vec}(\widetilde{\mathbf{Y}})\in\mathbb{C}^{M_{r}K}$ , $\widetilde{\mathbf{x}}\triangleq\textrm{vec}(\widetilde{\mathbf{H}})\in\mathbb{C}^{M_{r}M_{t}}$ , $\widetilde{\mathbf{w}}\triangleq\textrm{vec}(\widetilde{\mathbf{W}})\in\mathbb{C}^{M_{r}K}$ , and $\widetilde{\mathbf{A}}\triangleq\mathbf{B}^{\textsf{T}}\otimes\mathbf{I}_{M_{r}}\in\mathbb{C}^{M_{r}K\times M_{r}M_{t}}$ , (13) is equivalently rewritten as follows:

[TABLE]

Recall here that the vector $\widetilde{\mathbf{x}}$ is approximately sparse due to the approximate sparsity of $\widetilde{\mathbf{H}}$ . This paper captures the underlying sparsity by a Laplacian distribution. Since the Laplacian distribution is defined for real-valued RVs only, we transform the complex model in (14) to the following equivalent real model:

[TABLE]

We recognize in (III-A) the well-known inverse problem in signal processing research practices: reconstruct a (approximately) sparse vector from the fewest possible number of noisy linear observations:

[TABLE]

in which by defining $M=2M_{r}K$ and $N=2M_{r}M_{t}$ , we have $\mathbf{y}\in\mathbb{R}^{N}$ , $\mathbf{A}\in\mathbb{C}^{M\times N}$ , and $\mathbf{x}\in\mathbb{C}^{N}$ . Once this is done, an estimate of the original channel matrix is readily obtained from (6) as follows:

[TABLE]

where $\widehat{\widetilde{\mathbf{H}}}=\textrm{unvec}\big{(}\widehat{\widetilde{\mathbf{x}}}\big{)}$ and $\widehat{\widetilde{\mathbf{x}}}$ is an estimate of $\widetilde{\mathbf{x}}$ which is easily obtained from the reconstructed vector $\widehat{\mathbf{x}}$ .

III-B Modeling the Angular-Domain Coefficients of mmWave Channels:

In our quest for finding the beam-domain channel coefficients in $\mathbf{x}$ , we follow the Bayesian approach which requires an appropriate model for the prior distribution of the $x_{n}$ ’s. For rich-scattering environments, an accurate and widely used statistical model for the actual channel coefficients is the Gaussian model. In other words, the various entries of the channel matrix $\mathbf{H}$ involved in (1) are assumed to follow a scaled normal distribution:

[TABLE]

in which $\beta$ is the large-scale fading coefficient that accounts for the combined effects of shadowing and path loss. In this case, the entries of $\widetilde{\mathbf{H}}$ in (12) or equivalently the elements of the unknown vector $\mathbf{x}$ in (16) are also Gaussian-distributed since $\mathbf{U}_{r}$ and $\mathbf{U}_{t}$ are unitary matrices.

In mmWave communications, however, the entries of $\mathbf{H}$ cannot be approximated by a Gaussian distribution due to the lack of scattering. Hence, a more appropriate statistical model for the angular-domain channel coefficients needs to be specified. This problem has been addressed in two recent works [24] and [25] for single- and multi-user communications, respectively, and both of these works propose to approximate the unknown prior distribution by a Gaussian mixture (GM) model of order $L$ , i.e.:

[TABLE]

where $\{\omega_{l}\}_{l=1}^{L}$ are the normalized mixing coefficients in the postulated GM model that is parameterized by the vector $\bm{\alpha}\triangleq[\omega_{1},\ldots\omega_{L},\eta_{1},\ldots,\eta_{L},\nu_{1},\ldots,\nu_{L}]^{\textsf{T}}$ . In [24, 25], the components of $\bm{\alpha}$ are estimated by the expectation-maximization procedure for the GM model introduced recently in [39].

In this paper, we propose to model the angular-domain coefficients of the underlying mmWave massive MIMO channel by a zero-mean Laplacian distribution with scale parameter $b$ :

[TABLE]

As mentioned previously, our choice is motivated by the widespread use of the Laplacian distribution to capture the sparsity of DCT coefficients of natural images [32]. Moreover, it can be shown that MAP-based estimation of sparse signals with Laplacian prior is equivalent to the regularized $l_{1}-$ norm minimization problem which is known to promote sparsity. Indeed, as explained in [40, 41], the Laplacian prior enforces the sparsity constraint more heavily by distributing the posterior mass more on the axes so that signal coefficients close to zero are preferred.

The superiority of the Laplacian prior over the GM prior is intuitively expected. Indeed, both priors are special cases of the so-called generalized Gaussian scale model (GSM) which yields a richer class of priors. To see this, consider a zero-mean RV $\mathcal{X}$ which is given by:

[TABLE]

and ${\mathcal{Z}}$ being a positive random variable which is statistically independent from $U$ . Hence:

[TABLE]

By closely inspecting (23), it can be shown that when ${\mathcal{Z}}$ is a discrete-valued RV with finite support $p_{\mathcal{X}}(x)$ reduces to a finite mixture of Gaussian densities. Then, a more elaborate model would be to use an uncountably infinite mixture (i.e., integral) of Gaussian densities by considering a continuous-valued RV ${\mathcal{Z}}$ . One particular choice that is appealing from the computational point of view is to use an exponential distribution for ${\mathcal{Z}}$ . By doing so, one actually recovers the Laplacian prior that we are using in our paper. It becomes clear then that the Laplacian prior would lead to better performance than the finite GM model unless one is willing to use a GM model that mixes a large number of normal distributions whose parameters become very difficult to learn in practice. Indeed, by using a Laplacian prior, one is actually using a GM model which involves the sum of uncountably infinite number of Gaussian distributions while requiring to learn only one parameter. By learning this parameter, we are implicitly learning the mixing coefficients along with the variances involved in the infinite-sum (i.e., integral) of the GSM model; instead of learning the individual parameters of the finite GM model as done in previous works.

We emphasize here the fact that, unlike the GM model in (19), the Laplacian distribution in (20) is parameterized by a single parameter $b$ which is itself unknown in advance. Later in this Section, we show how it can also be learned adaptively using the EM principle by exploiting the auxiliary outputs of GAMP. It is noteworthy, however, that learning a single parameter (namely $b$ ) is statistically more efficient and entails much less computational complexity than learning $3L$ parameters under the GM model. Note as well that the large-scale fading coefficient is also assumed to be unknown and absorbed in the scale parameter $b$ .

It is also noteworthy that the original AMP algorithm [20] implicitly uses a Laplacian prior on the components of the unknown sparse vector $\mathbf{x}$ . However, the denoiser in the original AMP paper is not designed specifically for this Laplacian prior. Instead the denoiser design is based on the minimax criterion, which results in soft/hard thresholding of the components of $\mathbf{x}$ . It is the generalized AMP (GAMP) algorithm that offers a systematic way of taking the Laplacian (and actually any) prior into account during the denoising step, as will be explained in the next section.

Note also that in this paper we do not take into account the correlation between the angular-domain coefficients of mmWave massive MIMO channels. Indeed, the significant elements (in the angular domain) of mmWave massive MIMO channels usually appear in bursts due to the physical scattering structure. In order to capture that correlation, one needs to find an appropriate prior, $p_{\bm{\mathcal{X}}}(\mathbf{x})$ , for the entire vector $\mathbf{x}$ and then estimate the latter using vector (instead of scalar) GAMP. This requires, however, the inversion of multiple large-size matrices at every iteration on the top of learning the entire covariance matrix of $\mathbf{x}$ using the EM algorithm instead of learning a single scale parameter as done in this paper.

III-C MmWave Channel Estimation using GAMP with Laplacian Prior

GAMP algorithm applies loopy belief propagation on the bipartite graph obtained from (16) under Gaussian approximations for the involved messages which become accurate in the large system limit. It falls under the Bayesian estimation framework wherein by assuming a prior distribution, $p_{\bm{\mathcal{X}}}(\mathbf{x})$ on $\mathbf{x}$ one is interested in finding the marginal posterior distributions $p_{\mathcal{X}_{n}|\bm{\mathcal{Y}}}(x_{n}|\mathbf{y})$ . If possible, these could be used to perform minimum mean square error (MMSE) or maximum a posteriori estimation of each $x_{n}$ separately:

[TABLE]

Unfortunately, finding the true marginal distributions, $p_{\mathcal{X}_{n}|\bm{\mathcal{Y}}}(x_{n}|\mathbf{y})$ , is analytically intractable and computationally prohibitive. To sidestep this problem, GAMP implements loopy belief propagation and relies on the central limit theorem (CLT) and quadratic approximations to solve the MMSE and MAP estimation problems, respectively. Specifically, the sum-product and max-sum BP algorithms are used in the former and latter cases, respectively. In the sequel, we focus on MMSE estimation wherein GAMP approximates $p_{\mathcal{X}_{n}|\bm{\mathcal{Y}}}(x_{n}|\mathbf{y})$ by another tractable distribution that is progressively refined from one iteration to another. In fact, assume that the components, $\{x_{n}\}_{n=1}^{N}$ , of the unknown vector $\mathbf{x}$ are independent444Note here that the real and imaginary parts of each angular-domain channel coefficient can be dependent. In principle, this dependence can be captured by a bivariate Laplacian distribution. However, the latter does not enable one to find analytical expressions for the posterior mean and variance involved in Lines 15 and 16 of Algorithm 1, respectively. and identically distributed (i.i.d.) according to a common prior distribution, $p_{\mathcal{X}}(x;\bm{\alpha})$ , which is parameterized by an unknown parameter vector $\bm{\alpha}$ . After being linearly transformed to produce $\mathbf{z}\triangleq\mathbf{A}\mathbf{x}$ , the latter is propagated through a probabilistic channel

[TABLE]

Although both $\bm{\alpha}$ and $\sigma_{w}^{2}$ need to be estimated as well (cf. Section IV) assume here and in the next subsection that they are perfectly known to the receiver and gather them in a single parameter vector $\bm{\theta}\triangleq[\bm{\alpha},\sigma_{w}^{2}]$ . Given knowledge of $\mathbf{y}$ , $\mathbf{A}$ , and $\bm{\theta}$ , GAMP runs iteratively according to the algorithmic description provided in Algorithm 1.

First, observe that the message propagated along the weak edge from the factor node $p(z_{m}|\mathbf{a}_{m}^{\textsf{T}}\mathbf{x})$ to the variable node $z_{m}$ is computed by integrating over all the variable nodes $\{x_{n}\}_{n}$ in $\mathbf{x}$ (here $\mathbf{a}_{m}^{\textsf{T}}$ is the $m$ th row of $\mathbf{A}$ ). Therefore, using the CLT argument — due to integration over a large number of variables — GAMP approximates this message by a Gaussian distribution with mean $\widehat{p}_{m}$ and variance $\mu^{p}_{m}$ (obtained from lines 7 and 8 of Algorithm 1), i.e., $\mathcal{N}(z_{m};\widehat{p}_{m},\mu^{p}_{m})$ . By taking the product of both incoming messages at node $z_{m}$ , i.e., those outgoing from factor nodes $p(z_{m}|\mathbf{a}_{m}^{\textsf{T}}\mathbf{x})$ and $p_{\mathcal{Y}_{m}|\mathcal{Z}_{m}}(y_{m}|z_{m};\bm{\theta})$ , the marginal posterior $p_{\mathcal{Z}_{m}|\bm{\mathcal{Y}}}(z_{m}|\mathbf{y};\bm{\theta})$ can be approximated by:

[TABLE]

It is seen that (28) holds irrespectively of the prior distribution $p_{\mathcal{X}}(x;.)$ , therefore, the posterior mean and variance (in lines 9 and 10) of $\mathcal{Z}_{m}$ under the Laplacian distribution are the same as for the GM model and their expressions are readily available from [19] as:

[TABLE]

The quantities $\widehat{s}_{m}(t)$ and $\mu_{m}^{s}(t)$ updated in lines 12 and 11 of algorithm 1, respectively, are the equivalent of the Onsager correction term in the original AMP algorithm [20]. They follow directly from the quadratic approximations involved in the general theory of GAMP.

The message propagated along the strong edges from all the factor nodes $\{p(z_{m}|\mathbf{a}_{m}^{\textsf{T}}\mathbf{x})\}_{m}$ to the variable node $x_{m}$ is computed by integrating over all the other variable nodes, $\{x_{n}^{\prime}\}_{n^{\prime}\neq n}$ , in $\mathbf{x}$ and all the $z_{m}$ ’s in $\mathbf{z}=\mathbf{A}\mathbf{x}$ . Therefore, using the CLT argument again, GAMP approximates this message by a Gaussian distribution with mean $\widehat{r}_{m}$ and variance $\mu^{r}_{m}$ (updated in lines 14 and 13 of Algorithm 1), i.e., $\mathcal{N}(x_{m};\widehat{r}_{m},\mu^{r}_{m})$ . Therefore, the marginal posterior $p_{\mathcal{X}|\bm{\mathcal{Y}}}(x|\mathbf{y};\bm{\theta})$ at iteration $t$ can be approximated by taking the product of the aforementioned incoming Gaussian message, $\mathcal{N}(x_{m};\widehat{r}_{m},\mu^{r}_{m})$ , and any (common) prior distribution, $p_{\mathcal{X}}(x_{n};\bm{\alpha})$ , on the components of $\mathbf{x}$ :

[TABLE]

Using the Laplace distribution, $p_{\mathcal{X}}(x_{n};b)$ , as in (20) we first establish in Appendix A the following result:

[TABLE]

where the $(\widehat{r}_{n},\mu^{r}_{n},b)-$ dependent functions $\alpha_{n}(x)$ and $\gamma_{n}(x)$ are given by:

[TABLE]

In (33) and (34), $\operatorname{sgn}(x)$ is the standard signum function given by:

[TABLE]

Plugging (32) back into (31), it follows that:

[TABLE]

in which the normalization factor $\psi_{n}$ is given by:

[TABLE]

Using (33) to (35), we show after some algebraic manipulations that:

[TABLE]

wherein $\textsf{Q}(.)$ is the standard $Q$ -function, i.e., the tail probability of the standard normal distribution:

[TABLE]

and the $(\widehat{r}_{n},\mu^{r}_{n},b)-$ dependent quantities, $\alpha_{n}^{-}$ , $\alpha_{n}^{+}$ , $\gamma_{n}^{-}$ , and $\gamma_{n}^{+}$ are given by:

[TABLE]

Now, the posterior mean involved in Line 15 of Algorithm 1 is given by555Note that the dependence of $\widehat{r}_{n}$ and $\mu_{n}^{r}$ on the iteration index $t$ will be dropped from now on to ease notations.:

[TABLE]

and owing to (III-C), it follows that:

[TABLE]

Then, by using (33)-(34) along with (40)-(43) and resorting to some algebraic manipulations, we establish the following result:

[TABLE]

in which $\Phi_{1}(\gamma,\mu)$ is defined as follows:

[TABLE]

Moreover, it can be shown that $\Phi_{1}\big{(}\gamma,\mu\big{)}$ is analytically expressed as follows:

[TABLE]

Injecting (48) in (46) an rearranging the terms, it is straightforward to show that:

[TABLE]

We also establish the following identity by combining (40)-(43):

[TABLE]

This cancels the last two terms in (49) thereby leading to the following simple expression for the posterior mean of $\mathcal{X}_{n}$ :

[TABLE]

Now, the posterior variance in Line 16 of Algorithm 1 is given by:

[TABLE]

where $\sigma_{\mathcal{X}_{n}}^{2}\!(t+1)$ is the posterior second moment of $\mathcal{X}_{n}$ at iteration $t+1$ :

[TABLE]

Using the posterior distribution in (III-C), $\sigma_{\mathcal{X}_{n}}^{2}\!(t+1)$ is given by:

[TABLE]

whose analytical expression is also established in Appendix B as follows:

[TABLE]

Plugging (55) back in (52) and using (III-C) yields the required $(t+1)th$ update for the $n$ th posterior variance, $\mu^{x}_{n}(t+1)$ , in Algorithm 1.

Recall also that the quantities $\alpha_{n}^{-}$ , $\alpha_{n}^{+}$ , $\gamma_{n}^{-}$ , and $\gamma_{n}^{+}$ that are required to evaluate both $\widehat{x}_{n}(t+1)$ and $\mu^{x}_{n}(t+1)$ all depend on the scale parameter $b$ of the prior Laplacian distribution and the noise variance $\sigma_{w}^{2}$ . Since these two parameters are also unknown beforehand, in the next Section, we devise an EM-based maximum-likelihood (ML) approach that learns them from the received data $\mathbf{y}$ . The proposed iterative ML approach runs side by side with GAMP wherein the soft outputs of the latter are used to progressively refine the EM-based estimates for $b$ and $\sigma_{w}^{2}$ and vice versa.

IV Learning the Scale Parameter of the Prior Laplacian Distribution and the Noise Variance

The problem addressed in this Section is to find the ML estimate, $\widehat{\bm{\theta}}$ , of $\bm{\theta}\triangleq[b,\sigma_{w}]$ given solely the set of recorded data:

[TABLE]

where $p_{\bm{\mathcal{Y}}}(\mathbf{y};\bm{\theta})$ is the pdf of $\bm{\mathcal{Y}}$ parameterized by $\bm{\theta}$ given by:

[TABLE]

The objective function in (56) is a nonlinear transformation of the parameters and its analytical maximization is mathematically intractable. Yet, iterative solutions can be envisaged to solve the underlying optimization problem numerically. In this paper, we resort to the EM concept [42] which is a widely used tool in ML vector parameter estimation practices. More interestingly, GAMP returns the adequate posterior probabilities that are required by the EM algorithm in order to update its estimates. The successful formulation of the EM algorithm amounts to the appropriate identification of the so-called incomplete and complete data sets that are adequate to the estimation problem at hand. In our case, they are taken to be $\mathbf{y}$ and $\mathbf{v}\triangleq[\mathbf{y},\mathbf{x}]^{T}$ , respectively. Then, instead of maximizing the actual log-likelihood function (LLF), $\ln p_{\bm{\mathcal{Y}}}(\mathbf{y};\bm{\theta})$ , the EM algorithm maximizes $\ln p_{\bm{\mathcal{V}}}(\mathbf{v};\bm{\theta})$ . Since the set of complete data, $\mathbf{v}$ , is not entirely available, the EM algorithm replaces $\ln p_{\bm{\mathcal{V}}}(\mathbf{v};\bm{\theta})$ by its conditional expectation:

[TABLE]

However, since one needs to know $\bm{\theta}$ in order to determine $p_{\bm{\mathcal{V}}|\bm{\mathcal{Y}}}(\mathbf{v}|\mathbf{y};\bm{\theta})$ and hence the expected LLF in (IV), the EM algorithm proceeds in the following iterative way. We start with an initial guess, $\widehat{\bm{\theta}}^{(0)}$ , about the unknown parameter vector $\bm{\theta}$ , and we let $\widehat{\bm{\theta}}_{k}$ be the $k^{th}$ guess of its MLE. Then, as the name suggests, the expectation-maximization algorithm alternates between the following two main steps:

$\bullet$ Expectation step (E-STEP): Find the average log-likelihood of the complete data:

[TABLE]

$\bullet$ Maximization step (M-STEP): Maximize the average log-likelihood of the complete data:

[TABLE]

The algorithm stops once the user-specified convergence criterion $|\widehat{\bm{\theta}}_{k+1}-\widehat{\bm{\theta}}_{k}|\leq\delta$ is met or a predefined maximum number of iterations is attained; whichever occurs first.

Now, recalling that $\mathbf{v}\triangleq[\mathbf{y},\mathbf{x}]^{T}$ and using the Bayes’ rule, we write:

[TABLE]

Plugging (61) back into (59) and using:

[TABLE]

it can be shown that:

[TABLE]

As seen from (63), $Q(\bm{\theta},\widehat{\bm{\theta}}_{k})$ decouples in terms of the unknown parameters $\sigma_{w}$ and $b$ . Therefore, the $k^{tk}$ EM update $\widehat{\bm{\theta}}_{k+1}=[\widehat{\sigma}^{2}_{w,k+1},\widehat{b}_{k+1}]$ of $\bm{\theta}=[\sigma_{w}^{2}\leavevmode\nobreak\ b]$ given in (60) is obtained as follows:

[TABLE]

We further assume the elements of $\mathbf{x}$ to be i.i.d.:

[TABLE]

Hence, we rewrite (64) as follows:

[TABLE]

Recall from (63) that the expectation in (67) is taken with respect to the posterior density $p_{\bm{\mathcal{X}}|\bm{\mathcal{Y}}}\big{(}\mathbf{x}|\mathbf{y};\widehat{\bm{\theta}}_{k}\big{)}$ which we further approximate by:

[TABLE]

Using (68) in (67), it follows that:

[TABLE]

Clearly, $\widehat{b}_{k+1}$ is the value of $b$ that zeros the first derivative of $g(b|\mathbf{y},\widehat{\bm{\theta}}_{k})$ with respect to $b$ :

[TABLE]

It is also easy to show the following result for the Laplacian distribution given in (20):

[TABLE]

which is injected back into (IV) to obtain:

[TABLE]

Setting (72) equal to zero and solving for $b$ yields the following EM update for $b$ :

[TABLE]

The posterior first-order moment in (73) for each $x_{n}$ is computed using the associated posterior distribution $p_{\mathcal{X}_{n}|\bm{\mathcal{Y}}}\big{(}x_{n}|\mathbf{y};\widehat{\bm{\theta}}_{k}\big{)}$ . The latter is readily obtained from the auxiliary outputs of GAMP according to (III-C) wherein the true parameter vector $\bm{\theta}$ is now replaced by its previous EM update $\widehat{\bm{\theta}}_{k}$ . More specifically, we have:

[TABLE]

where the functions $\alpha_{n,k}(x)$ and $\gamma_{n,k}(x)$ are obtained from (33) and (34), respectively, with $b$ being replaced by $\widehat{b}_{k}$ . Likewise, $\psi_{n,k}$ has the same expression as in (38); the only difference being that $b$ is replaced by $\widehat{b}_{k}$ in all the $b-$ dependent quantities $\alpha_{n}^{-}$ , $\alpha_{n}^{+}$ , $\gamma_{n}^{-}$ , and $\gamma_{n}^{+}$ . For convenience, we will also from now on denote these quantities as $\alpha_{n,k}^{-}$ , $\alpha_{n,k}^{+}$ , $\gamma_{n,k}^{-}$ , and $\gamma_{n,k}^{+}$ , respectively. Now, using the EM posterior distribution in (74), it can be further shown that:

[TABLE]

Then, after using (48) and (50) in (75), and recalling (73), we establish the EM update for the scale parameter, $b$ , as follows:

[TABLE]

in which $\xi_{n,k}$ and $\xi_{n,k}^{\prime}$ are explicitly given by:

[TABLE]

In addition, since the original complex noise components in (14) are assumed to be independent and modeled by circularly-symmetric Gaussian RVs with variance $2\sigma_{w}^{2}$ , then the real-valued noise components, $w_{m}$ , in (16) are also independent with $w_{m}\sim\mathcal{N}(w_{m};0,\sigma_{w}^{2})$ . Therefore, after dropping the constant term that does not depend on $\sigma_{w}^{2}$ , we have:

[TABLE]

where $z_{m}$ is the $m$ th component of $\mathbf{z}=\mathbf{Ax}$ . Finding the partial derivative of (IV) with respect to $\sigma_{w}^{2}$ , plugging it back in (65), then taking the expectation with respect to the posterior in (68), setting the result to zero, and solving for $\sigma_{w}^{2}$ , yields the following EM update for the noise variance:

[TABLE]

which is equivalent to the result obtained earlier in [39] under the GM model and complex data. Note also that a more general prior such as the elastic net prior can be used to model the beam-domain coefficients of massive MIMO mmWave channels. The corresponding GAMP updates have been recently established in [43] within the framework of Bayesian logistic regression problems.

V Simulation Results

In this Section, we assess the performance of the proposed massive MIMO mmWave channel estimator using exhaustive Monte-Carlo simulations and the normalized mean-square error (NMSE) as a performance measure:

[TABLE]

For each $p$ th cluster in (2), the gain of the corresponding $q$ th sub-path is drawn independently from a zero-mean complex Gaussian distribution of variance $\sigma_{p,q}^{2}$ . The latter obey the constraint $\sum_{q=1}^{Q_{p}}\sigma_{p,q}^{2}=\sigma_{p}^{2}$ which is the power fraction pertaining to the $p$ th path cluster. We further normalize the total channel power by enforcing $\sum_{p=1}^{P}\sigma_{p}^{2}=1$ as well as the transmit power by enforcing $\mathbb{E}\{\|\mathbf{b}(k)\|^{2}\}=1$ . Under these normalizations, it can be verified from (11b) that the SNR is simply given:

[TABLE]

The AoDs (resp. AoAs) pertaining to each $p$ th cluster are also drawn independently around the associated mean angle of departure (resp. arrival) with angular spread equal to 3.5 degrees. The mean angles of departure/arrival of the different path clusters are generated independently and uniformly at random in $[0,\pi]$ .

Later on in this section, we also assess the performance of the proposed algorithm in terms of the achievable rate. As a non-Bayesian baseline, we consider the ML estimator (when $\mathbf{A}$ is full column rank) which boils down to the conventional LS estimator due to the linearity of the model in (16) and the Gaussianity of the noise:

[TABLE]

From the Bayesian family, we also consider the GAMP-based estimator under the GM model recently investigated in [24] (and we refer to it simply as GAMP-GM), as well as, the well-known linear minimum mean-square error (LMMSE) estimator. We also consider uniform linear arrays consisting of $M_{t}=64$ and $M_{r}=64$ antenna elements both at the transmitter and receiver sides. As a representative example, the multipath channel consists of $P=4$ clusters each of which containing $Q_{p}=10$ sub-paths with the same angular spread of 3.5 degrees. The results reported in this section are obtained using 500 Monte-Carlo trials. We also allow a maximum number of iterations $T_{\textrm{max}}=50$ for GAMP if it does not converge given the precision tolerance $\epsilon=10^{-6}$ (cf. algorithm 1).

In all simulations, the EM algorithm was initialized as in [39] for GAMP-GM. More specifically, the noise variance was initialized as follows:

[TABLE]

where $M=2M_{r}K$ and $\textsf{SNR}^{0}=100$ . Moreover, the parameters of the GM prior were initialized as follows for $l=1,\ldots,L$ :

[TABLE]

in which $\|.\|_{F}$ stands for the Frobenius norm. For EM-GAMP-Laplace, the noise variance was initialized as in (82) above and the scale parameter, $b$ , of the Laplacian prior was initialized to $\widehat{b}_{0}=1$ .

Note also that the overall stopping conditions of EM-GAMP are those corresponding to the outer iteration loop (i.e., GAMP iteration loop) shown in Line 18 of algorithm 1. The EM algorithm has its own stopping conditions under each $t^{th}$ GAMP iteration. When the stopping conditions for the EM algorithm are met, the resulting EM updates for the noise variance and the parameters of the prior distribution are used by GAMP during its $(t+1)$ th iteration which is incremented in Line 17 of Algorithm 1.

To start, we illustrate the behavior of the proposed GAMP-Laplace estimator in Fig. 1. We plot a specific realization for the true angular-domain channel’s magnitude and its estimate as returned by GAMP-Laplace at $\textrm{SNR}=10$ dB. As seen from Fig. 1(a), the true channel is indeed approximately sparse in the angular domain, i.e., it exhibits a very small number of dominant coefficients. The effect of spectral leakage is also clearly observed as the magnitudes of the remaining coefficients become smaller as we move away from the location of the subpath clusters. As seen from Fig. 1(b), GAMP-Laplace is able to estimate the mmWave channel with high accuracy at a moderate SNR threshold.

Fig. 2 depicts the NMSE performance of the three considered estimators as function of the SNR for two different values of the observation window size, namely $K=64$ and $K=128$ . First, as expected it is seen from Fig. 2(a) that the performance of the ML estimator is very poor since $K=64$ corresponds to the case where the number of unknowns $M_{t}M_{r}$ is equal to the number of observations $KM_{r}$ . In fact, it is widely known that ML estimation, under linear mixing, requires more observations than unknowns. This is confirmed by Fig 2(b) in which we double the number of observations. In the latter case, the advantage of the two Bayesian approaches over ML estimation in low-to-moderate SNRs is due to their ability to exploit the prior information about the unknown channel instead of simply assuming it to be unknown but deterministic as is the case with ML estimation. It is also seen that in both cases GAMP-Laplace offers remarkable performance gains over GAMP-GM. For instance, at target $\textrm{NMSE}=-10$ dB, the gains in terms of SNR are as high as 10 dB and 5 dB for $K=32$ and $K=64$ , respectively. As we shall see later, this translates to large gains in terms of achievable rate.

We also plot in Fig. 3, the average number of iterations required by GAMP (until convergence) under both GM and Laplace priors. There, it is seen that GAMP-Laplace converges much faster than GAMP-GM due to more accurate modeling of the prior distribution of the angular-domain channel coefficients. For instance, at $\textrm{SNR}=10$ dB, GAMP-Laplace converges in almost 25 iterations when $K=64$ as opposed to 35 iterations for GAMP-GM thereby leading to tremendous computational savings in practice. Recall here that GAMP performs four matrix/vector multiplications at each iteration (cf. Algorithm 1 which is actually a scalarized version of GAMP). Hence, GAMP-Laplace saves on average 40 matrix/vector multiplications over GAMP-GM. This is to be added to the computational savings stemming from the fact that GAMP-Laplace needs to learn only one parameter (namely the scale parameter $b$ ), under each iteration, as opposed to $3L$ different parameters for GAMP-GM where $L$ is the order of the underlying Gaussian mixture.

Next, we focus on the data decoding phase and investigate the impact of channel estimation accuracy on the mutual information or equivalently the achievable rate. Our approach is inspired by that in [24, 26]. We assume that the channel estimate acquired at the receiver is provided to the transmitter via feedback links, or that the mmWave system is operating in time-division-duplex (TDD) mode with reciprocal channel which is estimated directly at the transmitter via the reverse link and used for beamforming/multiplexing purposes on the forward link. More specifically, if the estimated channel matrix, $\widehat{\mathbf{H}}$ , is made available to the transmitter, the latter performs its singular-value decomposition (SVD):

[TABLE]

Then, $\widehat{\mathbf{V}}$ is used to produce the precoded signal:

[TABLE]

in which $\mathbf{a}$ is the information-bearing symbol whose components contain independently coded data streams and $M^{*}=\textrm{min}(M_{t},M_{r})$ . Moreover, $\{P_{m}\}_{m=1}^{M^{*}}$ are the set of optimal powers allocated across the various data streams as provided by the well-known water-filling algorithm. Actually, due to the inherent cluster-based model in (2), the underlying mmWave channel matrix is rank deficient and hence the number (say $S$ ) of independent data streams it can support is much smaller than $M^{*}$ . This is corroborated by the results reported in Fig. 4 which depicts the output of the water-filling algorithm when applied to the channel estimate $\widehat{\mathbf{H}}$ that is provided by GAMP-Laplace at $\textrm{SNR}=10$ dB. There, it is seen that nearly 40 out of the 64 available data streams are activated by the waterfilling algorithm. This is in line with the fact that the true channel matrix $\mathbf{H}$ is of rank 40 since it embodies 4 clusters each of which consisting of 10 subpaths. The fact that the activated data streams in Fig. 4 slightly exceeds the rank of $\mathbf{H}$ is due to the channel estimation error induced by GAMP-Laplace.

The received signal, $\mathbf{y}=\mathbf{H}\mathbf{s}+\mathbf{w}$ , is also pre-processed by $\widehat{\mathbf{U}}$ . Under perfect CSI, this yields $S$ independent parallel channels which can be decoded separately without loss of optimality. Channel estimation errors, however, introduce inter-stream interference which is independent of the background noise. By treating such interference as one more additive noise term while still decoding the streams separately, the mutual information between $\mathbf{s}$ and $\mathbf{y}$ given $\mathbf{H}$ is lower bounded666This is indeed a lower bound since it assumes a worst-case distribution for the interference components which is the Gaussian distribution. by [26]:

[TABLE]

where $\textrm{SINR}_{m}$ is the signal-to-noise-plus-interference ratio (SINR) pertaining to the $m$ th data streams:

[TABLE]

Note here that for convenience we incorporate both active and non-active streams in (86) since the latter do not change the final result as their allocated powers are equal to zero (cf. Fig. 4).

Fig. 5 depicts the lower-bound mutual information (86) achieved by the three estimators for two different training window sizes (namely, $K=64$ and $K=128$ ) versus SNR. The mutual information for the perfect CSI case is also plotted as an overall benchmark. In Fig. 5, we observe that GAMP-Laplace offers remarkable throughput gains over the entire SNR range for $K=64$ . In addition, although both GAMP-GM and the ML method perform nearly the same as GMAP-Laplace for very high SNR thresholds (when $K=128$ ), the latter is still quite advantageous for low-to-moderate SNRs. For instance, it offers almost 15 bps/Hz throughput gain at $\textrm{SNR}=10$ dB as seen from Fig. 5.

VI Conclusion

In this paper, we propose a new channel estimator for massive MIMO mmWave systems that leverages the inherent sparsity of the channel in the angular domain. The proposed estimator belongs to the family of Bayesian estimators and builds upon the generalized approximate message-passing algorithm. This paper differs from most of the prior literature in that the angular-domain channel coefficients are modeled by a Laplacian prior distribution. We also propose a simple approach based on the expectation-maximization principle to systematically learn the unknown scale parameter of the underlying Laplace distribution along with the unknown noise variance. It is shown that a Laplacian prior leads to substantial performance improvements both in terms of channel estimation accuracy and achievable rate over the Gaussian mixture prior that has been advocated in the recent literature. Moreover, the Laplacian prior speeds up the convergence of GAMP thereby leading to significant computational savings in practice. As possible future directions, it will be interesting to investigate the correlation between the angular-domain channel coefficients and to incorporate the correlation into the estimation process and further to generalize the proposed algorithm to the multi-user case and to the case with finite-resolution analog-to-digital converters.

Appendix A

Using the Laplacian distribution, $p_{\mathcal{X}}(x_{n};b)$ , given in (20) and the fact that $|x|=\operatorname{sgn}(x)\,x,\leavevmode\nobreak\ \forall\,x\in\mathbb{R}$ , it follows that:

[TABLE]

where

[TABLE]

In order to complete the square inside the brackets, we add and subtract $\left[\widehat{r}_{n}-\operatorname{sgn}(x_{n})\mu^{r}_{n}/b\right]^{2}$ thereby leading to:

[TABLE]

Then, by recalling the $(\widehat{r}_{n},\mu^{r}_{n},b)-$ dependent functions $\alpha_{n}(x)$ and $\gamma_{n}(x)$ defined, respectively, in (33) and (34), it follows that:

[TABLE]

Finally, plugging (89) back into (88) yields:

[TABLE]

which is equivalent to the result given claimed in (32).

Appendix B

We have

[TABLE]

By splitting the above integral into the positive and negative parts, it can be shown that:

[TABLE]

in which the function $\Phi_{2}(\gamma,\mu)$ is defined as follows:

[TABLE]

Now, define:

[TABLE]

Using integration by parts, it follows that:

[TABLE]

Since $\big{[}f(t)g(t)\big{]}_{0}^{+\infty}=0$ , we have:

[TABLE]

Then, using the substitution $x=(t-\gamma)/\sqrt{\mu}$ in the right-hand side of (96) and expanding its left-hand side leads to:

[TABLE]

Finally, after rerranging the terms in (Appendix B), it follows that:

[TABLE]

By recalling (48) and using the identity in (50), it can be shown that:

[TABLE]

Now, plugging (99) and (100) back in (91) and using the fact that:

[TABLE]

yields the result claimed in (55).

Bibliography43

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] F. Bellili, F. Sohrabi, and W. Yu, “Massive MIMO mm Wave channel estimation using approximate message passing and Laplacian prior,” in Proc. IEEE Workshop Signal Process. Adv. Wireless Commun. (SPAWC) , Kalamata, Greece, June 2018, pp. 1–5.
2[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, “What will 5G be?” IEEE J. Sel. Areas Commun. , vol. 32, no. 6, pp. 1065–1082, June 2014.
3[3] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access , vol. 1, pp. 335–349, 2013.
4[4] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Commun. Mag. , vol. 52, no. 2, pp. 74–80, Feb. 2014.
5[5] A. L. Swindlehurst, E. Ayanoglu, P. Heydari, and F. Capolino, “Millimeter-wave massive MIMO: The next wireless revolution?” IEEE Commun. Mag. , vol. 52, no. 9, pp. 56–62, Sept. 2014.
6[6] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Sel. Topics Signal Process. , vol. 8, no. 5, pp. 742–758, Oct. 2014.
7[7] T. S. Rappaport, G. R. Mac Cartney, M. K. Samimi, and S. Sun, “Wideband millimeter-wave propagation measurements and channel models for future wireless communication system design,” IEEE Trans. Commun. , vol. 63, no. 9, pp. 3029–3056, Sept. 2015.
8[8] T. S. Rappaport, F. Gutierrez, E. Ben-Dor, J. N. Murdock, Y. Qiao, and J. I. Tamir, “Broadband millimeter-wave propagation measurements and models using adaptive-beam antennas for outdoor urban cellular communications,” IEEE Trans. Antennas Propag. , vol. 61, no. 4, pp. 1850–1859, Apr. 2013.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Generalized Approximate Message Passing for Massive MIMO mmWave Channel Estimation with Laplacian Prior

Abstract

Index Terms:

I Introduction

II System Model

III Proposed mmWave channel estimation algorithm

III-A Problem formulation

III-B Modeling the Angular-Domain Coefficients of mmWave Channels:

III-C *MmWave Channel Estimation using GAMP with Laplacian Prior *

IV Learning the Scale Parameter of the Prior Laplacian Distribution and the Noise Variance

V Simulation Results

VI Conclusion

Appendix A

Appendix B

III-C MmWave Channel Estimation using GAMP with Laplacian Prior