A Block Sparsity Based Estimator for mmWave Massive MIMO Channels with Beam Squint

Mingjin Wang, Feifei Gao, Mark F. Flanagan, Nir Shlezinger, and Yonina C. Eldar

arXiv:1904.12272·eess.SP·February 19, 2020


TL;DR

This paper introduces a novel block sparsity-based channel estimator for mmWave massive MIMO systems that accounts for beam squint effects, improving channel estimation accuracy by exploiting angle-delay sparsity and reciprocity.

Contribution

It proposes a new compressive sensing algorithm that models beam squint effects with block sparsity, enabling more accurate off-grid angle and delay estimation in mmWave MIMO channels.

Findings

1. Enhanced channel estimation accuracy over traditional methods.
2. Effective joint off-grid angle and delay estimation.
3. Applicability to both uplink and downlink in FDD systems.


Equations

$$\bar{s}(t)=\sum_{i=-\infty}^{+\infty}\alpha[i]\,g(t-iT_{s}), \tag{1}$$

$$\tilde{s}(t)=\mathcal{R}\left\{\bar{s}(t)e^{j2\pi f_{c}^{\text{ul}}t}\right\}. \tag{2}$$

$$\Delta^{\tau}_{p,m}=\frac{md\cdot\sin\theta_{p}^{\text{ul}}}{c}=\frac{md\cdot\sin\theta_{p}^{\text{ul}}}{\lambda_{c}^{\text{ul}}f_{c}^{\text{ul}}}. \tag{3}$$

$$\begin{aligned}\tilde{y}_{m}^{\text{ul}}(t)&=\sum_{p=1}^{P}\beta_{p}^{\text{ul}}\tilde{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})\\ &=\sum_{p=1}^{P}\mathcal{R}\left\{\beta_{p}^{\text{ul}}\bar{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})e^{j2\pi f_{c}^{\text{ul}}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})}\right\}.\end{aligned} \tag{4}$$

$$\begin{aligned}\bar{y}_{m}^{\text{ul}}(t)&=\sum_{p=1}^{P}\beta_{p}^{\text{ul}}\bar{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})e^{j2\pi f_{c}^{\text{ul}}(-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})}\\ &=\bigg(\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}e^{-j2\pi f_{c}^{\text{ul}}\Delta^{\tau}_{p,m}}\delta(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})\bigg)*\bar{s}(t),\end{aligned} \tag{5}$$

$$\begin{aligned}y_{m}^{\text{ul}}(f)&=\int_{-\infty}^{+\infty}\bar{y}_{m}^{\text{ul}}(t)e^{-j2\pi ft}\,dt\\ &=\bigg(\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}e^{-j2\pi f_{c}^{\text{ul}}\Delta^{\tau}_{p,m}}e^{-j2\pi f(\tau_{p}^{\text{ul}}+\Delta^{\tau}_{p,m})}\bigg)s(f)\\ &=\bigg(\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}e^{-jm\phi_{p}^{\text{ul}}}e^{-j2\pi f\Delta^{\tau}_{p,m}}e^{-j2\pi f\tau_{p}^{\text{ul}}}\bigg)s(f),\end{aligned} \tag{6}$$

$$\bm{y}^{\text{ul}}(f)=[y_{0}^{\text{ul}}(f),y_{1}^{\text{ul}}(f),\ldots,y_{M-1}^{\text{ul}}(f)]^{T}=\bm{h}^{\text{ul}}(f)s(f), \tag{7}$$

$$\bm{h}^{\text{ul}}(f)=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\bm{a}(\phi_{p}^{\text{ul}},f)e^{-j2\pi f\tau_{p}^{\text{ul}}} \tag{8}$$

$$\bm{a}(\phi_{p}^{\text{ul}},f)=\Big[1,\ldots,e^{-j(M-1)\left(1+\frac{f}{f_{c}^{\text{ul}}}\right)\phi_{p}^{\text{ul}}}\Big]^{T} \tag{9}$$

$$\begin{aligned}\bm{h}^{\text{ul}}(nf_{0})&=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\bm{a}(\phi_{p}^{\text{ul}},nf_{0})e^{-j2\pi nf_{0}\tau_{p}^{\text{ul}}}\\ &=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\left[\bm{a}(\phi_{p}^{\text{ul}},0)e^{-j2\pi nf_{0}\tau_{p}^{\text{ul}}}\right]\odot[\bm{W}(\phi_{p}^{\text{ul}})]_{:,n},\end{aligned} \tag{10}$$

$$[\bm{W}(\phi_{p}^{\text{ul}})]_{m,n}\triangleq\exp\Big(-jm\frac{nf_{0}}{f_{c}^{\text{ul}}}\phi_{p}^{\text{ul}}\Big). \tag{11}$$

$$\bm{H}^{\text{ul}}=[\bm{h}^{\text{ul}}(0),\bm{h}^{\text{ul}}(f_{0}),\ldots,\bm{h}^{\text{ul}}((N-1)f_{0})]. \tag{12}$$

$$\begin{aligned}\bm{H}^{\text{ul}}&=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\left[\bm{a}(\phi_{p}^{\text{ul}},0)\bm{b}^{T}(\tau_{p}^{\text{ul}})\right]\odot\bm{W}(\phi_{p}^{\text{ul}})\\ &=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\bm{\Xi}(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}}),\end{aligned}$$

$$\bm{b}(\tau_{p}^{\text{ul}})\triangleq[1,e^{-j2\pi f_{0}\tau_{p}^{\text{ul}}},\ldots,e^{-j2\pi(N-1)f_{0}\tau_{p}^{\text{ul}}}]^{T},$$

$$\lim_{M,N\to\infty}\frac{1}{MN}\,\text{vec}\big(\bm{\Xi}(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}})\big)^{H}\text{vec}\big(\bm{\Xi}(\phi_{s}^{\text{ul}},\tau_{s}^{\text{ul}})\big)=\begin{cases}1,&\text{if }(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}})=(\phi_{s}^{\text{ul}},\tau_{s}^{\text{ul}}),\\ 0,&\text{otherwise.}\end{cases}$$

$$\mathcal{N}_{1}\bigcup\cdots\bigcup\mathcal{N}_{K}=\{0,1,\ldots,N-1\}.$$

$$\bm{Y}_{\mathcal{N}_{k}}=\bm{H}^{\text{ul}}_{\mathcal{N}_{k}}\bm{S}^{\text{ul}}_{k}+\bm{E}_{\mathcal{N}_{k}}, \tag{19}$$

$$[\bm{H}_{\mathcal{N}}^{\text{ul}}]_{:,q}=\sum_{p=1}^{P}\bar{z}_{q,p}^{\text{ul}}\bm{a}(\phi_{p}^{\text{ul}},n_{q}f_{0}), \tag{20}$$

$$\bm{A}(\bm{\vartheta}^{\text{ul}},n_{q}f_{0})\triangleq[\bm{a}(\vartheta_{1}^{\text{ul}},n_{q}f_{0}),\ldots,\bm{a}(\vartheta_{L}^{\text{ul}},n_{q}f_{0})] \tag{21}$$

$$[\bm{H}_{\mathcal{N}}^{\text{ul}}]_{:,q}\approx\bm{A}(\bm{\vartheta}^{\text{ul}},n_{q}f_{0})\bm{z}_{q}, \tag{22}$$

$$z_{q,i}=\begin{cases}\bar{z}_{q,p}^{\text{ul}},&i=\operatorname*{argmin}\limits_{k\in\{1,\ldots,L\}}|\vartheta^{\text{ul}}_{k}-\phi^{\text{ul}}_{p}|,\quad p=1,\ldots,P,\\ 0,&\text{otherwise.}\end{cases} \tag{23}$$

$$\bm{H}_{\mathcal{N}}^{\text{ul}}\approx[\bm{A}(\bm{\vartheta}^{\text{ul}},n_{1}f_{0})\bm{z}_{1},\bm{A}(\bm{\vartheta}^{\text{ul}},n_{2}f_{0})\bm{z}_{2},\ldots,\bm{A}(\bm{\vartheta}^{\text{ul}},n_{T}f_{0})\bm{z}_{T}]. \tag{24}$$

$$\bm{H}_{\mathcal{N}}^{\text{ul}}\approx\bm{A}(\bm{\vartheta}^{\text{ul}},0)[\bm{z}_{1},\bm{z}_{2},\ldots,\bm{z}_{T}]=\bm{A}(\bm{\vartheta}^{\text{ul}},0)\bm{Z}, \tag{25}$$

$$\text{vec}\big((\bm{H}_{\mathcal{N}}^{\text{ul}})^{T}\big)\approx\big(\bm{A}(\bm{\vartheta}^{\text{ul}},0)\otimes\bm{I}_{T}\big)\text{vec}(\bm{Z}^{T}). \tag{26}$$

$$\bm{D}_{\text{bs}}(\bm{\vartheta}^{\text{ul}})$$


Full text


Mingjin Wang, Feifei Gao, Mark F. Flanagan, Nir Shlezinger, and Yonina C. Eldar

M. Wang and F. Gao are with the Institute for Artificial Intelligence, Tsinghua University (THUAI), State Key Lab of Intelligent Technologies and Systems, Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, P. R. China. M. F. Flanagan is with the School of Electrical and Electronic Engineering, University College Dublin, Belfield, Dublin 4, Ireland. N. Shlezinger and Y. C. Eldar are with the Faculty of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot, Israel.

Abstract

Multiple-input multiple-output (MIMO) millimeter wave (mmWave) communication is a key technology for next generation wireless networks. One of the consequences of utilizing a large number of antennas with an increased bandwidth is that array steering vectors vary among different subcarriers. Due to this effect, known as beam squint, the conventional channel model is no longer applicable for mmWave massive MIMO systems. In this paper, we study channel estimation under the resulting non-standard model. To that aim, we first analyze the beam squint effect from an array signal processing perspective, resulting in a model which sheds light on the angle-delay sparsity of mmWave transmission. We next design a compressive sensing based channel estimation algorithm which utilizes the shift-invariant block-sparsity of this channel model. The proposed algorithm jointly computes the off-grid angles, the off-grid delays, and the complex gains of the multi-path channel. We show that the newly proposed scheme reflects the mmWave channel more accurately and results in improved performance compared to traditional approaches. We then demonstrate how this approach can be applied to recover both the uplink as well as the downlink channel in frequency division duplex (FDD) systems, by exploiting the angle-delay reciprocity of mmWave channels.

Index Terms:

mmWave, massive MIMO, channel estimation, beam squint, block sparsity, angle-delay reciprocity.

I Introduction

The proliferation of wireless services such as multimedia, virtual reality, network video, vehicular networking, and the Internet of Things gives rise to continuously increasing demands on the transmission rate and quality of service. These demands include higher throughput, shorter delays, improved connectivity, denser networks, and better user experience [1]. To meet all these requirements, it is necessary to exploit higher frequencies, and in particular, the millimeter wave (mmWave) band, to overcome the spectral congestion of standard wireless frequency bands [2]. An additional method to increase the spectral efficiency and to improve spatial resolution is to equip the base station (BS) with a large-scale antenna array [3, 4, 5]. This technique is commonly referred to as massive MIMO. Due to the short wavelength of mmWave, utilizing large antenna arrays is also essential for successfully implementing mmWave communications. In particular, the increased number of antennas can be used to implement directed beamforming, thus overcoming the dominant path-loss induced at mmWave with no line-of-sight [6, 7].

A large body of research has been devoted to understanding the potential and challenges associated with mmWave massive MIMO communications in recent years. Rappaport et al. proposed a model for mmWave channels based on an extensive measurement campaign and demonstrated that the mmWave band can effectively support high-speed data transmission [8]. The work [9] derived capacity bounds for mmWave massive MIMO communications based on the model of [8]. Additionally, [10] outlined the benefits, challenges, and potential solutions associated with cellular networks utilizing mmWave massive MIMO technology.

In order to achieve the potential benefits of mmWave massive MIMO communications, it is critical to have accurate channel state information (CSI). However, the channel characteristics in mmWave bands are quite different from their conventional sub-6 GHz counterparts. In particular:

- Experimental studies [11, 12] have shown that electromagnetic waves in mmWave bands suffer from severe path loss and have difficulty in bypassing obstacles.
- mmWave channels have been shown to exhibit sparsity in the angle and delay domains, which is not encountered in microwave frequencies [13, 14]. Furthermore, due to the narrow angle spread of each cluster, the channel covariance matrices (CCMs) of the resulting channel models are typically low-rank.

These properties indicate that any efficient channel estimator should be able to build upon this inherent structure.

Different low-complexity estimation algorithms have been designed to exploit the sparsity or low-rank property of the channel, including CCM based approaches [15], compressed sensing (CS) algorithms [16, 17], and angle domain based methods [18]. In particular, under the assumption of a finite scattering environment, the work [15] mathematically demonstrated the low-rank feature of the CCMs in mmWave communications and proposed a joint spatial division multiplexing algorithm to reduce the effective dimensions of the channel. The work [16] used the low-rank structure of the CCMs to cast the channel estimation task into a quadratic semidefinite programming (SDP) problem, which was solved using a polynomial SDP method. With a given unitary dictionary matrix known to the BS, [17] represented a virtual channel which has a common sparsity due to the fact that the users share the same local scatterers. A joint orthogonal matching pursuit (OMP) recovery algorithm was then presented in [17] to estimate the channel and reduce feedback. To exploit the angle information for sparse channel estimation, [18] designed a fast discrete Fourier transform (DFT) based spatial rotation algorithm to concentrate most of the channel power on limited DFT grids and efficiently obtain the angle information for both frequency division duplex (FDD) and time division duplex (TDD) systems. In particular, [18] used an array signal processing aided channel estimation scheme, where the angle information of the user is exploited to simplify channel estimation. A detailed overview of signal processing methods, including array signal processing techniques, for mmWave massive MIMO communications can be found in [12].

An important drawback of the sparse channel estimation approaches mentioned above stems from the fact that they use on-grid estimation [19] to solve the optimization problem, namely, they divide the continuous parameter space into a finite set of grid points. The sparse channel is then estimated assuming a discrete dictionary, resulting in increased estimation errors as the exact parameter does not necessarily lie on the discrete grid. Such grid mismatch introduces quantization error in addition to the channel recovery uncertainty, which may reduce the ability to accurately estimate the channel.

Another drawback of previously proposed channel estimators, e.g., [15, 16, 17, 18], is that the massive MIMO model used is directly obtained from the conventional MIMO model. This model captures only the phase differences and does not capture the propagation delay of the same incident signal observed at different antennas. This delay is negligible in conventional MIMO setups with a relatively small number of antennas; however, it cannot be ignored when the antenna array grows larger and the bandwidth increases. This phenomenon is referred to as the spatial-wideband effect [20]. For orthogonal frequency division multiplexing (OFDM) systems, this effect makes the array response vary with frequency, causing the beams observed by the receiver to “deviate” as a function of frequency [21], which is also known as the beam squint effect (BSE) [22]. Since mmWave communications highly rely on the precise alignment of beams between the transmitters and the receivers, the BSE may result in severe performance degradation if not carefully treated.

The BSE was experimentally evaluated in [23], which measured a beam squint range of 15 degrees with a 4-element patch array over a 6 GHz bandwidth at a center frequency of 60 GHz. The works [24, 25] proposed to mitigate the BSE by integrating interconnected slow-wave structures and metamaterial cells along the array feed network. In [26], the authors designed a beamforming codebook to compensate for the BSE by imposing an achievable rate constraint. Nevertheless, these efforts [24, 25, 26] aim only to compensate for the performance loss caused by the BSE, and do not provide a systematic channel estimation scheme under this effect.

In this paper, we develop a set of channel estimation algorithms for mmWave massive MIMO systems, for both the uplink and the downlink channels, accounting for the BSE. Our proposed algorithms operate in an off-grid manner, namely, the estimated channel coefficients can take values in a continuous set. To that aim, we first decompose the channel vectors into angle, delay, and gain parameters, and propose a model for the BSE using these parameters. Next, we demonstrate that the task of estimating angle and delay parameters can each be expressed as a block-sparse signal recovery problem using the matrix representation of the pilot sequence. We then propose a pilot-based block-iterative gradient descent algorithm to estimate the angle and delay parameters for the uplink channel. Based on this estimation method, we derive an efficient angle and delay pairing algorithm with low computational complexity. By exploiting the frequency insensitivity of angle and delay, i.e., angle-delay reciprocity, we extend the proposed approach and develop an effective algorithm for downlink channel estimation in FDD systems, thus tackling one of the major problems noted in the massive MIMO literature [27].

The rest of this paper is organized as follows: Section II introduces the BSE in the time and angle domains, and formulates the mmWave massive MIMO-OFDM system model. Section III illustrates the block sparsity based uplink angle/delay estimation algorithm which accounts for the BSE and shows how it can be used to reconstruct the uplink channel. Section IV designs a downlink channel estimation scheme for FDD massive MIMO systems with low complexity and low overhead based on the guidelines used for deriving the uplink channel estimator. Numerical results are provided in Section V, and Section VI concludes this paper.

Notation: Throughout this paper, vectors and matrices are denoted by boldface lower-case and upper-case letters, respectively; transpose, conjugate, Hermitian, inverse, and pseudo-inverse of the matrix ${\mathbf{A}}$ are denoted by ${\mathbf{A}}^{T}$, ${\mathbf{A}}^{\ast}$, ${\mathbf{A}}^{H}$, ${\mathbf{A}}^{-1}$ and ${\mathbf{A}}^{\dagger}$, respectively; $\left\|{\mathbf{A}}\right\|_{F}$ denotes the Frobenius norm of the matrix $\mathbf{A}$; $[{\mathbf{A}}]_{i,j}$ is the $(i,j)$ th entry of ${\mathbf{A}}$; indices of vectors and matrices start at 0; $[{\mathbf{A}}]_{i,:}$ represents the $i$ th row of the matrix ${\mathbf{A}}$; $[{\mathbf{A}}]_{:,j}$ represents the $j$ th column of the matrix ${\mathbf{A}}$; $\textup{vec}({\mathbf{A}})$ represents column-major vectorization of the matrix ${\mathbf{A}}$, i.e., the operation of stacking the columns of matrix ${\mathbf{A}}$ to form a vector; $\left\|\bm{h}\right\|_{2}$ denotes the Euclidean norm of the vector $\bm{h}$; $\otimes$ denotes the Kronecker product, and $\odot$ is the Hadamard product of matrices; ${\mathbf{I}}_{N}$ is the $N\times N$ identity matrix; $\mathbb{R}$ and $\mathbb{C}$ represent the sets of real and complex numbers, respectively; $\mathcal{R}(\cdot)$ is the real part of a complex number, vector or matrix; $\lfloor a\rfloor$ denotes the floor of a real number $a$, i.e., the largest integer not exceeding $a$.

II Channel Model

II-A mmWave Massive MIMO Uplink Channel Model

We consider a mmWave massive MIMO system, focusing on the uplink transmission. The BS is equipped with a uniform linear array (ULA) consisting of $M$ antennas, where $M$ is a large integer, and the antenna spacing is $d=\lambda_{c}^{\text{ul}}/2$ . Here, $\lambda_{c}^{\text{ul}}\triangleq{c}/{f_{c}^{\text{ul}}}$ , where $f_{c}^{\text{ul}}$ is the uplink center frequency, $\lambda_{c}^{\text{ul}}$ is the wavelength, and $c$ is the speed of light. To present the channel model, highlighting the spatial wideband effect, we discuss the case of a single user with a single antenna and assume a noiseless setup in this subsection. The proposed model can be extended to multiple users with multiple antennas by properly adapting the arguments in the sequel.

To formulate the channel model, we use $\alpha[i]$ to denote the discrete-time baseband transmitted symbol, with symbol period $T_{s}$ . The continuous-time baseband transmit signal $\bar{s}(t)$ can thus be expressed as

$$\bar{s}(t)=\sum_{i=-\infty}^{+\infty}\alpha[i]\,g(t-iT_{s}), \tag{1}$$

where $g(t)$ is the pulse shaping function. After modulating, the passband transmit signal $\tilde{s}(t)$ can be written as

$$\tilde{s}(t)=\mathcal{R}\left\{\bar{s}(t)e^{j2\pi f_{c}^{\text{ul}}t}\right\}. \tag{2}$$

Let $P$ denote the number of paths between the user and the BS. Each path has direction of arrival (DOA) $\theta_{p}^{\text{ul}}\in[-\pi/2,\pi/2)$ and passband gain $\beta_{p}^{\text{ul}}\in\mathbb{R}^{+}$ . Denote the $p$ th path delay between the transmitter and the first receive antenna by $\tau_{p}^{\text{ul}}$ . Unlike conventional MIMO models, the large array aperture of massive MIMO receivers results in non-negligible delays among different antennas. Those delays are present even for received signal components corresponding to the same channel path, as illustrated in Fig. 1. The extra delay of the $p$ th path from the $m$ th receive antenna compared to the first receive antenna is given by

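$$\Delta^{\tau}_{p,m}=\frac{md\cdot\sin\theta_{p}^{\text{ul}}}{c}=\frac{md\cdot\sin\theta_{p}^{\text{ul}}}{\lambda_{c}^{\text{ul}}f_{c}^{\text{ul}}}. \tag{3}$$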

Consequently, the passband receive signal at the $m$ th antenna can be written as

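$$\begin{aligned}\tilde{y}_{m}^{\text{ul}}(t)&=\sum_{p=1}^{P}\beta_{p}^{\text{ul}}\tilde{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})\\ &=\sum_{p=1}^{P}\mathcal{R}\left\{\beta_{p}^{\text{ul}}\bar{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})e^{j2\pi f_{c}^{\text{ul}}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})}\right\}.\end{aligned} \tag{4}$$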

For conventional MIMO systems, where $M$ is small or the bandwidth is narrow, it typically holds that $\Delta^{\tau}_{p,m}\ll T_{s}$ for each $m\in\{0,1,\ldots,M-1\}$ . Thus the signals at different antennas satisfy $\bar{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})\approx\bar{s}(t-\tau_{p}^{\text{ul}})$ . Namely, different antennas at the BS effectively observe a synchronized signal. In such scenarios, the standard MIMO channel output model, i.e., a linear convolution between the individual channel impulse responses and the same source signal, faithfully represents the received signal.

However, for mmWave massive MIMO systems, $\Delta^{\tau}_{p,m}$ cannot be ignored for larger values of $m$ , and the approximation $\bar{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})\approx\bar{s}(t-\tau_{p}^{\text{ul}})$ no longer holds. In this case, the signal observed by the first antenna will include a different time shift compared to the signals observed by other antennas. This phenomenon is referred to as the spatial wideband effect [20]. As an illustrative example, consider a massive MIMO system with $M=128$ BS antennas, $\theta^{\text{ul}}_{p}=60^{\circ}$ , baseband symbol rate $f_{s}=\frac{1}{T_{s}}=2$ GHz, and mmWave carrier frequency $f_{c}^{\text{ul}}=60$ GHz. Under this setting, the signal delay between the first antenna and the last one, computed via (3), is $1.85T_{s}$ , which is clearly non-negligible.
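The delay quoted in this example can be checked with a few lines of arithmetic. The following is a minimal sketch that evaluates (3) for the stated parameters, assuming half-wavelength spacing; the function name `extra_delay_in_symbols` is an illustrative choice and does not come from the paper. Taking $m=M$ gives roughly $1.85T_{s}$, while the last zero-based index $m=M-1$ gives roughly $1.83T_{s}$; both are well above the regime $\Delta^{\tau}_{p,m}\ll T_{s}$.

```python
import math

def extra_delay_in_symbols(m, theta_deg, f_c, f_s):
    """Delay of antenna m relative to antenna 0, in units of T_s = 1/f_s.

    Assumes half-wavelength spacing d = lambda_c / 2, so that
    Delta = m * d * sin(theta) / c = m * sin(theta) / (2 * f_c).
    """
    delta_seconds = m * math.sin(math.radians(theta_deg)) / (2.0 * f_c)
    return delta_seconds * f_s

# Parameters of the illustrative example in the text.
M, theta_deg, f_c, f_s = 128, 60.0, 60e9, 2e9
ratio_M = extra_delay_in_symbols(M, theta_deg, f_c, f_s)
ratio_last = extra_delay_in_symbols(M - 1, theta_deg, f_c, f_s)
print(f"m = M:     {ratio_M:.3f} T_s")
print(f"m = M - 1: {ratio_last:.3f} T_s")
```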

In the presence of the spatial wideband effect, it is difficult to formulate a unified discrete-time MIMO channel model since the signals arrive at different antennas with relative delays which are fractions of the symbol period. To derive a convenient model which facilitates analysis, we observe the transmission in an antenna-by-antenna manner. From (4), removing the center frequency $f_{c}^{\text{ul}}$ to switch from passband to baseband representation, the continuous-time baseband receive signal at the $m$ th antenna is given by

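$$\begin{aligned}\bar{y}_{m}^{\text{ul}}(t)&=\sum_{p=1}^{P}\beta_{p}^{\text{ul}}\bar{s}(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})e^{j2\pi f_{c}^{\text{ul}}(-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})}\\ &=\bigg(\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}e^{-j2\pi f_{c}^{\text{ul}}\Delta^{\tau}_{p,m}}\delta(t-\tau_{p}^{\text{ul}}-\Delta^{\tau}_{p,m})\bigg)*\bar{s}(t),\end{aligned} \tag{5}$$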

where $\bar{\beta}_{p}^{\text{ul}}\triangleq\beta_{p}^{\text{ul}}e^{-j2\pi f_{c}^{\text{ul}}\tau_{p}^{\text{ul}}}$ is the equivalent complex channel gain, and $*$ denotes the convolution operator. Taking the Fourier transform of (5), we obtain the frequency-domain representation of the received signal at the $m$ th antenna as

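$$\begin{aligned}y_{m}^{\text{ul}}(f)&=\int_{-\infty}^{+\infty}\bar{y}_{m}^{\text{ul}}(t)e^{-j2\pi ft}\,dt\\ &=\bigg(\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}e^{-j2\pi f_{c}^{\text{ul}}\Delta^{\tau}_{p,m}}e^{-j2\pi f(\tau_{p}^{\text{ul}}+\Delta^{\tau}_{p,m})}\bigg)s(f)\\ &=\bigg(\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}e^{-jm\phi_{p}^{\text{ul}}}e^{-j2\pi f\Delta^{\tau}_{p,m}}e^{-j2\pi f\tau_{p}^{\text{ul}}}\bigg)s(f),\end{aligned} \tag{6}$$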

where $\phi_{p}^{\text{ul}}\triangleq\frac{2\pi d\cdot\sin(\theta_{p}^{\text{ul}})}{\lambda_{c}^{\text{ul}}}\in[-\pi,\pi)$ is defined as the normalized DOA. For clarity, in the rest of the paper, the term “DOA” will refer to the normalized DOA.

By stacking the received signal $y_{m}^{\text{ul}}(f)$ over all $M$ antennas into a single vector representation, we can write

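$$\bm{y}^{\text{ul}}(f)=[y_{0}^{\text{ul}}(f),y_{1}^{\text{ul}}(f),\ldots,y_{M-1}^{\text{ul}}(f)]^{T}=\bm{h}^{\text{ul}}(f)s(f), \tag{7}$$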

where

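$$\bm{h}^{\text{ul}}(f)=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\bm{a}(\phi_{p}^{\text{ul}},f)e^{-j2\pi f\tau_{p}^{\text{ul}}} \tag{8}$$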

is the uplink channel frequency response, and

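$$\bm{a}(\phi_{p}^{\text{ul}},f)=\Big[1,\ldots,e^{-j(M-1)\left(1+\frac{f}{f_{c}^{\text{ul}}}\right)\phi_{p}^{\text{ul}}}\Big]^{T} \tag{9}$$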

is the wideband array steering vector. Note that unlike conventional array steering vectors, see, e.g., [16, 17, 18], the wideband array steering vector $\bm{a}(\phi_{p}^{\text{ul}},f)$ is frequency-dependent.
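To make the frequency dependence concrete, the sketch below evaluates the wideband steering vector and the channel frequency response for a small array. It is an illustration under stated assumptions rather than the authors' code: the function names, the two-path parameter values, and the zero-based antenna index $m=0,\ldots,M-1$ are all choices made here.

```python
import cmath
import math

def steering_vector(phi, f, f_c, M):
    """Wideband steering vector: entry m is exp(-j * m * (1 + f/f_c) * phi)."""
    return [cmath.exp(-1j * m * (1.0 + f / f_c) * phi) for m in range(M)]

def channel_response(paths, f, f_c, M):
    """h(f) = sum_p gain_p * a(phi_p, f) * exp(-j 2 pi f tau_p).

    `paths` is a list of (complex gain, normalized DOA phi, delay tau) tuples.
    """
    h = [0j] * M
    for gain, phi, tau in paths:
        a = steering_vector(phi, f, f_c, M)
        phase = cmath.exp(-2j * math.pi * f * tau)
        h = [h_m + gain * a_m * phase for h_m, a_m in zip(h, a)]
    return h

# Hypothetical parameters chosen only for illustration.
M, f_c = 8, 60e9
paths = [(1.0 + 0.5j, 0.9, 10e-9), (0.3 - 0.2j, -1.4, 35e-9)]
h_center = channel_response(paths, 0.0, f_c, M)   # baseband frequency f = 0
h_edge = channel_response(paths, 1e9, f_c, M)     # 1 GHz away from f = 0
```

At $f=0$ the steering vector coincides with the conventional narrowband one, so any difference between `h_center` and `h_edge` beyond the per-path delay phase comes from the factor $(1+f/f_{c}^{\text{ul}})$.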

II-B Beam Squint Effect in OFDM Systems

We next focus on OFDM signaling with $N$ subcarriers and subcarrier spacing $f_{0}$. We henceforth assume that the number of antennas $M$ satisfies $M\leq 2N\frac{f_{c}^{\text{ul}}}{f_{s}}$. This assumption is reasonable for mmWave systems in which the bandwidth is very large, so that the number of subcarriers $N$ is of the same order as $M$, and the carrier frequency $f_{c}^{\text{ul}}$ is much larger than the symbol rate $f_{s}$.

Using (8), the uplink channel vector corresponding to the $n$ th subcarrier can be written as

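$$\begin{aligned}\bm{h}^{\text{ul}}(nf_{0})&=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\bm{a}(\phi_{p}^{\text{ul}},nf_{0})e^{-j2\pi nf_{0}\tau_{p}^{\text{ul}}}\\ &=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\left[\bm{a}(\phi_{p}^{\text{ul}},0)e^{-j2\pi nf_{0}\tau_{p}^{\text{ul}}}\right]\odot[\bm{W}(\phi_{p}^{\text{ul}})]_{:,n},\end{aligned} \tag{10}$$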

for each $n\in\{0,1,\ldots,N-1\}$, where $\bm{W}(\phi_{p}^{\text{ul}})$ is the $M\times N$ matrix of wideband factors whose $(m,n)$ th entry is given by

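$$[\bm{W}(\phi_{p}^{\text{ul}})]_{m,n}\triangleq\exp\Big(-jm\frac{nf_{0}}{f_{c}^{\text{ul}}}\phi_{p}^{\text{ul}}\Big). \tag{11}$$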

The overall $M\times N$ uplink channel matrix for the OFDM system can then be written as

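$$\bm{H}^{\text{ul}}=[\bm{h}^{\text{ul}}(0),\bm{h}^{\text{ul}}(f_{0}),\ldots,\bm{h}^{\text{ul}}((N-1)f_{0})]. \tag{12}$$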

In previous works, e.g., [16, 17, 18], which do not account for the spatial-wideband effect, the factor $\Delta^{\tau}_{p,m}$ introduced in Section II-A is assumed to be zero. In this case, the array steering vector of any subcarrier reduces to $\bm{a}(\phi_{p}^{\text{ul}},0)$, namely, it is frequency-independent. When the frequency dependence of the array steering vector is not accounted for, the steering vector postulated at the $n$ th subcarrier is $\bm{a}(\phi_{p}^{\text{ul}},0)$, which is offset in frequency by $nf_{0}$ from the true steering vector $\bm{a}(\phi_{p}^{\text{ul}},nf_{0})$. This causes a deviation between the spatially oriented beam and the user’s true direction, commonly referred to as beam squint. This deviation can lead to significant performance degradation. Fig. 2 depicts an example of beam squint where the beams at different subcarriers point in different directions relative to the same user angle.
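One simple way to quantify this mismatch is the normalized correlation between the frequency-independent steering vector and the true wideband one. The sketch below computes $|\bm{a}^{H}(\phi,0)\bm{a}(\phi,f)|/M$ for a few frequency offsets; the metric, the function names, and the parameter values ($M=128$, $\phi=2$ rad, $f_{c}^{\text{ul}}=60$ GHz) are assumptions made for illustration and are not taken from the paper.

```python
import cmath

def steering_vector(phi, f, f_c, M):
    """Entry m is exp(-j * m * (1 + f/f_c) * phi), m = 0, ..., M-1."""
    return [cmath.exp(-1j * m * (1.0 + f / f_c) * phi) for m in range(M)]

def normalized_gain(phi, f, f_c, M):
    """|a(phi, 0)^H a(phi, f)| / M; a value of 1 means no squint loss."""
    a_assumed = steering_vector(phi, 0.0, f_c, M)
    a_true = steering_vector(phi, f, f_c, M)
    inner = sum(x.conjugate() * y for x, y in zip(a_assumed, a_true))
    return abs(inner) / M

# Hypothetical setup: a large array and a user away from broadside.
M, f_c, phi = 128, 60e9, 2.0
for f in (0.0, 0.5e9, 1e9, 2e9):
    gain = normalized_gain(phi, f, f_c, M)
    print(f"f = {f / 1e9:.1f} GHz: normalized gain = {gain:.3f}")
```

In this setup the gain equals 1 at $f=0$ and drops well below one half at an offset of 1 GHz, which is the kind of degradation the frequency-independent model fails to predict.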

II-C Channel Characteristics due to the BSE

We now discuss the unique channel characteristics which arise in the presence of the BSE, focusing on OFDM signaling. It follows from (10) that $\bm{H}^{\text{ul}}$ can be modeled as the sum of the contributions from $P$ paths via

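$$\begin{aligned}\bm{H}^{\text{ul}}&=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\left[\bm{a}(\phi_{p}^{\text{ul}},0)\bm{b}^{T}(\tau_{p}^{\text{ul}})\right]\odot\bm{W}(\phi_{p}^{\text{ul}})\\ &=\sum_{p=1}^{P}\bar{\beta}_{p}^{\text{ul}}\bm{\Xi}(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}}),\end{aligned}$$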

where $\bm{b}(\tau_{p}^{\text{ul}})$ is an ${N\times 1}$ vector defined as

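$$\bm{b}(\tau_{p}^{\text{ul}})\triangleq[1,e^{-j2\pi f_{0}\tau_{p}^{\text{ul}}},\ldots,e^{-j2\pi(N-1)f_{0}\tau_{p}^{\text{ul}}}]^{T},$$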

and $\bm{\Xi}(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}})\triangleq\left[\mathbf{a}(\phi_{p}^{\text{ul}},0)\mathbf{b}^{T}(\tau_{p}^{\text{ul}})\right]\odot\mathbf{W}(\phi_{p}^{\text{ul}})\in\mathbb{C}^{M\times N}$ .
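The equivalence between the per-subcarrier form and the matrix $\bm{\Xi}$ can be verified numerically. The following sketch builds $\bm{\Xi}(\phi,\tau)$ entry by entry from $\bm{a}(\phi,0)$, $\bm{b}(\tau)$, and $\bm{W}(\phi)$, and compares each column with $\bm{a}(\phi,nf_{0})e^{-j2\pi nf_{0}\tau}$. The helper names and all numerical values are hypothetical.

```python
import cmath
import math

def xi_matrix(phi, tau, M, N, f0, f_c):
    """Xi = [a(phi, 0) b(tau)^T] (Hadamard product) W(phi), as a list of M rows."""
    a0 = [cmath.exp(-1j * m * phi) for m in range(M)]
    b = [cmath.exp(-2j * math.pi * n * f0 * tau) for n in range(N)]
    return [[a0[m] * b[n] * cmath.exp(-1j * m * (n * f0 / f_c) * phi)
             for n in range(N)] for m in range(M)]

def column_from_steering(phi, tau, n, M, f0, f_c):
    """a(phi, n f0) * exp(-j 2 pi n f0 tau): one path's term in h(n f0)."""
    f = n * f0
    return [cmath.exp(-1j * m * (1.0 + f / f_c) * phi)
            * cmath.exp(-2j * math.pi * f * tau) for m in range(M)]

# Hypothetical dimensions and path parameters.
M, N, f0, f_c = 16, 32, 62.5e6, 60e9
phi, tau = 1.1, 20e-9
Xi = xi_matrix(phi, tau, M, N, f0, f_c)
max_err = max(abs(Xi[m][n] - column_from_steering(phi, tau, n, M, f0, f_c)[m])
              for m in range(M) for n in range(N))
print(f"max |Xi - direct| = {max_err:.2e}")
```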

Using these notations, we state the following asymptotic channel characteristic taken from [20], which holds when the number of antennas and the number of the subcarriers both grow arbitrarily large.

Theorem 1

If the conditions $\frac{d\cdot f_{s}}{\lambda_{c}^{\text{ul}}\cdot f^{\text{ul}}_{c}}<1$ and $\frac{M-1}{2N}\frac{f_{s}}{f^{\text{ul}}_{c}}<1$ are both satisfied, then, as $M\rightarrow\infty$ and $N\rightarrow\infty$, the following property holds [20]

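$$\lim_{M,N\to\infty}\frac{1}{MN}\,\text{vec}\big(\bm{\Xi}(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}})\big)^{H}\text{vec}\big(\bm{\Xi}(\phi_{s}^{\text{ul}},\tau_{s}^{\text{ul}})\big)=\begin{cases}1,&\text{if }(\phi_{p}^{\text{ul}},\tau_{p}^{\text{ul}})=(\phi_{s}^{\text{ul}},\tau_{s}^{\text{ul}}),\\ 0,&\text{otherwise.}\end{cases}$$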

The first condition in Theorem 1 commonly holds, since the antenna spacing $d$ is typically half of the wavelength $\lambda_{c}^{\text{ul}}$, and the symbol rate $f_{s}$, which is typically determined by the signal bandwidth, is far less than the carrier frequency in mmWave massive MIMO systems. The second condition holds under the assumption on $M$ stated in the previous subsection.

Theorem 1 implies that in the massive MIMO regime with a sufficiently large number of subcarriers, paths with different angles or delays can be distinguished easily. We will exploit this property in Section III where we consider the reconstruction of the uplink channel.
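A finite-size check gives a feel for how quickly this asymptotic orthogonality sets in. The sketch below evaluates the normalized inner product of Theorem 1 for $M=N=64$; the helper names and the angle and delay values are hypothetical, and a finite-size experiment only illustrates the limit rather than proving it.

```python
import cmath
import math

def xi_entries(phi, tau, M, N, f0, f_c):
    """Entries of Xi(phi, tau); the ordering is fixed, so inner products are consistent."""
    return [cmath.exp(-1j * m * (1.0 + n * f0 / f_c) * phi
                      - 2j * math.pi * n * f0 * tau)
            for m in range(M) for n in range(N)]

def normalized_correlation(pair_p, pair_s, M, N, f0, f_c):
    """|vec(Xi_p)^H vec(Xi_s)| / (M N) for two (phi, tau) pairs."""
    xp = xi_entries(pair_p[0], pair_p[1], M, N, f0, f_c)
    xs = xi_entries(pair_s[0], pair_s[1], M, N, f0, f_c)
    return abs(sum(u.conjugate() * v for u, v in zip(xp, xs))) / (M * N)

# Hypothetical setup: f_s = 2 GHz split into N = 64 subcarriers, f_c = 60 GHz.
M, N, f_c = 64, 64, 60e9
f0 = 2e9 / N
ref = (1.0, 20e-9)
same = normalized_correlation(ref, ref, M, N, f0, f_c)
diff_angle = normalized_correlation(ref, (-0.7, 20e-9), M, N, f0, f_c)
diff_delay = normalized_correlation(ref, (1.0, 47e-9), M, N, f0, f_c)
print(f"same pair: {same:.3f}, different angle: {diff_angle:.3f}, "
      f"different delay: {diff_delay:.3f}")
```

For these values both conditions of Theorem 1 are met, and the two mismatched pairs already yield correlations far below that of the matched pair.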

For mmWave communications, the number of significant paths is typically much smaller than that encountered in standard sub-6 GHz systems [13, 14], and thus $\bm{H}^{\text{ul}}$ can be represented by a few steering vectors in the spatial and delay domains. This sparsity indicates that CS methods can be utilized to efficiently estimate the channel parameters, as we show in the following section.

III Uplink Channel Estimation

In this section we propose algorithms for estimating uplink mmWave massive MIMO channels, accounting for the BSE. In particular, we assume that the BS serves $K$ users using a MIMO-OFDM protocol [28], in which channel estimation is carried out using dedicated pilots in an FDD manner. Non-overlapping subcarriers are assigned to different users. To model the observed signal used for channel estimation, we let $\mathcal{N}_{k}$ denote the set of subcarrier frequencies utilized by the $k$ th user, and use $T\triangleq N/K$ to denote its cardinality$^{1}$. Since all subcarriers are allocated among the users without spectral overlapping, it holds that $\mathcal{N}_{k}\bigcap\mathcal{N}_{l}=\varnothing$ for $k\neq l$, and that

$$\mathcal{N}_{1}\bigcup\cdots\bigcup\mathcal{N}_{K}=\{0,1,\ldots,N-1\}.$$

$^{1}$ Here, we only use one OFDM block to estimate the uplink channel parameters, and $T$ is an integer not larger than the channel coherence time. When $N$ is not an integer multiple of $K$, we use $T=\lfloor\frac{N}{K}\rfloor$. If $K$ is so large that the block length $T$ is not long enough to estimate the multiple channels, one can utilize multiple OFDM blocks for channel estimation.

While in the previous section we focused on a single user, here we consider multiple users. Therefore, we henceforth use the notation $\bm{H}^{\text{ul}}_{k}$ to denote the channel of the $k$ th user, similarly to (12). The a-priori known pilot sequence transmitted by the $k$ th user is denoted by $\bm{s}_{k}^{\text{ul}}\in\mathbb{C}^{T\times 1}$ , and is assumed to have non-zero entries. This assumption accommodates a broad range of pilot sequences used in practice, such as Zadoff-Chu (ZC) sequences [29]. The received pilots from the $k$ th user at the BS, aggregated over the corresponding subcarriers assigned to the $k$ th user, can be expressed as

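$$\bm{Y}_{\mathcal{N}_{k}}=\bm{H}^{\text{ul}}_{\mathcal{N}_{k}}\bm{S}^{\text{ul}}_{k}+\bm{E}_{\mathcal{N}_{k}}, \tag{19}$$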

where $\bm{H}^{\text{ul}}_{\mathcal{N}_{k}}\in\mathbb{C}^{M\times T}$ is the subset of columns of $\bm{H}^{\text{ul}}_{k}$ with column indices in $\mathcal{N}_{k}$ , $\bm{S}^{\text{ul}}_{k}\triangleq\text{diag}\left\{\bm{s}_{k}^{\text{ul}}\right\}\in\mathbb{C}^{T\times T}$ is the diagonal $k$ th user pilot matrix, and $\bm{E}_{\mathcal{N}_{k}}$ is additive noise with i.i.d. zero-mean unit variance proper complex Gaussian entries.
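The role of the non-zero pilot assumption can be seen in a small simulation of this observation model. The sketch below forms the received pilots for a random channel and removes the pilot matrix by right-multiplying with its inverse, which exists because every diagonal entry is non-zero. Everything here is illustrative: the helper names, the odd-length Zadoff-Chu sequence with root 1, the `noise_scale` knob, and the random stand-in for the channel are assumptions, and the pilot removal step is not claimed to be part of the proposed estimator.

```python
import cmath
import math
import random

def zadoff_chu(T, root=1):
    """Unit-modulus Zadoff-Chu sequence of odd length T (all entries non-zero)."""
    return [cmath.exp(-1j * math.pi * root * q * (q + 1) / T) for q in range(T)]

def received_pilots(H, s, noise_scale, rng):
    """Y = H diag(s) + noise_scale * E, with E i.i.d. proper complex Gaussian."""
    sigma = math.sqrt(0.5)  # standard deviation per real/imaginary component
    return [[h * s_q + noise_scale * complex(rng.gauss(0.0, sigma),
                                             rng.gauss(0.0, sigma))
             for h, s_q in zip(row, s)] for row in H]

def remove_pilots(Y, s):
    """Right-multiply by diag(s)^{-1}; valid because every pilot is non-zero."""
    return [[y / s_q for y, s_q in zip(row, s)] for row in Y]

rng = random.Random(0)
M, T = 4, 7  # hypothetical sizes
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(T)]
     for _ in range(M)]
s = zadoff_chu(T)
Y_clean = received_pilots(H, s, 0.0, rng)   # noiseless case
H_hat = remove_pilots(Y_clean, s)
max_err = max(abs(H_hat[m][q] - H[m][q]) for m in range(M) for q in range(T))
print(f"max reconstruction error without noise: {max_err:.2e}")
```

Setting `noise_scale` to 1 reproduces the unit-variance noise of the model, in which case `H_hat` equals the channel plus noise rotated by the unit-modulus pilots.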

Our goal is to reconstruct the complete channel of the $k$ th user, $\bm{H}_{k}^{\text{ul}}$ , from the channel output $\bm{Y}_{\mathcal{N}_{k}}$ . To that aim, we first identify the sparse characteristics of the unknown channel in Subsection III-A. Then, in Subsections III-B and III-C we exploit this sparse nature to efficiently estimate the unknown channel DOAs and delays, respectively. Finally, in Subsection III-D we show how these estimates can be combined to recover the unknown channel.

Since the pilot symbols of different users do not overlap in frequency, it follows from (19) that the channel estimation procedure can be carried out individually for each user. Therefore, for clarity, in the rest of Section III and in Section IV, we omit the user index $k$ .

III-A Sparse Representation

To model the sparse nature of the mmWave channel coefficients with the BSE, we let the set of utilized subcarriers be written as $\mathcal{N}=\{n_{1},n_{2},\ldots,n_{T}\}$. It follows from (10) that the $q$ th column of $\bm{H}_{\mathcal{N}}^{\text{ul}}$, $q\in\{1,2,\cdots,T\}$, can be written as

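$$[\bm{H}_{\mathcal{N}}^{\text{ul}}]_{:,q}=\sum_{p=1}^{P}\bar{z}_{q,p}^{\text{ul}}\bm{a}(\phi_{p}^{\text{ul}},n_{q}f_{0}), \tag{20}$$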

where $\bar{z}_{q,p}^{\text{ul}}=\bar{\beta}_{p}^{\text{ul}}e^{-j2\pi n_{q}f_{0}\tau_{p}^{\text{ul}}}$ .

We assume that the number of possible paths, denoted by $L$ , is a relatively large number, and is much larger than the number of actual paths ( $L\gg{P}$ ). We define

[TABLE]

as an overcomplete sub-dictionary based on (II-A), where $\bm{\vartheta}^{\text{ul}}=[\vartheta_{1}^{\text{ul}},\vartheta_{2}^{\text{ul}},\cdots,\vartheta_{L}^{\text{ul}}]$ and $\vartheta_{i}^{\text{ul}}\triangleq{-\pi}+\frac{2i\pi}{L},i\in\{1,2,\cdots,L\}$ divides the continuous angle space uniformly. Since $L$ is large, the true DOA angles $\bm{\phi}^{\text{ul}}\triangleq[\phi_{1}^{\text{ul}},\cdots,\phi_{P}^{\text{ul}}]$ can be approximated (with some quantization error) to be a subset of $\bm{\vartheta}^{\text{ul}}$ . This indicates that we can use the overcomplete sub-dictionary $\bm{A}(\bm{\vartheta}^{\text{ul}},n_{q}f_{0})$ to represent $[\bm{H}_{\mathcal{N}}^{\text{ul}}]_{:,q}$ as

[TABLE]

where $\bm{z}_{q}$ is a ${L\times 1}$ sparse vector whose $i$ th element is

[TABLE]

Specifically, for each $q\in\{1,2,\ldots,T\}$ , $\bm{z}_{q}$ has at most ${P}$ nonzero values, i.e., it is ${P}$ -sparse. It follows from (22) that $\mathbf{H}_{\mathcal{N}}^{\text{ul}}$ can be approximated as

[TABLE]
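The grid construction and the resulting sparse vector can be sketched numerically as follows. The example builds the uniform grid $\vartheta_{i}^{\text{ul}}=-\pi+\frac{2i\pi}{L}$, maps a few hypothetical DOAs to their nearest grid points, and places the path gains at those indices. The DOA values, gains, grid size, and function names are assumptions chosen for illustration only.

```python
import math

def angle_grid(L):
    """Uniform grid vartheta_i = -pi + 2*pi*i/L for i = 1, ..., L."""
    return [-math.pi + 2.0 * math.pi * i / L for i in range(1, L + 1)]

def circular_distance(a, b):
    """Absolute angular distance between a and b on the circle."""
    d = (a - b) % (2.0 * math.pi)  # lies in [0, 2*pi)
    return min(d, 2.0 * math.pi - d)

def nearest_grid_index(phi, grid):
    """1-based index of the grid point closest to phi."""
    return 1 + min(range(len(grid)), key=lambda i: circular_distance(phi, grid[i]))

def sparse_gain_vector(phis, gains, grid):
    """L x 1 vector holding each path gain at its nearest grid index."""
    z = [0j] * len(grid)
    for phi, g in zip(phis, gains):
        z[nearest_grid_index(phi, grid) - 1] += g
    return z

L = 64
grid = angle_grid(L)
phis = [0.3, -1.2, 2.5]               # hypothetical true DOAs (P = 3)
gains = [1 + 0.5j, -0.7j, 0.4 + 0j]   # hypothetical path gains

z = sparse_gain_vector(phis, gains, grid)
support = [i + 1 for i, v in enumerate(z) if v != 0]
worst_mismatch = max(
    circular_distance(p, grid[nearest_grid_index(p, grid) - 1]) for p in phis
)
print(support, worst_mismatch)
```

The vector has $P=3$ non-zero entries out of $L=64$, and the quantization error of each DOA is at most half the grid spacing, $\pi/L$.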

We note that when the BSE is not present, the term $\bm{a}(\vartheta_{i}^{\text{ul}},n_{q}f_{0})$ reduces to $\bm{a}(\vartheta_{i}^{\text{ul}},0)$ , which is frequency-independent. In this case, $\bm{H}^{\text{ul}}_{\mathcal{N}}$ can be written as:

[TABLE]

where $\bm{Z}=[\bm{z}_{1},\bm{z}_{2},\cdots,\bm{z}_{T}]$ . From (25) it holds that the sparsity pattern of ${\bm{z}}_{q}$ is independent of $q$ , i.e., all the vectors $\{\bm{z}_{q}\}_{q=1}^{T}$ have their non-zero elements in the same entries, so that the matrix $\bm{Z}$ has at most ${P}$ nonzero rows occurring on a common index set. Consequently, rather than trying to estimate the channel parameters from each subcarrier independently, the parameters can be jointly estimated by combining all the subcarriers, namely, by recasting the estimation of the mmWave channel as a multiple measurement vector (MMV) problem [33]. To exploit the common support structure of $\bm{Z}$ , one can stack the rows of $\bm{Z}$ and $\bm{H}_{\mathcal{N}}^{\text{ul}}$ into vectors. Then, (III-A) can be converted into a sparse recovery problem:

[TABLE]

In the presence of the BSE, $\bm{A}(\bm{\vartheta}^{\text{ul}},n_{q}f_{0})$ varies from subcarrier to subcarrier. In this case, we cannot directly obtain an expression of the form (III-A). Nevertheless, given that $\bm{A}(\bm{\vartheta}^{\text{ul}},n_{q}f_{0})$ has a fixed phase offset ${n_{q}f_{0}}$ compared to $\bm{A}(\bm{\vartheta}^{\text{ul}},0)$ , we can write $\bm{H}_{\mathcal{N}}^{\text{ul}}\approx\bm{A}(\bm{\vartheta}^{\text{ul}},0)\bm{\bar{Z}}$ , where the nonzero columns of $\bm{\bar{Z}}$ have a regular ‘shift’ characteristic. This shift property is illustrated in Fig. 3, in which each square corresponds to a vector entry: black squares represent the nonzero elements while blank squares indicate zeros.

Therefore, to ensure that each column of $\bm{Z}$ still has the same nonzero positions, we propose to design a shift-invariant transform such that the common sparse support of the transformed $\bm{Z}$ satisfies the same sparsity pattern behavior as in the absence of beam squint.

To that aim, we first define the $MT\times LT$ matrix

[TABLE]

where the subscript “bs” stands for beam squint. Here, $\bm{D}({\vartheta_{i}^{\text{ul}}})$ is an $MT\times T$ matrix given by

[TABLE]

and $\bm{\Phi}_{m}(\vartheta_{i}^{\text{ul}})$ is a $T\times T$ frequency rotation matrix with parameter $\vartheta_{i}^{\text{ul}}$ which can be expressed as

[TABLE]

Applying the rotation matrix allows us to express $\text{vec}((\bm{H}_{\mathcal{N}}^{\text{ul}})^{T})$ using the matrix $\bm{Z}$ , similarly to (28). Specifically, it can be verified that $\bm{H}_{\mathcal{N}}^{\text{ul}}$ is transformed into an MMV sparse representation given by

[TABLE]

where, as in (22)-(23), the approximation in (28) stems from the fact that the DOAs do not necessarily lie on the grid ${\bm{\vartheta}^{\text{ul}}}$ .

We henceforth refer to $\bm{D}_{\text{bs}}(\bm{\vartheta}^{\text{ul}})$ as the sensing matrix [34], as it represents a linear dimension reduction of $\text{vec}({\bm{Z}}^{T})$ . Since $\bm{Z}$ is ${P}$ -row sparse, $\bm{Z}^{T}$ will be ${P}$ -column sparse. This in turn implies that $\text{vec}({\bm{Z}}^{T})$ is ${P}$ -block sparse [35], which facilitates its recovery using block sparsity methods, as detailed in the following subsection.
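The link between row sparsity of $\bm{Z}$ and block sparsity of $\text{vec}(\bm{Z}^{T})$ can be checked on a small example. In the sketch below, the matrix size, its entries, and the helper names are illustrative assumptions.

```python
def vec_of_transpose(Z):
    """vec(Z^T): stack the columns of Z^T, i.e. concatenate the rows of Z."""
    return [v for row in Z for v in row]

def nonzero_blocks(x, T):
    """1-based indices of the length-T blocks of x that are not identically zero."""
    return [i // T + 1 for i in range(0, len(x), T) if any(v != 0 for v in x[i:i + T])]

L, T = 6, 3                              # hypothetical grid size and pilot count
Z = [[0j] * T for _ in range(L)]
Z[1] = [1 + 1j, 2j, -1 + 0j]             # active row 2
Z[4] = [0.5 + 0j, 0j, 3j]                # active row 5 (a zero inside a row is allowed)

x = vec_of_transpose(Z)
blocks = nonzero_blocks(x, T)
print(len(x), blocks)
```

Each non-zero row of $\bm{Z}$ becomes one non-zero block of $T$ consecutive entries, so a matrix with two active rows yields a 2-block sparse vector of length $LT$.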

III-B Off-grid DOA Estimation Algorithm

We next study the recovery of the DOA vector from the channel output at the observed subcarriers $\bm{Y}_{\mathcal{N}}$ , given in (19). Define ${\bm{x}}\triangleq\text{vec}\big{(}(\bm{Z}\bm{S}^{\text{ul}})^{T}\big{)}$ , recalling that $\bm{S}^{\text{ul}}$ is the pilot matrix whose diagonal elements form the a-priori known pilot sequence. Since the pilot elements are non-zero, $\bm{x}$ exhibits the same block-sparse structure as $\text{vec}({\bm{Z}}^{T})$ . Also, since the pilot matrix $\bm{S}^{\text{ul}}$ is diagonal, we can formulate the channel output $\bm{Y}_{\mathcal{N}}$ as an MMV sparse representation via

[TABLE]

Previously proposed algorithms for channel estimation in massive MIMO systems [16, 17] assume that the actual DOA values, represented by the entries of the vector $\bm{\phi}^{\text{ul}}$, coincide with values in the grid vector $\bm{\vartheta}^{\text{ul}}$. Namely, the DOA values lie on the discrete grid, and there is a one-to-one correspondence between the non-zero indexes of $\bm{x}$ and $\bm{\phi}^{\text{ul}}$. Then, the DOA vector is recovered from the estimated $\bm{x}$. We henceforth refer to such methods as on-grid algorithms. However, since $\bm{\phi}^{\text{ul}}$ typically takes values in a continuous, uncountable set, the resolution of on-grid DOA estimation is limited to $\frac{2\pi}{L}$; the resulting discrepancy between the true DOAs and the grid points is known as grid mismatch. This grid mismatch induces quantization error and degrades the estimation accuracy. Although the resolution of DOA estimation can be improved by increasing $L$, denser grids imply higher, possibly infeasible, computational complexity.

To circumvent the grid mismatch, off-grid solutions have been broadly studied [30, 31, 32]. In off-grid estimation, the estimated DOAs are not restricted to a specific grid and can take any value in the continuous parameter space. The main approaches for off-grid recovery proposed include:

1. Taylor expansion. In [30], the non-linear dependence in the DOA parameter is linearized via a first order Taylor series expansion, resulting in a formulation from which their values can be recovered without discretization. However, this kind of method heavily depends on the accuracy of the expansion.

2. Atomic norm denoising. In atomic norm denoising methods, the sparse signals are recovered by solving an atomic norm based objective function [31]. The function then can be converted into a semi-definite program that is solved by off-the-shelf solvers in an off-grid manner. However, solving the atomic norm objective becomes computationally complex for large scale problems, rendering it infeasible for our mmWave massive MIMO channel estimation problem, in which the dimensionality of the multivariate quantities tends to be very large.

3. Grid refinement. The idea of grid refinement was first introduced by Malioutov et al. [32] to mitigate the effect of grid mismatch in DOA estimation. This approach adaptively refines the grid around candidate spatial locations with a predefined resolution. This approach suffers from two main drawbacks: First, the computational complexity grows proportionally with the desired accuracy; Second, since the resolution of the points is increased iteratively, this approach tends to converge to local optimal points, degrading its performance.

Here, we propose an algorithm for recovering the DOAs which is inspired by grid refinement, while avoiding its drawbacks in terms of recovery performance and computational complexity. Our method starts with a fixed known dense grid, for which (III-B) represents a linear transformation of an unknown block sparse vector $\bm{x}$ . The algorithm then alternates as follows: first, for a fixed grid, it recovers a block-sparse $\bm{x}$ ; then, for fixed block-sparse $\bm{x}$ , we adjust the angle grid and accordingly the projection matrix $\bm{D}_{bs}$ to further minimize the cost. By repeating these steps iteratively, we are able to recover DOA angles which minimize the cost function without necessarily lying on the original grid.

Similarly to grid refinement, our approach changes the grid iteratively. However, while grid refinement modifies the resolution around a set of observed points, our algorithm adjusts the grid values in a continuous manner and reduces the number of grid points iteratively. The benefits of this approach over conventional grid refinement are numerically demonstrated in our simulation study in Section V.

To explain the algorithm in detail, define $L_{\phi}^{(0)}$ as the initial guess of the number of unknown paths that will be gradually decreased and tuned during the estimation procedure. With a slight abuse of notation, we use $\bm{\phi}=[\phi_{1},\cdots,\phi_{L_{\phi}^{(0)}}]$ as the off-grid DOAs to be estimated. Since we do not know the number of paths in advance, $L_{\phi}^{(0)}$ is set initially to a relatively large number, and thus estimating $\bm{\phi}$ can be formulated as a sparse signal recovery problem with an unknown parametric dictionary $\bm{D}_{\text{bs}}(\bm{\phi})$ . In this framework, the objective is not only to estimate the sparse signal, but also to optimize/refine the angle grid such that the parametric dictionary approaches the true sparsifying dictionary.

To proceed, we recall the definition of the block $\ell_{0}$ -norm:

Definition 1

The $T$ -block $\ell_{0}$ -norm of a $TL\times 1$ vector ${\bm{x}}=[{\bm{x}}^{T}[1],\ldots,{\bm{x}}^{T}[L]]^{T}$ is defined as [33]

[TABLE]

where $\mathcal{I}(\|\bm{x}[i]\|_{2}>0)$ is an indicator function which equals $1$ if $\|\bm{x}[i]\|_{2}>0$ and $0$ otherwise, and $\bm{x}[i]$ is the $i$ th block of $\bm{x}$ containing $T$ consecutive elements. Note that the $T$ -block $\ell_{0}$ -norm with $T=1$ reduces to the conventional $\ell_{0}$ -norm.
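Definition 1 translates directly into a few lines of code. The following sketch counts the blocks of length $T$ that contain at least one non-zero entry; the function name and the test vector are illustrative assumptions.

```python
def block_l0_norm(x, T):
    """Number of length-T blocks of x whose Euclidean norm is strictly positive."""
    if T <= 0 or len(x) % T != 0:
        raise ValueError("len(x) must be a positive multiple of the block length T")
    return sum(
        1
        for start in range(0, len(x), T)
        if any(v != 0 for v in x[start:start + T])  # equivalent to ||x[i]||_2 > 0
    )

x = [0, 0, 1 + 2j, 0, 0, 0, 3, -1j]     # four blocks of length T = 2
count_blocks = block_l0_norm(x, 2)       # blocks 2 and 4 are active
count_entries = block_l0_norm(x, 1)      # T = 1 gives the conventional l0 count
print(count_blocks, count_entries)
```

With $T=2$ the vector has two active blocks, while with $T=1$ the same function returns the number of non-zero entries, in agreement with the remark that closes the definition.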

Using (III-B), we formulate the DOA estimation problem for a fixed known grid $\bm{\phi}$ exploiting the prior knowledge of $\bm{x}$ being block sparse as

[TABLE]

where $\xi$ is an error tolerance parameter that is related to the noise statistics. To recover an off-grid estimate of the DOAs, we first recast the problem (31) as an iterative reweighted least squares objective, as in, e.g., [36]. Then, we use the resulting objective to tune the grid vector $\bm{\phi}$ , supporting an off-grid estimate in a computationally feasible fashion.

To formulate the iterative algorithm, let $\bm{x}^{(\omega)}$ and $\bm{\phi}^{(\omega)}$ be the estimations of $\bm{x}$ and $\bm{\phi}$ at the $\omega$ th iteration, respectively. At each iteration we form the following $\small{L_{\phi}^{(\omega)}T\times{L_{\phi}^{{(\omega)}}}T}$ matrix:

[TABLE]

where $\epsilon>0$ is a positive parameter ensuring that $\bm{G}^{(\omega)}$ is well-defined, and ${L_{\phi}^{(\omega)}}$ is the number of unknown angles at the $\omega$ th iteration. The block-sparsity problem (31) is then recast as a reweighted least squares problem:

[TABLE]

The objective in (37) consists of two terms: the weighted norm $\bm{x}^{H}\bm{G}^{(\omega)}\bm{x}$ , which controls the level of block sparsity of the recovered vector $\bm{x}$ , and the term $\|\text{vec}({\bm{Y}}_{{\mathcal{N}}}^{T})-\bm{D}_{\text{bs}}(\bm{\phi}^{(\omega)}){\bm{x}}\|_{2}^{2}$ , which represents the accuracy of the estimation. The balance between the two terms is controlled by the regularization parameter $\lambda^{(\omega)}$ which we set to

[TABLE]
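To give some intuition for why the weighted term promotes block sparsity, the sketch below evaluates $\bm{x}^{H}\bm{G}\bm{x}$ under an assumed weighting that is common in iterative reweighted least squares: a diagonal $\bm{G}$ whose entries within the $i$ th block equal $1/(\|\bm{x}^{(\omega)}[i]\|_{2}^{2}+\epsilon)$. This specific form, the helper names, and the numbers are assumptions made for illustration; the weighting actually used is the one defined in the equation for $\bm{G}^{(\omega)}$ above.

```python
def block_weights(x_prev, T, eps):
    """Diagonal of the assumed weighting: 1 / (||x_prev[i]||^2 + eps), repeated T times."""
    weights = []
    for start in range(0, len(x_prev), T):
        energy = sum(abs(v) ** 2 for v in x_prev[start:start + T])
        weights.extend([1.0 / (energy + eps)] * T)
    return weights

def weighted_penalty(x, weights):
    """x^H G x for a diagonal G given by its diagonal entries."""
    return sum(w * abs(v) ** 2 for w, v in zip(weights, x))

T, eps = 2, 1e-6
x_prev = [0j, 0j, 1 + 1j, 2 + 0j, 0j, 0j, -3j, 0.5 + 0j]   # two active blocks out of four
w = block_weights(x_prev, T, eps)
penalty = weighted_penalty(x_prev, w)
print(penalty)
```

Evaluated at the previous iterate, each active block contributes approximately one and each empty block contributes zero, so for small $\epsilon$ the quadratic penalty approximates the block count while remaining differentiable. Blocks with little energy receive large weights and are pushed further towards zero in the next iteration.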

For a given $\bm{\phi}^{(\omega)}=\bm{\phi}$ , the optimal value of $\bm{x}$ in (37) is

[TABLE]

Substituting $\bm{x}^{(\omega+1)}|\bm{\phi}$ back into (37), we can optimize the grid $\bm{\phi}^{(\omega)}$ in light of the objective (37) by minimizing

[TABLE]

Since directly minimizing (III-B) is computationally complex, we propose to gradually decrease the surrogate objective by selecting $\bm{\phi}^{(\omega+1)}$ that satisfies $v(\bm{\phi}^{(\omega+1)})\leq v(\bm{\phi}^{(\omega)})$ for the next iteration. Since $v(\bm{\phi}^{(\omega)})$ is differentiable with respect to $\bm{\phi}$ , the $(\omega\!+\!1)$ th estimation can be obtained by the gradient descent method:

[TABLE]

where $u$ is the step size, and the derivative expression is given in closed-form in the Appendix. The update rule in (41) implies that, even if $\bm{\phi}^{\text{ul}}$ is initialized to a large grid, the iterative algorithm allows the updated DOA estimates to deviate from this initial grid, resulting in off-grid estimation.
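The off-grid behaviour of such a gradient update can be illustrated on a deliberately simplified toy problem. The sketch below considers a single path, an assumed frequency-flat steering vector $[1,e^{j\phi},\ldots,e^{j(M-1)\phi}]^{T}$ (that is, no beam squint), and the residual energy after fitting one steering vector as the cost. It is not the surrogate $v(\bm{\phi})$ of the proposed algorithm, and the step-halving rule is an assumption added here to enforce a non-increasing cost.

```python
import cmath
import math

def steering(phi, M):
    """Assumed frequency-flat ULA response [1, e^{j phi}, ..., e^{j (M-1) phi}]."""
    return [cmath.exp(1j * m * phi) for m in range(M)]

def cost_and_grad(phi, y):
    """Residual energy after fitting one steering vector to y, and its derivative."""
    M = len(y)
    c = sum(cmath.exp(-1j * m * phi) * y[m] for m in range(M))             # a(phi)^H y
    dc = sum(-1j * m * cmath.exp(-1j * m * phi) * y[m] for m in range(M))  # d c / d phi
    energy = sum(abs(v) ** 2 for v in y)
    return energy - abs(c) ** 2 / M, -2.0 * (c.conjugate() * dc).real / M

def refine_angle(phi0, y, step=1e-2, iters=200):
    """Gradient descent that halves the step until the cost does not increase."""
    phi, (v, dv) = phi0, cost_and_grad(phi0, y)
    history = [v]
    for _ in range(iters):
        while step > 1e-12:
            cand = phi - step * dv
            v_new, dv_new = cost_and_grad(cand, y)
            if v_new <= v:
                break
            step *= 0.5
        else:
            break  # no acceptable step size is left
        phi, v, dv = cand, v_new, dv_new
        history.append(v)
    return phi, history

M, L, phi_true = 16, 16, 0.3
y = [(0.8 - 0.6j) * a for a in steering(phi_true, M)]   # noiseless single-path snapshot
phi_start = -math.pi + 2.0 * math.pi * 9 / L            # nearest point of a coarse grid
phi_hat, history = refine_angle(phi_start, y)
print(phi_start, phi_hat)
```

Starting from the nearest grid point, which is about $0.09$ rad away from the true angle, the iterate leaves the grid and approaches the true value while the cost never increases.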

In the proposed algorithm, the main complexity lies in calculating $\bm{x}^{(\omega+1)}|\bm{\phi}$ and the first derivative $\frac{\partial v(\bm{\phi}^{(\omega)})}{\partial{\bm{\phi}}}$ . To reduce the computational complexity, a pruning method is introduced: for every $i=1,2,\ldots,L_{\phi}^{(\omega)}$ , if $\|\bm{x}^{(\omega+1)}[i]\|_{2}^{2}$ is smaller than some fixed threshold $\mu$ , then we delete $\bm{x}^{(\omega+1)}[i]$ from the vector $\bm{x}^{(\omega+1)}$ , and correspondingly delete the angle $\phi_{i}^{(\omega+1)}$ from the vector $\bm{\phi}^{(\omega+1)}$ . We then set $L_{\phi}^{(\omega)}$ to be the length of the preserved $\bm{\phi}^{(\omega+1)}$ :

[TABLE]

where $(\cdot)_{*}$ represents the preserved vector at each iteration.
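The pruning step can be sketched as follows. Blocks of $\bm{x}^{(\omega+1)}$ whose squared norm falls below the threshold $\mu$ are removed together with their grid angles; the function name, the threshold value, and the test data are illustrative assumptions.

```python
def prune_blocks(x, phi, T, mu):
    """Keep only the blocks of x (and matching angles) with squared norm >= mu."""
    if len(x) != T * len(phi):
        raise ValueError("x must contain one length-T block per grid angle")
    kept_x, kept_phi = [], []
    for i, angle in enumerate(phi):
        block = x[i * T:(i + 1) * T]
        if sum(abs(v) ** 2 for v in block) >= mu:
            kept_x.extend(block)
            kept_phi.append(angle)
    return kept_x, kept_phi

T, mu = 2, 1e-3                                  # hypothetical block length and threshold
phi = [-1.0, -0.2, 0.4, 1.3]
x = [1e-3 + 0j, 1e-3j, 0.9 + 0j, 0.1j, 0j, 0j, 1e-2 + 0j, 0.5j]

x_kept, phi_kept = prune_blocks(x, phi, T, mu)
L_next = len(phi_kept)                           # grid size used in the next iteration
print(phi_kept, L_next)
```

Here the first and third blocks fall below the threshold, so the grid shrinks from four to two angles, which reduces the dimensions of the sensing matrix and hence the cost of the subsequent updates.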

When the iterative algorithm satisfies its termination criterion, i.e., $\|{\bm{x}}^{(\omega+1)}-{\bm{x}}^{(\omega)}\|_{2}$ is less than some predefined $\eta$ , the number of paths $P$ can be estimated using the number of non-zero block values of $\bm{x}$ . The proposed algorithm is summarized as Algorithm 1.

III-C Delay Estimation Algorithm

We next consider the recovery of the delays of each path. Note that (II-C) implies that the delay factor $\tau_{p}^{\text{ul}}$ has a Vandermonde vector $\bm{b}(\tau_{p}^{\text{ul}})$ appearing in the expression for each row of $\bm{H}_{\mathcal{N}}^{\text{ul}}$ . Consequently, we can use the same block structure to estimate the delay by vectorizing ${{\bm{Y}}_{{\mathcal{N}}}}$ . Similarly, we use $\bm{\tau}\triangleq[\tau_{1},\tau_{2},\cdots,\tau_{L_{\tau}^{(0)}}]$ as the off-grid delays to be estimated, where $\tau_{i}=\frac{i}{L_{\tau}^{(0)}Nf_{0}}$ , $i\in\{1,2,\cdots,{L_{\tau}^{(0)}}\}$ , and $L_{\tau}^{(0)}$ is the initial number of unknown delays. Thus, the sensing matrix for delay estimation can be formulated similarly to (28). To that aim, define

[TABLE]

where $\bm{d}_{t}({\tau_{i}})\triangleq\bm{b}(\tau_{i})\otimes\bm{I}_{M}$ . It can be readily checked that

[TABLE]

where $\bm{x}_{t}$ is an $ML_{\tau}^{(0)}\times 1$ block sparse vector defined similarly to the vector $\bm{x}$ introduced in the previous subsection.

For a fixed delay grid $\bm{\tau}$ , the delay estimation can be expressed as

[TABLE]

which can be solved in a similar manner as (III-B), namely, by iteratively updating the estimate of the vector $\bm{x}_{t}$ and the grid $\bm{\tau}$ using a block-sparsity boosting iterative reweighted least squares objective. The proposed delay estimation procedure is summarized in Algorithm 2.

III-D Uplink Channel Reconstruction

In the previous subsections we showed how the estimates of the DOAs and the delays, denoted $\bm{\hat{\phi}}^{\text{ul}}$ and $\bm{\hat{\tau}}^{\text{ul}}$ , are separately obtained, along with the number of paths $\hat{P}$ . Yet, one still has to match each DOA value to its corresponding delay, namely, to match the entries of $\bm{\hat{\phi}}^{\text{ul}}$ to their corresponding entries of $\bm{\hat{\tau}}^{\text{ul}}$ . A trivial approach is to search all of the $\hat{P}^{\hat{P}}$ possible pairings among $\bm{\hat{\phi}}^{\text{ul}}$ and $\bm{\hat{\tau}}^{\text{ul}}$ , but this would incur a heavy computational burden. Fortunately, based on Theorem 1, if $\hat{\phi}_{i}^{\text{ul}}$ and $\hat{\tau}_{j}^{\text{ul}}$ belong to the same path, then the projection of their channel vector $\text{vec}(\bm{\Xi}(\hat{\phi}_{i}^{\text{ul}},\hat{\tau}_{j}^{\text{ul}}))$ onto $\text{vec}({\bm{Y}}_{{\mathcal{N}}})$ will be larger than that of any other (mismatched) combination.

Consequently, we compute the inner product of $\text{vec}(\bm{\Xi}(\hat{\phi}_{i}^{\text{ul}},\hat{\tau}_{j}^{\text{ul}}))$ with $\text{vec}({\bm{Y}}_{{\mathcal{N}}})$ for the different combinations of angles and delays, and select the combination attaining the maximum value as the correct matching of an angle $\hat{\phi}^{\text{ul}}$ to a delay $\hat{\tau}^{\text{ul}}$ . Next, we delete the angle and delay that have already been matched, and repeat the inner product computation for the remaining angles and delays. Under this pairing procedure, at most $\sum_{p=1}^{\hat{P}}p^{2}=\hat{P}(\hat{P}+1)(2\hat{P}+1)/6$ inner product operations are required.
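The pairing procedure can be sketched as a greedy search. In the snippet below, `score(i, j)` stands for the magnitude of the inner product between the candidate channel vector of angle $i$ and delay $j$ and the vectorized observation; the score table, the indices, and the function names are hypothetical, and the scores are re-evaluated in every round so that the evaluations can be counted.

```python
def greedy_pairing(score, P):
    """Repeatedly pick the best remaining (angle, delay) pair, then remove both."""
    angles, delays = list(range(P)), list(range(P))
    pairs, evaluations = [], 0
    while angles:
        best = None
        for i in angles:
            for j in delays:
                value = score(i, j)
                evaluations += 1
                if best is None or value > best[0]:
                    best = (value, i, j)
        _, i, j = best
        pairs.append((i, j))
        angles.remove(i)
        delays.remove(j)
    return pairs, evaluations

# Hypothetical correlation magnitudes: angle 0 matches delay 2, 1 matches 0, 2 matches 1.
table = [
    [0.20, 0.10, 0.90],
    [0.80, 0.30, 0.20],
    [0.10, 0.70, 0.25],
]
pairs, evaluations = greedy_pairing(lambda i, j: table[i][j], 3)
print(sorted(pairs), evaluations)
```

For three paths the search evaluates $9+4+1=14$ scores when they are recomputed in every round; caching the scores from the first round would reduce this to $9$ evaluations at the cost of storing the table.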

After the pairing process is concluded, one can recover the complex channel gains. To that aim, we stack the obtained channel vectors (with correct matching) $\textup{vec}(\bm{\Xi}(\hat{\phi}_{p}^{\text{ul}},\hat{\tau}_{p}^{\text{ul}}))$ , where $1\leq p\leq\hat{P}$ , as columns to form a matrix:

[TABLE]

Now, we note that if there is no error in the estimation of the DOAs and the delays, then the true channel gains can be computed via $\bm{\beta}=\bm{B}^{\dagger}\textup{vec}(\bm{H}^{\text{ul}})$ . However, as we do not have access to the true channel $\bm{H}^{\text{ul}}$ , we estimate the gains by applying the same transformation to the least-squares estimate of $\bm{H}^{\text{ul}}$ , i.e., $\bm{Y}_{\mathcal{N}}(\bm{S}^{\text{ul}})^{-1}$ . Specifically, the uplink complex channel gains are estimated as

[TABLE]
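The gain estimation step is a standard linear least-squares fit. When $\bm{B}$ has full column rank, $\bm{B}^{\dagger}=(\bm{B}^{H}\bm{B})^{-1}\bm{B}^{H}$, so the gains solve the normal equations $\bm{B}^{H}\bm{B}\bm{\beta}=\bm{B}^{H}\bm{h}$. The sketch below solves them with a small Gaussian elimination routine; the columns used for $\bm{B}$ are hypothetical stand-ins for the paired channel vectors, and the gains are arbitrary test values.

```python
import cmath

def solve_linear(A, b):
    """Solve A x = b for a small square complex system (partial pivoting)."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def least_squares_gains(columns, h):
    """Minimize ||h - B beta||_2 through the normal equations B^H B beta = B^H h."""
    P = len(columns)
    gram = [[sum(u.conjugate() * v for u, v in zip(columns[i], columns[j]))
             for j in range(P)] for i in range(P)]
    rhs = [sum(u.conjugate() * v for u, v in zip(columns[i], h)) for i in range(P)]
    return solve_linear(gram, rhs)

# Hypothetical path signatures (columns of B) and gains, for illustration only.
rates = [0.3, -1.2, 2.5]
columns = [[cmath.exp(1j * w * n) for n in range(12)] for w in rates]
beta_true = [1 + 0.5j, -0.7j, 0.4 + 0j]
h = [sum(b * col[n] for b, col in zip(beta_true, columns)) for n in range(12)]

beta_hat = least_squares_gains(columns, h)
print(beta_hat)
```

With exact angle and delay estimates and a noiseless observation, the gains are recovered up to rounding; with noisy inputs the same routine returns the least-squares fit.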

The overall uplink channel at all $N$ subcarriers of the $k$ th user can be reconstructed from (II-C) as

[TABLE]

We summarize the overall proposed uplink channel reconstruction scheme as Algorithm 3.

IV Downlink Channel Estimation

In the previous section we proposed an algorithm for estimating the uplink channel. However, in order to establish reliable bi-directional communications, the downlink channel must also be estimated. One of the major challenges in massive MIMO communications stems from the fact that in FDD systems, in which different bands are assigned to uplink and downlink transmissions, the downlink channel cannot be immediately deduced from the uplink channel. In the following we show how, for mmWave massive MIMO systems with the BSE, the estimation scheme designed for uplink channels in the previous section can be extended to downlink channels by exploiting a phenomenon called angle-delay reciprocity.

In particular, we consider pilot-aided downlink channel estimation in which the BS transmits an a-priori known pilot sequence in a similar manner to the uplink channel estimation phase. To deal with the BSE, we design the downlink estimation strategy to use dedicated pilots for each path, with different beamforming vectors for different subcarriers with respect to the given path. To present our scheme, we first elaborate on the structure of mmWave massive MIMO downlink channels in the presence of BSE in Subsection IV-A. Then, in Subsection IV-B, we discuss the angle-delay reciprocity, a property which we exploit in Subsection IV-C to generate the beamformed pilots and to formulate our downlink channel estimation algorithm.

IV-A Downlink Channel Structure

To formulate the downlink channel, we denote its center frequency by $f_{c}^{\text{dl}}$ and its wavelength by $\lambda^{\text{dl}}={c}/{f_{c}^{\text{dl}}}$ . The downlink array steering vector is expressed as

[TABLE]

where $\phi^{\text{dl}}\triangleq\frac{2\pi d\cdot\sin\theta^{\text{dl}}}{\lambda^{\text{dl}}}$ is the normalized direction of departure (DOD), and $\theta^{\text{dl}}$ is the downlink DOD. We henceforth use the term “DOD” to refer to the normalized DOD $\bm{\phi}^{\text{dl}}$ .

Similar to the uplink case, at frequency $f$ , the downlink channel observed by the $k$ th user can be written as:

[TABLE]

where $P^{\text{dl}}$ is the number of downlink paths, $\bar{\beta}_{p}^{\text{dl}}$ and $\tau_{p}^{\text{dl}}$ are the complex gain and multi-path delay of the $p$ th downlink path, respectively. The channel parameters, $P^{\text{dl}}$ , $\bar{\beta}_{p}^{\text{dl}}$ , $\tau_{p}^{\text{dl}}$ , and $\phi_{p}^{\text{dl}}$ depend on the specific user index $k$ , which is omitted for brevity, as in the previous section.

As in (11) and (II-C), we define

[TABLE]

Then, the downlink $1\times MN$ frequency channel vector from the BS to the $k$ th user is given by

[TABLE]

where $\bm{\beta}^{\text{dl}}=[\bar{\beta}_{1}^{\text{dl}},\bar{\beta}_{2}^{\text{dl}},\ldots,\bar{\beta}_{P^{\text{dl}}}^{\text{dl}}]^{T}$ is the downlink channel complex gain vector, and

[TABLE]

From (IV-A), the downlink channel consists of a set of DODs, delays, and complex gains, and thus has a similar structure as the uplink case. Note that in FDD systems, channel reciprocity does not hold, and the channel must be estimated independently by each user. However, this estimation can be facilitated by accounting for the angle-delay reciprocity of mmWave channels, presented in the following subsection.

IV-B Angle-Delay Reciprocity

Unlike in TDD systems [37], FDD channels are not reciprocal, namely, downlink and uplink transmissions undergo different channels due to their different frequency bands. However, since the propagation paths of electromagnetic waves are reciprocal, only the signal wave that physically reverses the uplink path can reach users during downlink transmission. It has been shown in [38, 39, 40] that the conductivity and relative permittivity of most materials remain unchanged if the frequency of the electromagnetic wave does not vary much, say less than 1GHz. Hence, the angle components of the uplink and downlink channels commonly are the same in mmWave communications [18]. Moreover, since the downlink electromagnetic wave travels the same distance as the uplink, the delay components for the uplink and downlink channel are the same. This phenomenon is commonly referred to as angle-delay reciprocity [20], and it implies that

[TABLE]

From (57), the uplink and downlink channels have the same number of paths as well as the same angle and delay parameters, which can be estimated at the BS. Therefore, to acquire the downlink channel in FDD systems, the users only need to estimate the remaining downlink channel gain and feed this gain back to the BS. The resulting computational overhead, as we show in the sequel, can be made affordable.

IV-C Downlink Channel Estimation

We now show how the angle-delay reciprocity can be exploited to estimate the downlink channel. Here, the BS estimates (57), using the number of uplink paths along with their DOAs and delays obtained via Algorithm 3.

Recall that the BS transmits beamformed pilots to each user during downlink channel estimation. In particular, the BS sends a-priori known pilots in each estimated path. Let $\bm{s}^{\text{dl}}_{p}\in\mathbb{C}^{1\times T}$ denote the pilots targeting the $p$ th path over $T$ subcarriers with index set $\mathcal{N}=\{n_{1},n_{2},\ldots,n_{T}\}$ . These pilots are orthogonal over the different paths, i.e., $\bm{s}^{\text{dl}}_{i}(\bm{s}^{\text{dl}}_{j})^{H}=\delta(i-j)$ . To formulate how these pilots are beamformed prior to their transmission, we use the channel structure in (IV-A) and formulate the columns of the matrix $\bm{\Xi}^{{}^{\mathcal{N}}}(\phi_{p}^{\text{dl}},\tau_{p}^{\text{dl}})$ as follows: for the $n_{q}$ th ( $q=1,2,\cdots,T$ ) pilot subcarrier, the corresponding column of $\bm{\Xi}^{\mathcal{N}}(\phi_{p}^{\text{dl}},\tau_{p}^{\text{dl}})$ is given by

[TABLE]

where $\bm{b}_{n_{q}}^{T}(\tau_{p}^{\text{ul}})$ is the $n_{q}$ th element of $\bm{b}^{T}(\tau_{p}^{\text{dl}})$ , and $[\bm{W}^{\text{dl}}(\phi_{p}^{\text{ul}})]_{:,{n_{q}}}$ is the $n_{q}$ th column of $\bm{W}^{\text{dl}}(\phi_{p}^{\text{ul}})$ . By letting $\bm{F}_{{n_{q}}}^{p}\big{(}\phi_{p}^{\text{dl}},\tau_{p}^{\text{dl}}\big{)}$ be the beamforming vector for the $k$ th user from the $p$ th path on the $n_{q}$ th pilot carrier, the corresponding channel output (prior to the addition of noise) can be written as

[TABLE]

In order to point the downlink beam to the $p$ th path, we set the beamforming vector to be

[TABLE]

The ${1\times T}$ received vector from all $T$ pilot carriers can now be written as

[TABLE]

where $\bm{H}_{{\mathcal{N}}}^{\text{dl}}\triangleq\sum_{p=1}^{P}\big{[}{\bar{\beta}}_{p}^{\text{dl}}\textup{vec}(\bm{\Xi}^{\mathcal{N}}(\phi_{p}^{\text{dl}},\tau_{p}^{\text{dl}}))\big{]}^{H}$ is the ${1\times MT}$ downlink channel over the $T$ pilot carriers, $\bm{S}^{\text{dl}}=[(\bm{s}^{\text{dl}}_{1})^{T},(\bm{s}^{\text{dl}}_{2})^{T},\cdots,(\bm{s}^{\text{dl}}_{\hat{P}})^{T}]^{T}$ is the pilot matrix, and $\bm{F}_{\mathcal{N}}(\bm{\phi}^{\text{dl}},\bm{\tau}^{\text{dl}})$ is the beamforming matrix given by

[TABLE]

Due to the beamforming matrix in (62), the user can use the a-priori knowledge of $\bm{S}^{\text{dl}}$ to recover its downlink channel complex gain vector using simple least-squares estimation:

[TABLE]

To allow the BS to recover the complete downlink channel, each user now feeds its estimated gain vector $\bm{\hat{\beta}}^{\text{dl}}$ back to the BS, completing the downlink channel reconstruction via:

[TABLE]

We summarize the downlink channel reconstruction scheme as Algorithm 4.

V Simulations

In this section, we demonstrate the effectiveness of the proposed algorithms for uplink and downlink channel estimation compared to conventional channel estimation algorithms. In particular, we show that our approach significantly outperforms previously proposed methods, which either restrict the solution set to a finite grid, as in [16, 17], or, alternatively, do not take into account the BSE. We also compare our off-grid recovery schemes to conventional grid refinement [32].

We consider a BS equipped with a ULA with element spacing $d=\lambda^{\text{ul}}/2$ . All $K=8$ users are randomly distributed in the service area and each has a single antenna. The pilots are uniformly distributed over all the $N=64$ subcarriers, thus each user is assigned $T=8$ pilot subcarriers. The transmit bandwidth is $1$ GHz with uplink center frequency $f_{c}^{\text{ul}}=60$ GHz and downlink center frequency $f_{c}^{\text{dl}}=61$ GHz. For the on-grid approach, we take $L=1024$ grid points. For the proposed off-grid approach, we correspondingly set $L_{\phi}=1024$ as the initial resolution. The simulated mmWave channels are generated via (12) with $P=6$ , $\bar{\beta}_{p}\sim\mathcal{CN}(0,1)$ , $\phi_{p}\sim\mathcal{U}({-\pi},{\pi})$ , and $\tau_{p}\sim\mathcal{U}(0,\frac{1}{Nf_{0}})$ , where $\mathcal{CN}$ and $\mathcal{U}$ represent the complex normal and the uniform distribution, respectively. The signal-to-noise ratio (SNR) is defined as $\sigma_{p}^{2}/\sigma_{n}^{2}$ , where $\sigma_{p}^{2}$ is the pilot power. The performance of angle and delay estimation is measured by the corresponding mean-square error (MSE) values, defined as

[TABLE]

respectively. Here, $J=1000$ is the number of Monte-Carlo trials. The channel estimation performance is measured in terms of the normalized mean square error (NMSE):

[TABLE]
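For concreteness, the sketch below computes an NMSE under a commonly used definition, which is assumed here: the squared Frobenius norm of the estimation error divided by that of the true channel, averaged over the Monte-Carlo trials. The exact normalization used in the simulations is the one given in the equation above; the function names and test matrices are illustrative.

```python
import math

def nmse(true_channels, est_channels):
    """Average of ||H - H_hat||_F^2 / ||H||_F^2 over the given trials."""
    total = 0.0
    for H, H_hat in zip(true_channels, est_channels):
        err = sum(abs(a - b) ** 2 for ra, rb in zip(H, H_hat) for a, b in zip(ra, rb))
        ref = sum(abs(a) ** 2 for ra in H for a in ra)
        total += err / ref
    return total / len(true_channels)

def to_db(value):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(value)

# Two hypothetical trials in which every estimate is the true channel scaled by 0.9.
trials = [
    [[1 + 1j, 2 + 0j], [0j, -1j]],
    [[0.5j, -1 + 0j], [3 + 0j, 1 - 1j]],
]
estimates = [[[0.9 * a for a in row] for row in H] for H in trials]

value = nmse(trials, estimates)
print(value, to_db(value))
```

A uniform 10% amplitude error gives an NMSE of $0.01$, i.e. $-20$ dB, regardless of the channel realization, which makes this a convenient sanity check for an evaluation script.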

Fig. 4 shows the uplink $\text{MSE}_{\bm{\phi}}$ versus SNR of the proposed algorithm compared with the on-grid CS approach [17], the grid refinement approach [32], and conventional channel modeling [36] that ignores the BSE. While the MMV structure is not considered in these algorithms [17, 32, 36], in the following we allow the competing algorithms to exploit this structure in order to maintain a fair comparison. Observing Fig. 4, we note that the curve for the MSE in recovering the DOAs of the proposed algorithm decreases linearly as SNR increases (indicating that $\text{MSE}_{\bm{\phi}}$ decays exponentially with increasing SNR), while all the other methods meet error floors at high SNR. This error floor exhibited by previously proposed estimators, which emphasizes the benefits of our proposed algorithms at high SNR values, is a result of a model mismatch which can be attributed to:

For the on-grid algorithms, the grid mismatch restricts the resolution of DOA estimation to $2\pi/L$ ;
For the grid refinement algorithm, the error floor occurs due to convergence to locally optimal points for some of the parameters, as mentioned in Subsection III-B;
For the off-grid algorithm that ignores the BSE, the resulting model mismatch significantly decreases the channel estimation accuracy. It is also observed from the second and the third curves in Fig. 4 that the degradation due to ignoring the BSE is more substantial than that due to grid mismatch.

Fig. 5 depicts the MSE in recovering the delays for the same setup. Since the BSE does not influence the delay estimation, we only present the results for the on-grid, grid refinement, and the proposed off-grid approaches. Similarly to Fig. 4, the on-grid and grid refinement methods inevitably encounter an error floor at high SNRs, while the proposed off-grid method consistently improves performance.

Next, we compare the NMSE in recovering the uplink channel using our proposed Algorithm 3. The results are depicted in Fig. 6. It is observed that the NMSE curve for the proposed algorithm decreases linearly as the SNR increases, achieving significantly better performance than competing methods. Furthermore, the channel estimation accuracy of each of the other four techniques meets an error floor at high SNR, in correspondence with their $\text{MSE}_{\bm{\phi}}$ and $\text{MSE}_{\bm{\tau}}$ performance.

We next study the effect of bandwidth on the performance of our estimators. The SNR is set to 10dB. Fig. 7 depicts the MSE in estimating the DOAs for uplink mmWave massive MIMO channels of the proposed algorithm compared to off-grid approaches [36] that either ignore the BSE or do not utilize the MMV structure under various transmission bandwidths. It is observed in Fig. 7 that the proposed algorithm achieves the best estimation accuracy and that its superiority over previously proposed estimators is consistent for various bandwidths. When the bandwidth is as small as $20$ MHz, it is noted that the off-grid approach considering the MMV structure but ignoring the BSE has the same performance as our proposed method. This is because the BSE is not pronounced when the bandwidth is small. However, as the bandwidth increases, the performance of the algorithms which ignore the BSE quickly deteriorates. Furthermore, the algorithm that utilizes the MMV structure but does not consider the BSE performs worse than the proposed one, and the performance gap remains constant for different bandwidth values. This demonstrates that properly exploiting the block sparsity can improve the performance and this improvement is not affected by bandwidth.

Fig. 8 displays the uplink MSE in recovering the delays with the same system setup in Fig. 7. Because the BSE does not affect the delay estimation, the MSE in estimating the path delays for both algorithms remains constant as the bandwidth increases. Similarly to Fig. 7, the proposed algorithm still achieves superior performance for all bandwidth values.

Fig. 9 compares the NMSE of the reconstructed uplink channel via Algorithm 3 with the competing approaches under various bandwidths. As the bandwidth increases, we observe that the NMSE of the proposed algorithm remains constant and yields the best performance over all the techniques, consistent with the results depicted in Figs. 7-8.

In the final example, we evaluate the NMSE of the reconstructed channel for both the uplink and downlink with different numbers of antennas in Fig. 10. Observing Fig. 10, we note that the proposed algorithm is very effective in estimating the downlink channel and achieves a consistent improvement in performance as SNR increases. Nevertheless, the NMSE in recovering the downlink channel is always worse than that of the uplink channel because the angle and delay parameters of the downlink channel originate from their uplink counterparts and may include estimation errors. These errors are combined with the estimation errors which arise due to the presence of noisy observations. Yet, as the number of antennas $M$ increases, the channel recovery performance of both the uplink and downlink improves accordingly, indicating the potential benefits of our channel estimators for massive MIMO systems.

VI Conclusions

In this paper, we designed channel estimation algorithms under a non-negligible beam squint effect in mmWave massive MIMO systems. We first showed that the recovery of the DOAs and delays of each channel path can be represented as an MMV problem using a shift-invariant transformation, and developed an algorithm for recovering an off-grid estimate of these parameters. We then showed how these recovered values can be used to estimate the overall uplink channel. By exploiting the angle-delay reciprocity of mmWave channels, we extended the results derived for uplink channel estimation to a computationally efficient approach with low overhead for downlink channel estimation in FDD systems. Compared to previously proposed channel estimators, which either adopted an on-grid approach or did not account for the beam squint in their modeling, the proposed algorithms provide significantly better performance. Numerical simulation results demonstrated the effectiveness of the proposed techniques and showed that properly taking the BSE into account is critical for mmWave massive MIMO systems.

In this appendix, we show how the derivative of $v(\bm{\phi})$ in (41) can be computed. The derivative of $v_{t}(\bm{\tau})$ in Algorithm 2 is obtained in a similar fashion and is thus omitted for brevity. From (III-B),

[TABLE]

Define:

[TABLE]

Based on the chain rule, the first derivative of $v(\bm{\phi})$ with respect to $\phi_{i}$ can be computed as

[TABLE]

where

[TABLE]

and

[TABLE]

where

[TABLE]

$\bm{0}_{MT,L_{\phi}}$ is a zero matrix of dimension $MT\times L_{\phi}$, and $\bm{D}(\phi_{i})$ is defined in (III-A).
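When implementing an analytical derivative of this kind, a standard sanity check is to compare it against a central finite-difference approximation. The sketch below illustrates the check on a toy objective. The functions `toy_objective` and `toy_gradient` are hypothetical stand-ins and are not the $v(\bm{\phi})$ of this paper, so only the checking procedure carries over.

```python
import math


def finite_difference_gradient(func, point, step=1e-6):
    """Central finite-difference approximation of the gradient of func at point."""
    gradient = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        gradient.append((func(forward) - func(backward)) / (2.0 * step))
    return gradient


def toy_objective(phi):
    """Hypothetical stand-in objective: sum of sin^2(phi_i)."""
    return sum(math.sin(p) ** 2 for p in phi)


def toy_gradient(phi):
    """Analytical gradient via the chain rule: d/dp sin^2(p) = sin(2p)."""
    return [math.sin(2.0 * p) for p in phi]


if __name__ == "__main__":
    phi = [0.3, -1.2, 2.0]
    numeric = finite_difference_gradient(toy_objective, phi)
    analytic = toy_gradient(phi)
    print(max(abs(n - a) for n, a in zip(numeric, analytic)))
```

A large discrepancy between the two gradients usually points to a sign or indexing error in the analytical expression.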

Bibliography

[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
[2] M. Xiao, S. Mumtaz, Y. Huang, L. Dai, Y. Li, M. Matthaiou, G. K. Karagiannidis, E. Björnson, K. Yang, I. Chih-Lin, and A. Ghosh, “Millimeter wave communications for future mobile networks,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1909–1935, Sep. 2017.
[3] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[4] J. Hoydis, S. Ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160–171, Feb. 2013.
[5] N. Shlezinger and Y. C. Eldar, “On the spectral efficiency of noncooperative uplink massive MIMO systems,” IEEE Trans. Commun., early access, 2018.
[6] S. Jin, X. Liang, K. K. Wong, X. Gao, and Q. Zhu, “Ergodic rate analysis for multipair massive MIMO two-way relay networks,” IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1480–1491, Mar. 2015.
[7] Q. Zhang, S. Jin, K. K. Wong, H. Zhu, and M. Matthaiou, “Power scaling of uplink massive MIMO systems with arbitrary-rank channel means,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 966–981, May 2014.
[8] T. S. Rappaport, G. R. MacCartney, M. K. Samimi, and S. Sun, “Wideband millimeter-wave propagation measurements and channel models for future wireless communication system design,” IEEE Trans. Commun., vol. 63, no. 9, pp. 3029–3056, Sep. 2015.
