Algebraic Channel Estimation Algorithms for FDD Massive MIMO systems

Cheng Qian; Xiao Fu; and Nicholas D. Sidiropoulos

arXiv:1903.08938·eess.SP·October 23, 2019·IEEE J. Sel. Top. Signal Process.


TL;DR

This paper introduces an algebraic tensor-based channel estimation method for FDD massive MIMO systems that achieves accurate results with minimal training overhead, even when the number of paths exceeds the number of antennas.

Contribution

It proposes a novel algebraic framework using Vandermonde tensor algebra and a special training sequence to enable efficient, real-time channel estimation in challenging scenarios.

Findings

01

Effective channel estimation with small training overhead.

02

Handles more paths than antennas, surpassing traditional methods.

03

Lightweight algebraic operations enable real-time implementation.

Abstract

We consider downlink (DL) channel estimation for frequency division duplex based massive MIMO systems under the multipath model. Our goal is to provide fast and accurate channel estimation from a small amount of DL training overhead. Prior art tackles this problem using compressive sensing or classic array processing techniques (e.g., ESPRIT and MUSIC). However, these methods have challenges in some scenarios, e.g., when the number of paths is greater than the number of receive antennas. Tensor factorization methods can also be used to handle such challenging cases, but it is hard to solve the associated optimization problems. In this work, we propose an efficient channel estimation framework to circumvent such difficulties. Specifically, a structural training sequence that imposes a tensor structure on the received signal is proposed. We show that with such a training sequence, the…


Equations

$\mathbf{Y}$

$\mathbf{H}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})\mathbf{A}_{t}^{H}\in\mathbb{C}^{M_{r}\times M_{t}}$

$\mathbf{A}_{r}$

$\mathbf{A}_{t}$

$\boldsymbol{\beta}$

$\mathbf{a}_{r,k}=[1~e^{j\omega_{r,k}}~\cdots~e^{j(M_{r}-1)\omega_{r,k}}]^{T}$

$\mathbf{a}_{t,k}=\mathbf{a}_{y,k}\otimes\mathbf{a}_{x,k}$

$\hat{\mathbf{H}}=\mathbf{Y}\mathbf{S}^{H}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})(\mathbf{A}_{y}\odot\mathbf{A}_{x})^{H}+\mathbf{N}\mathbf{S}^{H}$

$\mathbf{S}\mathbf{S}^{H}\neq\mathbf{I}_{M_{t}},$

$\mathbf{H}_{\rm LS}=\mathbf{H}\mathbf{S}\mathbf{S}^{H}\neq\mathbf{H}$

$\boldsymbol{\mathcal{X}}:=\llbracket\mathbf{A},\mathbf{B},\mathbf{C}\rrbracket$

$\breve{\mathbf{U}}_{n}=\mathbf{U}_{n}\boldsymbol{\Pi}\mathbf{\Xi}_{n},\quad\forall n=1,\cdots,N$

$F\leq\min\big((I_{1}-1)J,~I_{2}K\big)$

$\{I_{1},I_{2}\}=\arg\max_{I_{1},I_{2}\in\mathbb{Z}^{+}}\ \cdots$

$\mathbf{S}=\mathbf{S}_{y}\otimes\mathbf{S}_{x}$

$\mathbf{Y}\triangleq\mathbf{B}_{r}(\mathbf{C}_{y}\odot\mathbf{C}_{x})^{H}$

$\boldsymbol{\mathcal{Y}}=\sum_{k=1}^{K}\mathbf{b}_{r,k}\circ\mathbf{c}_{x,k}^{*}\circ\mathbf{c}_{y,k}^{*}=\llbracket\mathbf{B}_{r},\mathbf{C}_{x},\mathbf{C}_{y}\rrbracket.$

$\mathbf{a}=[1~e^{j\omega}~\cdots~e^{j(M-1)\omega}]^{T}$

$\overline{\mathbf{s}}_{l}=[s_{1}~s_{2}~\cdots~s_{M+1-l}~\mathbf{0}_{l-1}]^{T},\quad\forall l\geq 1$

$\overline{\mathbf{s}}_{l}^{H}\mathbf{a}=\sum_{m=1}^{M+1-l}s_{m}^{*}e^{j(m-1)\omega}$

$\underline{\mathbf{s}}_{l}=[s_{M+1-l}^{*}~\cdots~s_{2}^{*}~s_{1}^{*}~\mathbf{0}_{l-1}]^{T}$

$\underline{\mathbf{s}}_{l}^{H}\mathbf{a}=\sum_{m=1}^{M+1-l}\big(s_{m}^{*}e^{j(m-1)\omega}\big)^{*}e^{j(M-l)\omega}=\big(\overline{\mathbf{s}}_{l}^{H}\mathbf{a}\big)^{*}a_{M+1-l}$

$a_{M+1-l}=\frac{\underline{\mathbf{s}}_{l}^{H}\mathbf{a}}{(\overline{\mathbf{s}}_{l}^{H}\mathbf{a})^{*}}.$

$\overline{\mathbf{S}}$

$\underline{\mathbf{S}}$

$\underline{\mathbf{S}}^{H}\mathbf{a}=(\overline{\mathbf{S}}^{H}\mathbf{a})^{*}\circledast\mathbf{v}\in\mathbb{C}^{L}$

$\hat{\mathbf{v}}=\big(\mathrm{diag}\big((\overline{\mathbf{S}}^{H}\mathbf{a})^{*}\big)\big)^{-1}(\underline{\mathbf{S}}^{H}\mathbf{a})$

$\hat{\omega}=\angle\big([\hat{\mathbf{v}}]_{1:L-1}^{H}[\hat{\mathbf{v}}]_{2:L}\big).$

$\mathbf{S}_{x}=[\overline{\mathbf{s}}_{x,L}~\cdots~\overline{\mathbf{s}}_{x,1}~\underline{\mathbf{s}}_{x,L}~\cdots~\underline{\mathbf{s}}_{x,1}]\in\mathbb{C}^{M_{x}\times N_{x}}$

Full text

Algebraic Channel Estimation Algorithms for FDD Massive MIMO systems

Cheng Qian, Member, IEEE, Xiao Fu, Member, IEEE, and Nikolaos D Sidiropoulos, Fellow, IEEE

A conference version of part of this work has been submitted to IEEE SPAWC 2019 [1]. This work was supported in part by the National Science Foundation under projects NSF ECCS 1808159 and NSF ECCS 1608961. C. Qian and N. D. Sidiropoulos are with the Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904 USA (e-mail: [email protected], [email protected]). X. Fu is with the School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331 ([email protected]).

Abstract

We consider downlink (DL) channel estimation for frequency division duplex based massive MIMO systems under the multipath model. Our goal is to provide fast and accurate channel estimation from a small amount of DL training overhead. Prior art tackles this problem using compressive sensing or classic array processing techniques (e.g., ESPRIT and MUSIC). However, these methods have challenges in some scenarios, e.g., when the number of paths is greater than the number of receive antennas. Tensor factorization methods can also be used to handle such challenging cases, but it is hard to solve the associated optimization problems. In this work, we propose an efficient channel estimation framework to circumvent such difficulties. Specifically, a structural training sequence that imposes a tensor structure on the received signal is proposed. We show that with such a training sequence, the parameters of DL MIMO channels can be provably identified even when the number of paths largely exceeds the number of receive antennas—under very small training overhead. Our approach is a judicious combination of Vandermonde tensor algebra and a carefully designed conjugate-invariant training sequence. Unlike existing tensor-based channel estimation methods that involve hard optimization problems, the proposed approach consists of very lightweight algebraic operations, and thus real-time implementation is within reach. Simulation results are carried out to showcase the effectiveness of the proposed methods.

Index Terms:

Channel estimation, massive MIMO, training sequence design, tensor factorization, low-complexity.

I Introduction

Massive MIMO promises significant performance gains in terms of spectral efficiency, reliability and security over the existing communication systems [2, 3]. However, realizing many of these advantages in practice hinges on accurate estimation of the channel state information (CSI), which affects the performance of transmit beamforming at the transmitters and decoding accuracy at the receiver.

Previously, much attention has been devoted to the time division duplex (TDD) protocol, where channel reciprocity can be invoked to estimate the downlink (DL) CSI from uplink (UL) training. However, this convenient property does not hold under the frequency division duplex (FDD) protocol, where UL and DL channels are in different frequency bands, with generally different propagation characteristics. Hence, the DL channel is different from the UL one, and it must be estimated by the receiver and then fed back to the transmitter. On the other hand, FDD offers uninterrupted full-duplex transmission, and relaxed amplification and synchronization requirements which are critical factors affecting service and deployment costs. Hence it is of great interest to come up with lightweight training and feedback strategies that require few resources.

To alleviate the heavy burden of DL training and UL feedback, one possible way is to reduce the number of effective channel parameters by considering a specular multipath channel comprising a few dominant paths, each characterized by direction-of-arrival (DOA), direction-of-departure (DOD) and channel gain [3, 4, 5, 6]. Such a channel model is effective under certain conditions, e.g., when the base station (BS) antenna array is mounted on top of a tall building or cellular tower, such that the number of local scatterers is limited. In addition, when the carrier frequency is lifted to the millimeter wave regime, due to the severe path loss, only a few specular reflections reach the other end of the link [3, 7, 8]. Thus, the channel tends to exhibit a sparse structure in the angular domain. This allows for channel modeling using only DOA, DOD and channel gain. Under this model, the channel estimation problem for uniform transmit/receive arrays is related to multidimensional harmonic retrieval problems in classical array processing, which have been well studied in the past few decades [9]. Array processing algorithms (such as maximum likelihood [10] and subspace based approaches [11, 12]) can be employed to estimate multipath parameters. These methods are a good fit for TDD systems but not for FDD systems. The reason is that array processing methods require a large array aperture for parameter estimation, where the array size should in general be greater than the number of paths. This is relatively easy to satisfy in UL channels, since the BS typically has many more antennas than the mobile station (MS), especially in massive MIMO scenarios. However, in FDD systems, the DL and UL channels have to be estimated separately. When the number of DL paths is larger than the number of receive antennas at the MS, conventional array processing methods will not work [13, 5].

The above problem might be tackled by using compressive sensing (CS) methods. With a limited number of paths, the channel exhibits a sparse pattern in the angle domain, and thus channel estimation can be recast as a sparse regression problem [8, 14, 15, 16, 17, 18, 19, 20]. Many CS based algorithms have been developed. The authors of [8] employed CS for channel estimation and proved that if both BS and MS are equipped with uniform linear arrays (ULAs), a MIMO channel with dimension $M_{r}\times M_{t}$ can be recovered from $\mathcal{O}(K\log(M_{r}M_{t}/K))$ training samples with high probability, where $K$ is the number of paths, and $M_{r}$ and $M_{t}$ are the number of antennas at MS and BS, respectively. Since then a series of CS based techniques were proposed to enhance the channel estimation performance such as [16, 15]. The CS-based approach is elegant and works to some extent, but it also faces some challenges. CS-based methods rely on a discretized angle dictionary for parameter estimation, which is usually a very ‘fat’ matrix with many coherent columns. This may lead to unsatisfactory performance for sparse recovery. Since DOAs/DODs are continuous variables in space, how to alleviate the performance loss caused by angle discretization is another crucial issue. One way is to use gradient descent [21] or Newton’s method [22, 19, 18] to refine the angles. But this involves additional optimization and increases the complexity.

There are also matrix completion (MC) techniques employed for multipath channel estimation, e.g., [23, 24]. Fang et al., [24] employed MC to solve the channel estimation problem of a millimeter wave system with a single radio frequency chain, where they assumed that the number of dominant paths is much smaller than the number of the transmit and receive antennas. This method requires multiple communications between the BS and MS to collect enough data over time to form a low-rank data matrix. Such a protocol implicitly assumes that MS and scatterers remain static. Furthermore, the overall training overhead is still high and solving the MC problem is a non-trivial task in terms of computational complexity. Note that in the current FDD systems, the MS never communicates with the BS for DL channel estimation; instead, the BS acts more like a radio station and only broadcasts training sequences and its basic service information.

Another way to reduce the computational burden on the mobile end is to exploit the so-called spatial reciprocity [18, 19, 25, 26]. In this line of work, it is assumed that the UL and DL channels share the same propagation paths, and thus UL channel estimation yields important information for the DL channel as well. In this way, the DL estimation burden is shifted to the base station, which is anyway responsible for estimating the UL channel(s). This approach requires that the UL and DL operate on close-by carrier frequencies over similar bandwidths. The challenge is that in many FDD systems, the DL channel can have a much wider bandwidth using multiple carriers in different bands, whereas the UL channel is usually on a single carrier/band. This causes a wide frequency separation (e.g., 1 GHz) between the DL and UL channels, which may then exhibit very different propagation characteristics (see the 5G UL and DL frequency allocations at https://www.everythingrf.com/community/5g-nr-new-radio-frequency-bands and https://en.wikipedia.org/wiki/5G_NR_frequency_bands).

In [5], an FDD massive MIMO system was considered with both BS and MS equipped with dual-polarized antennas. It has been shown that there is a hidden tensor structure in the received training data, and effective tensor factorization algorithms were proposed to estimate the multipath parameters. However, the techniques and parameter identifiability results therein are enabled by the special structure of dual-polarized multipath channels—how to generalize the technique to handle general multipath channels is unclear. In addition, the method in [5] is realized using computationally heavy optimization algorithms, which may not be realistic for mobile phones whose computational power is rather limited.

In this paper, we consider parameter estimation for general specular multipath channels. We aim to provide effective estimation schemes that entail very low DL training overhead and low complexity. Our detailed contributions are summarized as follows:

• Short Training Sequence Design. We propose a new training sequence with a conjugate symmetric structure for FDD massive MIMO systems. As we will see, our judicious design enables simple and effective channel estimation with very low training overhead.

• Low-Complexity Algorithm. We show that by using the proposed training sequence, the received data can be transformed to a low-rank tensor, and thus channel estimation can be recast as low-rank tensor decomposition. Two simple algebraic methods are then devised for channel estimation.

• Identifiability Analysis. We analyze the multipath parameter identification problem for massive MIMO. We show that under mild conditions, all parameters of the channel are identifiable using the proposed training sequence and algorithms.

A short conference version of this work has been submitted to the IEEE SPAWC 2019 workshop [1]. This journal version includes a more advanced and accurate estimation method, fleshed out analysis, and more comprehensive experiments.

The remainder of this paper is organized as follows. In Section II we describe the signal model and multipath channel estimation problem. The major contributions of the paper appear in Sections III and IV; the former explains the design of a novel training sequence for frugal DL training, and the latter presents a computationally efficient channel estimation algorithm which incorporates the designed training sequence. Identifiability results are also provided in Section IV. Section V presents an improved channel estimator which incorporates the method proposed in Section IV and a root-finding technique to achieve higher estimation accuracy. Section VI presents our simulation results, and Section VII summarizes our conclusions.

Notation: Throughout the paper, superscripts $(\cdot)^{T}$ , $(\cdot)^{*}$ , $(\cdot)^{H}$ , $(\cdot)^{-1}$ and $(\cdot)^{\dagger}$ represent transpose, complex conjugate, Hermitian transpose, matrix inverse and pseudo inverse, respectively. We use $|\cdot|$ , $\|\cdot\|_{F}$ , $\|\cdot\|_{2}$ and $\|\cdot\|_{1}$ for absolute value, Frobenius norm, $\ell_{2}$ -norm and $\ell_{1}$ -norm, respectively; $\hat{a}$ denotes an estimate of $a$ , $\text{diag}(\cdot)$ is a diagonal matrix holding the argument in its diagonal, $\text{vec}(\cdot)$ is the vectorization operator and $\angle(\cdot)$ takes the phase of its argument; $[\cdot]_{i}$ is the $i$ th element of a vector, $[\mathbf{S}]_{i,j}$ is the $(i,j)$ entry of $\mathbf{S}$ , and $\mathbf{s}_{r,k}$ is the $k$ th column of $\mathbf{S}_{r}$ . Symbols $\otimes,\odot,\circledast\text{ and }\circ$ denote the Kronecker, Khatri-Rao, element-wise, and outer products, respectively; $[\mathbf{S}]_{i:j,m:n}$ extracts the elements in rows $i$ to $j$ and columns $m$ to $n$ , $[\mathbf{S}]_{:,i:j}$ extracts the elements in the columns $i$ to $j$ and $[\mathbf{S}]_{i:j,:}$ extracts the elements in the rows $i$ to $j$ . $\mathbf{I}_{m}$ is the $m\times m$ identity matrix and $\boldsymbol{0}_{m\times n}$ is the $m\times n$ zero matrix.

II Signal Model and Problem Statement

II-A Channel Model

We consider the DL of an FDD massive MIMO system, where a BS with $M_{t}$ transmit antennas sends signals to an MS that is equipped with $M_{r}$ receive antennas. After collecting $N$ temporal samples, the received data matrix at the MS is

$$\mathbf{Y}=\mathbf{H}\mathbf{S}+\mathbf{N}$$

where $\mathbf{H}\in\mathbb{C}^{M_{r}\times M_{t}}$ is the DL channel matrix, $\mathbf{S}\in\mathbb{C}^{M_{t}\times N}$ is the training signal matrix, $\mathbf{N}\in\mathbb{C}^{M_{r}\times N}$ is i.i.d. circularly symmetric complex Gaussian noise with mean zero and covariance $\sigma^{2}\mathbf{I}_{M_{r}}$ . When the BS employs a transmit array with many antennas and the carrier frequency goes to 60 GHz, it is reasonable to assume that there are a few scatterers between the transmitter and receiver [3, 6]. Under this assumption, the channel $\mathbf{H}$ is modeled as

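$$\mathbf{H}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})\mathbf{A}_{t}^{H}\in\mathbb{C}^{M_{r}\times M_{t}}$$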

where

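$$\mathbf{A}_{r}=[\mathbf{a}_{r}(\theta_{r,1},\phi_{r,1})~\cdots~\mathbf{a}_{r}(\theta_{r,K},\phi_{r,K})],\quad\mathbf{A}_{t}=[\mathbf{a}_{t}(\theta_{t,1},\phi_{t,1})~\cdots~\mathbf{a}_{t}(\theta_{t,K},\phi_{t,K})],\quad\boldsymbol{\beta}=[\beta_{1}~\cdots~\beta_{K}]^{T}.$$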

In the above, $K$ is the number of paths, $\beta_{k}$ , $\mathbf{a}_{t}(\theta_{t,k},\phi_{t,k})$ and $\mathbf{a}_{r}(\theta_{r,k},\phi_{r,k})$ denote the gain, transmit and receive steering vectors of the $k$ th path, respectively, where $\{\theta_{r,k},\phi_{r,k}\}$ are the azimuth and elevation angles of DOA and $\{\theta_{t,k},\phi_{t,k}\}$ are the azimuth and elevation angles of DOD. We assume that the BS is equipped with an $M_{x}\times M_{y}$ element uniform rectangular array (URA), and the MS has a small uniform linear array (ULA) with $M_{r}$ antennas. In this case, the total number of transmit antennas is $M_{t}=M_{x}M_{y}$ . The $k$ th steering vector of the MS is

$$\mathbf{a}_{r,k}=[1~e^{j\omega_{r,k}}~\cdots~e^{j(M_{r}-1)\omega_{r,k}}]^{T}$$

and the steering vector at the BS is

$$\mathbf{a}_{t,k}=\mathbf{a}_{y,k}\otimes\mathbf{a}_{x,k}$$

where $\omega_{r,k}=2\pi d\sin(\theta_{r,k})/\nu$ , $[\mathbf{a}_{x,k}]_{l_{x}}=e^{j(l_{x}-1)\omega_{x,k}},~l_{x}=1,\cdots,M_{x}$ and $[\mathbf{a}_{y,k}]_{l_{y}}=e^{j(l_{y}-1)\omega_{y,k}},~l_{y}=1,\cdots,M_{y}$ with $\omega_{x,k}=2\pi d\sin(\phi_{t,k})\cos(\theta_{t,k})/\nu$ and $\omega_{y,k}=2\pi d\sin(\phi_{t,k})\sin(\theta_{t,k})/\nu$ . Here, $\nu$ is the wavelength, and $d$ is the spacing between two adjacent antennas, which is assumed to be no larger than half a wavelength.
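The channel model of this subsection can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names (`steering`, `khatri_rao`), the array sizes, and the randomly drawn spatial frequencies and gains are illustrative assumptions rather than values from the paper. It builds $\mathbf{H}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})\mathbf{A}_{t}^{H}$ with URA steering vectors formed as Kronecker products, and compares it with the equivalent sum of $K$ rank-one path contributions.

```python
import numpy as np

def steering(omega, m):
    """Vandermonde steering vector [1, e^{j w}, ..., e^{j (m-1) w}]^T."""
    return np.exp(1j * omega * np.arange(m))

def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices with equal column counts."""
    return np.stack([np.kron(a[:, k], b[:, k]) for k in range(a.shape[1])], axis=1)

rng = np.random.default_rng(0)
M_r, M_x, M_y, K = 4, 4, 4, 3            # illustrative sizes (assumed)
# Spatial frequencies are drawn directly instead of being derived from angles.
w_r, w_x, w_y = rng.uniform(-np.pi, np.pi, size=(3, K))
beta = rng.standard_normal(K) + 1j * rng.standard_normal(K)

A_r = np.stack([steering(w, M_r) for w in w_r], axis=1)   # M_r x K
A_x = np.stack([steering(w, M_x) for w in w_x], axis=1)   # M_x x K
A_y = np.stack([steering(w, M_y) for w in w_y], axis=1)   # M_y x K
A_t = khatri_rao(A_y, A_x)          # column k equals a_{y,k} kron a_{x,k}

H = A_r @ np.diag(beta) @ A_t.conj().T                    # M_r x (M_x * M_y)
H_sum = sum(beta[k] * np.outer(A_r[:, k], A_t[:, k].conj()) for k in range(K))
```

Both constructions give the same matrix, and its rank equals the number of paths in this example, which is the low-dimensional structure that the parameter-based feedback relies on.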

II-B Problem Statement and challenges

In an FDD system, the DL and UL channels are operated in different frequency bands, so the MS must estimate the DL channel first and then feed it back to the BS through a low-rate UL channel, where the number of feedback bits is limited. If the dimension of the channel is large, it is impractical to feed back the whole channel matrix. A more practical and economical way is to estimate and feed back the key parameters such as DOAs, DODs and path-losses that characterize the DL channel.

In practice, if the training sequences are orthogonal and both receive and transmit antennas are ULAs/URAs, the problem of estimating multipath parameters belongs to the class of multidimensional harmonic retrieval problems, which has been well studied during the past few decades [27, 28, 29, 22]. To be specific, when $\mathbf{S}\mathbf{S}^{H}=\mathbf{I}_{M_{t}}$ , one can first estimate $\mathbf{H}$ via

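$$\hat{\mathbf{H}}=\mathbf{Y}\mathbf{S}^{H}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})(\mathbf{A}_{y}\odot\mathbf{A}_{x})^{H}+\mathbf{N}\mathbf{S}^{H}$$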

which is a 3-D harmonic retrieval (HR) model [27, 28, 29, 30, 31, 32]. Then, the key parameters can be estimated from $\hat{\mathbf{H}}$ via various approaches such as [27, 28, 29, 30, 31, 32, 33], even when $K\gg M_{r}$ . However, the 3-D HR approach is computationally expensive, and using an orthogonal $\mathbf{S}$ means that $N\geq M_{t}$ has to be satisfied. When the number of transmit antennas is large, this inevitably leads to high training overhead, which is undesirable in massive MIMO systems, especially under mobility, where agile channel estimation is needed. When $N<M_{t}$ and the training signal $\mathbf{S}$ is non-orthogonal, i.e.,

$$\mathbf{S}\mathbf{S}^{H}\neq\mathbf{I}_{M_{t}},$$

the matched filtering output

$$\mathbf{H}_{\rm LS}=\mathbf{H}\mathbf{S}\mathbf{S}^{H}\neq\mathbf{H}$$

is no longer a good approximation of the original channel, even without any noise. Under such circumstances, estimating the channel parameters becomes very challenging, and only a few cases are known to be resolvable. One major challenge is identifiability. Based on the existing identifiability results for array processing [13, 34], given $\mathbf{Y}$ and an unstructured $\mathbf{S}$ , the number of paths that we can handle is about $(M_{r}-1)$ . In other words, once $K\geq M_{r}$ which is the case in practical scenarios, the channel parameters may not be identifiable. Even if $K<M_{r}$ , conventional array processing methods can only identify $\mathbf{A}_{t}^{H}\mathbf{S}$ instead of $\mathbf{A}_{t}$ , but how to efficiently estimate the DODs from this term is unclear. In [5], an iterative optimization algorithm was proposed to estimate the DODs from a similar term, but the complexity of the algorithm may be too high for a practical commercial smart phone. When $K\geq M_{r}$ , one may adopt CS based methods to estimate multipath parameters [8, 15, 16]. However, sparse methods also face serious challenges. Specifically, discretizing the angular space leads to sub-optimality and solving a large-scale sparse optimization with a semi-coherent dictionary is a challenge for practical implementation.
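The effect of a short training sequence on the matched-filter estimate can be illustrated numerically. The sketch below assumes NumPy, a random (unstructured) channel, and training matrices with orthonormal columns obtained from a QR factorization; the function name `matched_filter_error` and all sizes are hypothetical. It compares the noise-free output $\mathbf{H}\mathbf{S}\mathbf{S}^{H}$ with $\mathbf{H}$ for $N=M_{t}$ and for $N<M_{t}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M_r, M_t = 2, 16                      # illustrative sizes (assumed)
H = rng.standard_normal((M_r, M_t)) + 1j * rng.standard_normal((M_r, M_t))

def matched_filter_error(N):
    """Relative error of H S S^H against H for an M_t x N training matrix
    with orthonormal columns, in the absence of noise."""
    G = rng.standard_normal((M_t, M_t)) + 1j * rng.standard_normal((M_t, M_t))
    Q, _ = np.linalg.qr(G)            # Q is unitary
    S = Q[:, :N]                      # keep N training snapshots
    Y = H @ S                         # noise-free received data
    H_ls = Y @ S.conj().T             # matched-filter output
    return np.linalg.norm(H_ls - H) / np.linalg.norm(H)

err_full = matched_filter_error(M_t)        # N = M_t, so S S^H = I
err_short = matched_filter_error(M_t // 4)  # N < M_t, so S S^H is only a projector
```

With $N=M_{t}$ the error is at machine precision, whereas with $N=M_{t}/4$ most of the channel energy is lost even though no noise is present, which is why structure in $\mathbf{S}$ and in $\mathbf{H}$ has to be exploited instead.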

III Training Sequence Design

This work consists of two components for channel estimation: training sequence design and channel parameter estimation. Fig. 1 shows the block diagram of the proposed system. In this section, we discuss the first part, training sequence design, which is critical for the subsequent channel estimation. We propose to design a “tall” training matrix $\mathbf{S}$ with a certain structure that overcomes the difficulties mentioned above.

III-A Tensor Preliminaries

To make the paper self-contained, we briefly present the definition of tensor rank and some useful theorems on the uniqueness of tensor decomposition in the following.

Definition 1

(Canonical Polyadic Decomposition (CPD)). A tensor is a multidimensional array indexed by three or more indices. Specifically, a third-order tensor $\boldsymbol{\mathcal{X}}\in\mathbb{C}^{I\times J\times K}$ that has three latent factor matrices $\{\mathbf{A},\mathbf{B},\mathbf{C}\}$ can be written as

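$$\boldsymbol{\mathcal{X}}=\sum_{f=1}^{F}[\mathbf{A}]_{:,f}\circ[\mathbf{B}]_{:,f}\circ[\mathbf{C}]_{:,f}:=\llbracket\mathbf{A},\mathbf{B},\mathbf{C}\rrbracket$$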

where $\mathbf{A}\in\mathbb{C}^{I\times F}$ , $\mathbf{B}\in\mathbb{C}^{J\times F}$ , $\mathbf{C}\in\mathbb{C}^{K\times F}$ , and $[\mathbf{A}]_{:,f}\circ[\mathbf{B}]_{:,f}\circ[\mathbf{C}]_{:,f}$ is a rank-1 tensor. The minimal such $F$ is the rank of tensor $\boldsymbol{\mathcal{X}}$ or the CPD rank of $\boldsymbol{\mathcal{X}}$ [35].

Definition 2

(Unfolding). Tensor unfolding is obtained by taking the mode- $n$ slabs of the tensor (i.e., subtensors obtained by fixing the $n$ th index of the original tensor), vectorizing the slabs, and then stacking the resulting vectors from left to right into a matrix.

CPD factors $\boldsymbol{\mathcal{X}}$ into a sum of rank-one tensors. It is known that the CPD is unique under mild conditions, up to scaling and permutation of the $F$ components. This is referred to as “essential uniqueness” of the latent factors in the literature, and formally defined as follows.

Definition 3

(Uniqueness). Given an $N$ th-order tensor $\boldsymbol{\mathcal{X}}=\llbracket\mathbf{U}_{1},\cdots,\mathbf{U}_{N}\rrbracket$ of rank $F$ , its CPD is essentially unique if the rank-one terms in the decomposition are unique, i.e., there is no other way to decompose $\boldsymbol{\mathcal{X}}$ for the given number of rank-1 terms. If $\boldsymbol{\mathcal{X}}=\llbracket\breve{\mathbf{U}}_{1},\cdots,\breve{\mathbf{U}}_{N}\rrbracket$ , for some $\{\breve{\mathbf{U}}_{n}\}_{n=1}^{N}$ , then there exists a permutation matrix $\boldsymbol{\Pi}$ and diagonal matrices $\{\mathbf{\Xi}_{n}\}_{n=1}^{N}$ such that

$$\breve{\mathbf{U}}_{n}=\mathbf{U}_{n}\boldsymbol{\Pi}\mathbf{\Xi}_{n},\quad\forall n=1,\cdots,N$$

where $\prod_{n=1}^{N}\mathbf{\Xi}_{n}=\mathbf{I}_{F}$ .
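The permutation and scaling ambiguity in Definition 3 can be made concrete with a small numerical check. The sketch below assumes NumPy and uses real-valued random factors for simplicity; the helper `cpd`, the dimensions, and the particular permutation and scaling matrices are illustrative assumptions. It forms a rank-3 tensor and shows that permuting the columns of all factors consistently, and rescaling them with diagonal matrices whose product is the identity, leaves the tensor unchanged.

```python
import numpy as np

def cpd(A, B, C):
    """Third-order tensor sum_f A[:, f] o B[:, f] o C[:, f]."""
    return np.einsum('if,jf,kf->ijk', A, B, C)

rng = np.random.default_rng(5)
I, J, K, F = 4, 5, 6, 3               # illustrative dimensions (assumed)
A, B, C = (rng.standard_normal((n, F)) for n in (I, J, K))

perm = np.eye(F)[:, [2, 0, 1]]        # permutation matrix Pi
Xi1 = np.diag([2.0, -0.5, 4.0])
Xi2 = np.diag([0.25, 3.0, -1.0])
Xi3 = np.linalg.inv(Xi1 @ Xi2)        # enforces Xi1 Xi2 Xi3 = I_F

X = cpd(A, B, C)
X_alt = cpd(A @ perm @ Xi1, B @ perm @ Xi2, C @ perm @ Xi3)
```

Here `X` and `X_alt` coincide, so the factors can only ever be recovered up to this column reordering and counter-balanced scaling.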

In some special cases, the factor matrices have special structure, e.g., they are Vandermonde. With this prior information, one can show stronger identifiability results. For example,

Theorem 1

[31] Consider a third-order tensor $\boldsymbol{\mathcal{X}}=\llbracket\mathbf{A},\mathbf{B},\mathbf{C}\rrbracket$ , where $\mathbf{A}\in\mathbb{C}^{I\times F}$ , $\mathbf{B}\in\mathbb{C}^{J\times F}$ , $\mathbf{C}\in\mathbb{C}^{K\times F}$ , and $\mathbf{A}$ is Vandermonde with distinct nonzero generators. Assume that $\mathbf{B}$ and $\mathbf{C}$ are drawn from an absolutely continuous distribution. If

$$F\leq\min\big((I_{1}-1)J,~I_{2}K\big)$$

where $I_{1}\geq I_{2}$ and $I_{2}=I+1-I_{1}$ are chosen from

[TABLE]

then $\mathbf{A}$ , $\mathbf{B}$ and $\mathbf{C}$ are essentially unique with probability one.
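To get a feel for the rank bound in Theorem 1, the sketch below evaluates $\min\big((I_{1}-1)J,~I_{2}K\big)$ over all admissible splits with $I_{2}=I+1-I_{1}$ and $I_{1}\geq I_{2}$. The selection rule for $\{I_{1},I_{2}\}$ is not reproduced in this text, so the code assumes, as a labeled assumption, that the split is chosen to make the bound as large as possible; the function name `vandermonde_rank_bound` is hypothetical.

```python
def vandermonde_rank_bound(I, J, K):
    """Largest value of min((I1 - 1) * J, I2 * K) over integer splits
    I1 + I2 = I + 1 with I1 >= I2 >= 1.

    Assumption: the split is picked to maximize the bound on F.
    Returns (bound, I1, I2).
    """
    best = None
    for I1 in range(1, I + 1):
        I2 = I + 1 - I1
        if I1 < I2:
            continue
        bound = min((I1 - 1) * J, I2 * K)
        if best is None or bound > best[0]:
            best = (bound, I1, I2)
    return best

# Vandermonde mode of size 4, other two modes of size 4 each.
print(vandermonde_rank_bound(4, 4, 4))   # (8, 3, 2)
# Vandermonde mode of size 2, other two modes of size 4 each.
print(vandermonde_rank_bound(2, 4, 4))   # (4, 2, 1)
```

Under this assumption, a Vandermonde mode of size 2 already admits a rank of 4, which illustrates how the number of identifiable components can exceed the size of the Vandermonde mode.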

III-B Conjugate Flipped Structure

Our idea is to use a Vandermonde structure-enabled algebraic tensor factorization algorithm to recover an $M_{r}\times M_{t}$ channel matrix from the received signal matrix $\mathbf{Y}$ with dimension $M_{r}\times N$ , where $N<M_{t}$ . Then we use a simple algorithm to recover the DOAs and DODs. Both steps are enabled via a judiciously designed training sequence. Let us first show how to transform $\mathbf{Y}$ to a tensor. It follows from $(\mathbf{A}\otimes\mathbf{B})^{H}(\mathbf{C}\odot\mathbf{D})=(\mathbf{A}^{H}\mathbf{C})\odot(\mathbf{B}^{H}\mathbf{D})$ that by defining

$$\mathbf{S}=\mathbf{S}_{y}\otimes\mathbf{S}_{x}$$

$\mathbf{Y}$ can be written as

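$$\mathbf{Y}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})(\mathbf{A}_{y}\odot\mathbf{A}_{x})^{H}(\mathbf{S}_{y}\otimes\mathbf{S}_{x})\triangleq\mathbf{B}_{r}(\mathbf{C}_{y}\odot\mathbf{C}_{x})^{H}$$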

where $\mathbf{B}_{r}=\mathbf{A}_{r}\mathrm{diag}\left(\boldsymbol{\beta}\right)$ with its $k$ th column being $\mathbf{b}_{r,k}=\beta_{k}\mathbf{a}_{r,k}$ , $\mathbf{S}_{x}\in\mathbb{C}^{M_{x}\times N_{x}}$ , $\mathbf{S}_{y}\in\mathbb{C}^{M_{y}\times N_{y}}$ , $\mathbf{C}_{x}=\mathbf{S}_{x}^{H}\mathbf{A}_{x}\in\mathbb{C}^{N_{x}\times K}$ and $\mathbf{C}_{y}=\mathbf{S}_{y}^{H}\mathbf{A}_{y}\in\mathbb{C}^{N_{y}\times K}$ with $N=N_{x}N_{y}$ . Note that the scalar $\beta_{k}$ does not hurt the Vandermonde structure in $\mathbf{a}_{r,k}$ , so $\mathbf{B}_{r}$ is Vandermonde. According to Definitions 1 and 2, $\mathbf{Y}$ in the expression above is the matrix form of a third-order tensor with rank $K$ defined as

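$$\boldsymbol{\mathcal{Y}}=\sum_{k=1}^{K}\mathbf{b}_{r,k}\circ\mathbf{c}_{x,k}^{*}\circ\mathbf{c}_{y,k}^{*}=\llbracket\mathbf{B}_{r},\mathbf{C}_{x},\mathbf{C}_{y}\rrbracket.$$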

Before we continue, it is necessary to note that the essential uniqueness of tensor factorization makes the latent factors of a tensor identifiable under mild conditions [35]. In our case, the latent factors are $\{\mathbf{B}_{r},\mathbf{C}_{x},\mathbf{C}_{y}\}$ and they are identifiable from $\boldsymbol{\mathcal{Y}}$ up to column permutation and scaling ambiguity under some conditions. As we will see in Section IV, with the Vandermonde $\mathbf{B}_{r}$ , these factor matrices can be efficiently identified by computing singular-value decomposition (SVD) of a small dimensional matrix, and hence avoiding the complicated optimization procedure as conventional tensor decomposition approaches do. However, our target is not the factor matrices but the angles and path-losses contained therein. The estimation of DOAs is relatively simple because $\mathbf{B}_{r}$ is Vandermonde and we can estimate DOAs from the columns of $\mathbf{B}_{r}$ . The difficulty here is the estimation of DODs and path-losses, where the former are contained in $\mathbf{C}_{x}$ and $\mathbf{C}_{y}$ while the latter are not even identifiable from standard tensor factorization approaches. In the following, we will show that by designing a specially structured training matrix $\mathbf{S}$ , all the multipath parameters are identifiable from a simple algebraic method with identifiability guarantees.
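The passage from the Kronecker-structured training matrix to a third-order tensor can be verified numerically. The sketch below assumes NumPy, ignores noise, and uses random complex matrices in place of the actual steering and training matrices; the helper names `khatri_rao` and `crandn` and all dimensions are illustrative assumptions. It checks that $\mathbf{H}(\mathbf{S}_{y}\otimes\mathbf{S}_{x})$ equals $\mathbf{B}_{r}(\mathbf{C}_{y}\odot\mathbf{C}_{x})^{H}$ and that this matrix is an unfolding of $\sum_{k}\mathbf{b}_{r,k}\circ\mathbf{c}_{x,k}^{*}\circ\mathbf{c}_{y,k}^{*}$.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices with equal column counts."""
    return np.stack([np.kron(a[:, k], b[:, k]) for k in range(a.shape[1])], axis=1)

rng = np.random.default_rng(2)

def crandn(*shape):
    """Random complex Gaussian array (stand-in for structured matrices)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M_r, M_x, M_y, N_x, N_y, K = 2, 6, 5, 4, 3, 7   # illustrative sizes (assumed)
B_r = crandn(M_r, K)                  # plays the role of A_r diag(beta)
A_x, A_y = crandn(M_x, K), crandn(M_y, K)
S_x, S_y = crandn(M_x, N_x), crandn(M_y, N_y)

S = np.kron(S_y, S_x)                             # (M_x M_y) x (N_x N_y)
H = B_r @ khatri_rao(A_y, A_x).conj().T
Y = H @ S                                         # noise-free received data

C_x = S_x.conj().T @ A_x                          # N_x x K
C_y = S_y.conj().T @ A_y                          # N_y x K
Y_fact = B_r @ khatri_rao(C_y, C_x).conj().T

# Tensor with entries sum_k B_r[i,k] conj(C_x[j,k]) conj(C_y[l,k]).
T = np.einsum('ik,jk,lk->ijl', B_r, C_x.conj(), C_y.conj())
Y_unfold = T.transpose(0, 2, 1).reshape(M_r, N_y * N_x)
```

All three matrices agree, so the received data inherit a rank-$K$ tensor structure even though only $N=N_{x}N_{y}$ training snapshots are used; in this example $K=7$ exceeds $M_{r}=2$.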

Assume that we have already identified $\mathbf{C}_{x}$ and $\mathbf{C}_{y}$ . The remaining task is to identify azimuth and elevation angles from $\mathbf{C}_{x}$ and $\mathbf{C}_{y}$ . By definition, $\mathbf{C}_{x}=\mathbf{S}_{x}^{H}\mathbf{A}_{x}$ and $\mathbf{C}_{y}=\mathbf{S}_{y}^{H}\mathbf{A}_{y}$ , so the designs of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ are the same. To simplify the analysis, let us temporarily drop the subscripts $x$ and $y$ , and consider the design of $\mathbf{S}$ for an $M$ -element ULA with steering vector

$$\mathbf{a}=[1~e^{j\omega}~\cdots~e^{j(M-1)\omega}]^{T}$$

with $\omega\in\left[\,-\pi,\pi\,\right]$ . Assume that there is a training signal defined as

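$$\overline{\mathbf{s}}_{l}=[s_{1}~s_{2}~\cdots~s_{M+1-l}~\mathbf{0}_{l-1}]^{T},\quad\forall l\geq 1\tag{7}$$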

where $s_{i}$ is the symbol transmitted by the $i$ th antenna element. The inner product between $\mathbf{a}$ and $\overline{\mathbf{s}}_{l}$ is

$$\overline{\mathbf{s}}_{l}^{H}\mathbf{a}=\sum_{m=1}^{M+1-l}s_{m}^{*}e^{j(m-1)\omega}$$

Taking the conjugate of $\overline{\mathbf{s}}_{l}$ and then flipping its nonzero elements yields

$$\underline{\mathbf{s}}_{l}=[s_{M+1-l}^{*}~\cdots~s_{2}^{*}~s_{1}^{*}~\mathbf{0}_{l-1}]^{T}\tag{8}$$

which leads to

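$$\underline{\mathbf{s}}_{l}^{H}\mathbf{a}=\sum_{m=1}^{M+1-l}\big(s_{m}^{*}e^{j(m-1)\omega}\big)^{*}e^{j(M-l)\omega}=\big(\overline{\mathbf{s}}_{l}^{H}\mathbf{a}\big)^{*}a_{M+1-l}$$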

where $a_{M+1-l}$ is the $(M+1-l)$ th element of $\mathbf{a}$ . The above equation exhibits a “conjugate” rotational invariance (CRI) between $\overline{\mathbf{s}}_{l}^{H}\mathbf{a}$ and $\underline{\mathbf{s}}_{l}^{H}\mathbf{a}$ , which is slightly different from the standard rotational invariance (as in, e.g., ESPRIT [12]). The latter is usually built upon the forward and backward subarrays that are rotationally invariant by a factor of $e^{j\omega}$ , while CRI does not rely on subarrays and its invariant factor is the $(M+1-l)$ th element of the steering vector, i.e., $e^{j(M-l)\omega}$ . Note that the idea of constructing CRI appeared in MIMO radar beampattern design [36], where a type of nonzero conjugately flipped waveform has been studied.
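The conjugate rotational invariance can be confirmed with a few lines of code. The sketch below assumes NumPy and random complex symbols; the values of $M$, $l$ and $\omega$ are arbitrary illustrative choices. It builds one pair $\{\overline{\mathbf{s}}_{l},\underline{\mathbf{s}}_{l}\}$ and compares $\underline{\mathbf{s}}_{l}^{H}\mathbf{a}$ with $(\overline{\mathbf{s}}_{l}^{H}\mathbf{a})^{*}a_{M+1-l}$.

```python
import numpy as np

rng = np.random.default_rng(3)
M, l, omega = 8, 3, 0.7               # illustrative values (assumed)
a = np.exp(1j * omega * np.arange(M))                    # steering vector
s = rng.standard_normal(M) + 1j * rng.standard_normal(M) # symbols s_1, ..., s_M

n = M + 1 - l                         # number of nonzero entries in the pair
s_bar = np.concatenate([s[:n], np.zeros(l - 1)])
# Conjugate the nonzero part of s_bar, then flip it.
s_und = np.concatenate([s[:n][::-1].conj(), np.zeros(l - 1)])

lhs = np.vdot(s_und, a)                       # s_und^H a (vdot conjugates its first argument)
rhs = np.conj(np.vdot(s_bar, a)) * a[M - l]   # (s_bar^H a)^* a_{M+1-l}; a[M - l] is 0-based
ratio = lhs / np.conj(np.vdot(s_bar, a))      # isolates a_{M+1-l}
```

The two sides match regardless of the symbol values, and `ratio` equals $e^{j(M-l)\omega}$, which reflects that the invariance depends only on the conjugate-flipped layout of the pair.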

Let us rewrite the above relation as

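$$a_{M+1-l}=\frac{\underline{\mathbf{s}}_{l}^{H}\mathbf{a}}{(\overline{\mathbf{s}}_{l}^{H}\mathbf{a})^{*}}.\tag{10}$$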

We see that it provides a way of estimating the phase contained in $a_{M+1-l}$ , i.e., $(M-l)\omega$ , which contains the target $\omega$ . Nevertheless, for large $M$ and small $l$ , $(M-l)\omega$ can be greater than $2\pi$ , causing the so-called phase wrapping problem. Thus, we cannot find the exact $\omega$ from (10). This is also the problem of [36]. To solve the phase wrapping problem, we need at least two adjacent elements of $\mathbf{a}$ since $\mathbf{a}$ is Vandermonde. In (10), we have shown that a pair of $\overline{\mathbf{s}}_{l}$ and $\underline{\mathbf{s}}_{l}$ extracts the $(M+1-l)$ th element of $\mathbf{a}$ . Provided that there exists another pair denoted by $\{\overline{\mathbf{s}}_{l+1},\underline{\mathbf{s}}_{l+1}\}$ that extracts the $(M-l)$ th element of $\mathbf{a}$ , we may obtain a Vandermonde vector which consists of $a_{M+1-l}$ and $a_{M-l}$ , such that the phase $\omega$ can be estimated exactly through $\angle(a_{M+1-l}/a_{M-l})$ . Based on this observation, we vary $l$ from 1 to $L$ and collect all $\{\overline{\mathbf{s}}_{l}\}$ and $\{\underline{\mathbf{s}}_{l}\}$ in, respectively,

[TABLE]

Then the following equality holds

[TABLE]

where $\mathbf{v}=\left[\,\mathbf{a}\,\right]_{M+1-L:M}=\left[\,a_{M+1-L}~{}\cdots~{}a_{M}\,\right]^{T}$ contains the last $L$ elements of $\mathbf{a}$ , and hence it is Vandermonde.

The estimation of $\omega$ is now much easier. In practice, once $\underline{\mathbf{S}}^{H}\mathbf{a}$ and $\overline{\mathbf{S}}^{H}\mathbf{a}$ are identified (we will see later on how to identify them), we first estimate

[TABLE]

and then calculate the phase $\omega$ through

[TABLE]
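To illustrate why a single extracted element is not enough while the Vandermonde vector $\mathbf{v}$ is, the sketch below compares the wrapped phase of $a_{M}$ with a phase estimate built from adjacent elements of $\mathbf{v}$. The estimator used here (the angle of the summed adjacent-element products) is one simple choice consistent with the $\angle(a_{M+1-l}/a_{M-l})$ idea; it is an assumption and not necessarily the exact formula of the paper. All numerical values are made up.

```python
import cmath

M, L, omega = 64, 4, 0.9
a = [cmath.exp(1j * m * omega) for m in range(M)]

# Phase of the single element a_M: the true value (M-1)*omega = 56.7 rad
# exceeds 2*pi, so the measured angle is wrapped into (-pi, pi].
true_phase = (M - 1) * omega
wrapped = cmath.phase(a[M - 1])

# v holds the last L elements of a, i.e. [a_{M+1-L}, ..., a_M].
v = a[M - L:]

# Adjacent elements of a Vandermonde vector differ by the factor e^{j*omega},
# so the angle of sum_i conj(v_i) * v_{i+1} gives omega without wrapping.
corr = sum(v[i].conjugate() * v[i + 1] for i in range(L - 1))
omega_hat = cmath.phase(corr)

print(wrapped, true_phase, omega_hat)
```

Summing the $L-1$ adjacent products before taking the angle averages over all available pairs, which is a common way to reduce noise sensitivity when $L>2$.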

III-C Design of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$

Now let us return to the design of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$. We first discuss the design of $\mathbf{S}_{x}$. To implement the above idea, the training signal must contain both $\overline{\mathbf{S}}$ and $\underline{\mathbf{S}}$. One possible choice for $\mathbf{S}_{x}$ is

[TABLE]

where $N_{x}=2L$ , $\overline{\mathbf{s}}_{x,l}$ and $\underline{\mathbf{s}}_{x,l}$ are defined in (7) and (8), respectively; $\overline{\mathbf{S}}_{x}$ and $\underline{\mathbf{S}}_{x}$ have the same definitions as (11a) and (11b), respectively, and both of them have $L$ columns.

In Section III-B, we showed that the estimation of $\omega$ depends only on the spatial structure of $\overline{\mathbf{S}}$ and $\underline{\mathbf{S}}$, not on their values. This provides more freedom in choosing the values of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$. However, random $\overline{\mathbf{S}}$ and $\underline{\mathbf{S}}$ only guarantee the recovery of the phase but not of the path loss. In order to estimate all the key parameters efficiently, we need one more constraint on the values of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$. We enforce the last elements in $\overline{\mathbf{s}}_{x,1}$ and $\overline{\mathbf{s}}_{y,1}$ to satisfy

[TABLE]

Due to the conjugate symmetric property between $\overline{\mathbf{S}}$ and $\underline{\mathbf{S}}$ , we also have $[\underline{\mathbf{s}}_{x,1}]_{1}[\underline{\mathbf{s}}_{y,1}]_{1}=1$ .

It is instructive to showcase the structure of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ by examples. Let us consider a setting where $M_{x}=M_{y}=5$ and $N_{x}=N_{y}=4$ . In this case, $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ are

[TABLE]

where the first two columns of $\mathbf{S}_{x}$ form $\overline{\mathbf{S}}_{x}$ and the last two columns form $\underline{\mathbf{S}}_{x}$; the same holds for $\mathbf{S}_{y}$. When $M_{x}\neq M_{y}$, for example $M_{x}=5,M_{y}=4$, we may choose

[TABLE]

Remark 1

It is seen from the above examples that to construct $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ , we only need to generate $\max(M_{x},M_{y})$ different symbols $\{s_{i}\}$ . The minimum $L$ that guarantees the recovery of $\omega$ is 2, meaning that the minimum number of columns in $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ is $N_{x}=N_{y}=2L=4$ . This also indicates that the minimum number of training samples for the above design is $N=N_{x}N_{y}=16$ .
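The counting in Remark 1 can be made concrete with a small end-to-end sketch for $M_{x}=5$ and $L=2$. It assumes a zero-padded column layout for $\overline{\mathbf{s}}_{l}$ and $\underline{\mathbf{s}}_{l}$ (first $M+1-l$ entries nonzero, $\underline{\mathbf{s}}_{l}$ being the conjugated and reversed nonzero block), which is a reconstruction consistent with Section III-B rather than a copy of the paper's example matrices. The function `build_training` and the symbol values are hypothetical, and the normalization constraint on the last elements is not enforced here.

```python
import cmath

def build_training(symbols, M, L):
    """Return the L columns of S-bar and of S-underline (assumed layout)."""
    upper, lower = [], []
    for l in range(1, L + 1):
        n = M + 1 - l  # number of nonzero entries in column l
        upper.append(list(symbols[:n]) + [0j] * (l - 1))
        lower.append([s.conjugate() for s in reversed(symbols[:n])] + [0j] * (l - 1))
    return upper, lower

def herm_inner(s, a):
    """Inner product s^H a."""
    return sum(si.conjugate() * ai for si, ai in zip(s, a))

M, L, omega = 5, 2, -1.3
# Only M distinct symbols are needed for one training matrix (made-up values).
symbols = [cmath.exp(1j * 0.4 * (i + 1) ** 2) for i in range(M)]

upper, lower = build_training(symbols, M, L)
columns = upper + lower          # S = [S-bar, S-underline]
N_x = len(columns)               # N_x = 2L
N = N_x * N_x                    # N = N_x * N_y with N_y = N_x

a = [cmath.exp(1j * m * omega) for m in range(M)]
c = [herm_inner(col, a) for col in columns]      # c = S^H a
c_upper, c_lower = c[:L], c[L:]

# Element-wise division undoes the conjugate flip: v[l-1] = a_{M+1-l}.
v = [lo / up.conjugate() for up, lo in zip(c_upper, c_lower)]

# v[0] = a_M and v[1] = a_{M-1} are adjacent, so their ratio is e^{j*omega}.
omega_hat = cmath.phase(v[0] * v[1].conjugate())
print(N_x, N, omega_hat)
```

With $L=1$ only $a_{M}$ would be available and the phase would be subject to wrapping, which is why two columns per block is the smallest workable choice.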

IV Channel Estimation

In this section, we derive a computationally efficient channel estimator. We first explain how to efficiently estimate the factor matrices $\{\mathbf{B}_{r},\mathbf{C}_{x},\mathbf{C}_{y}\}$ in (6). Then we derive closed-form solutions for multipath parameter estimation. Finally, we state the uniqueness condition for the identification of these parameters.

IV-A Identification of Factor Matrices

According to Definition 2, the matrix unfolding of $\boldsymbol{\mathcal{Y}}$ along its third dimension takes the form of

[TABLE]

Since $\mathbf{B}_{r}$ is Vandermonde, the spatial smoothing technique is applicable to further expand the dimension of $\mathbf{Y}_{(3)}$ . Specifically, defining a cyclic selection matrix $\mathbf{J}_{i_{2}}=\mathbf{I}_{M_{x}}\otimes[\boldsymbol{0}_{P_{r}\times i_{2}}~{}\mathbf{I}_{P_{r}}~{}\boldsymbol{0}_{P_{r}\times(M_{r}-i_{2}-P_{r})}]$ and varying $i_{2}$ from 0 to $(Q_{r}-1)$ , we have

[TABLE]

where $P_{r}+Q_{r}=M_{r}+1$ .
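The effect of spatial smoothing on a single Vandermonde column can be sketched as follows. Stacking the $Q_{r}$ shifted length-$P_{r}$ windows of a Vandermonde vector produces a rank-one matrix that factors into two shorter Vandermonde vectors with the same generator, which is what allows the smoothed unfolding to gain an extra dimension. The code is a minimal illustration of this property only (it does not build the full selection matrix $\mathbf{J}_{i_{2}}$), and the function names are hypothetical.

```python
import cmath

def vandermonde(n, omega):
    """Length-n Vandermonde vector with generator e^{j*omega}."""
    return [cmath.exp(1j * m * omega) for m in range(n)]

def smooth(b, P):
    """Stack the Q = len(b) + 1 - P shifted length-P windows of b as columns."""
    Q = len(b) + 1 - P
    # Entry (p, q) is b[p + q]: row p of window number q.
    return [[b[p + q] for q in range(Q)] for p in range(P)]

M_r, P_r, omega = 4, 3, 0.6
Q_r = M_r + 1 - P_r              # so that P_r + Q_r = M_r + 1

b = vandermonde(M_r, omega)
Z = smooth(b, P_r)               # P_r x Q_r smoothed matrix

b_P = vandermonde(P_r, omega)
b_Q = vandermonde(Q_r, omega)

# The smoothed matrix equals the outer product b_P * b_Q^T.
max_gap = max(abs(Z[p][q] - b_P[p] * b_Q[q])
              for p in range(P_r) for q in range(Q_r))
print(Q_r, max_gap)
```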

Since $\mathbf{B}_{1},\mathbf{B}_{2}$ are Vandermonde, given $\{P_{r},Q_{r}\}$ , we can follow [31, 5, 37] and employ an ESPRIT-like approach shown in Algorithm 1 to estimate $\mathbf{B}_{r}$ , $\mathbf{C}_{x}$ and $\mathbf{C}_{y}$ .

IV-B DOA/DOD Estimation

The factor matrices identified from Algorithm 1 suffer column permutation and scaling ambiguity, implying that the estimates of $\{\mathbf{B}_{r},\mathbf{C}_{x},\mathbf{C}_{y}\}$ are not exactly the original factors. Fortunately, this will not be an issue for angle estimation. Since the columns in $\{\mathbf{B}_{r},\mathbf{C}_{x},\mathbf{C}_{y}\}$ are paired with each other and the scaling ambiguity does not affect the array manifold structure, we can estimate the $k$ th DOA and DOD from the $k$ th columns of $\mathbf{B}_{r}$ , $\mathbf{C}_{x}$ and $\mathbf{C}_{y}$ , respectively.
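A short numerical check of this claim: scaling a Vandermonde column by an arbitrary complex number and permuting the columns changes neither the recovered phases nor their pairing within a column. The phase estimator below (angle of the summed adjacent-element products) is an illustrative choice and is not claimed to be the paper's formula; the scalings and permutation are made up.

```python
import cmath

def phase_from_column(col):
    """Estimate the generator phase of a (scaled) Vandermonde column."""
    corr = sum(col[m].conjugate() * col[m + 1] for m in range(len(col) - 1))
    return cmath.phase(corr)

M_r = 4
omegas = [0.5, -1.2, 2.0]
# True factor matrix, stored as a list of columns.
B = [[cmath.exp(1j * m * w) for m in range(M_r)] for w in omegas]

# Hypothetical ambiguities: one complex scale per column plus a permutation.
scales = [2.0 - 1.0j, 0.3j, -1.5 + 0.1j]
perm = [2, 0, 1]
B_hat = [[scales[k] * x for x in B[k]] for k in perm]

est = [phase_from_column(col) for col in B_hat]
expected = [omegas[k] for k in perm]
print(est, expected)
```

The scale $\xi$ enters each product as $|\xi|^{2}$, which is real and positive, so it has no effect on the angle.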

Due to the one-to-one mapping between $\theta_{r,k}$ and $\omega_{r,k}$, estimating the DOA is equivalent to estimating the phase $\omega_{r,k}$, which is calculated as

[TABLE]

Estimating the DOAs themselves is optional; if needed, they are obtained from

[TABLE]

The estimation of azimuth and elevation angles of DOD is different from DOA estimation due to the presence of $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ . To estimate them, it is necessary to know the phases contained in $\mathbf{A}_{x}$ and $\mathbf{A}_{y}$ . Toward this end, let us consider the estimation of the phase $\omega_{x,k}$ . Let $\hat{\overline{\mathbf{C}}}_{x}$ and $\hat{\underline{\mathbf{C}}}_{x}$ be submatrices of $\hat{\mathbf{C}}_{x}$ which contain the first and last half of the rows of $\hat{\mathbf{C}}_{x}$ , respectively. According to the definition of $\mathbf{S}_{x}$ in (III-C), in the noiseless case, the explicit expressions for $\overline{\mathbf{C}}_{x}$ and $\underline{\mathbf{C}}_{x}$ are

[TABLE]

It follows from (13) that the top $N_{x}/2$ rows of the $k$th column in $\mathbf{A}_{x}$ equal

[TABLE]

The corresponding phase is then calculated via

[TABLE]

Similarly, by splitting $\hat{\mathbf{C}}_{y}$ into $\overline{\mathbf{C}}_{y}$ and $\underline{\mathbf{C}}_{y}$ , we have

[TABLE]

where $\hat{\mathbf{v}}_{y,k}=\mathrm{diag}\left(\hat{\overline{\mathbf{c}}}_{y,k}^{*}\right)^{-1}\hat{\underline{\mathbf{c}}}_{y,k}$ . It is optional to estimate the azimuth and elevation angles of DOD. If interested, we calculate them through

[TABLE]

IV-C Path-loss Estimation

The path-loss $\boldsymbol{\beta}$ is merged into the factor matrices. Unlike DOA/DOD estimation which is insensitive to the column scaling ambiguity, the estimation of $\boldsymbol{\beta}$ is seriously affected by such an ambiguity.

To estimate $\boldsymbol{\beta}$, we first derive an explicit expression for it. Since the column permutation in $\{\hat{\mathbf{B}}_{r},\hat{\mathbf{C}}_{x},\hat{\mathbf{C}}_{y}\}$ is not an issue, we ignore it and consider only the scaling ambiguity to simplify the analysis. The reconstruction of $\mathbf{Y}=\mathbf{H}\mathbf{S}$ from the estimated factor matrices is given by

[TABLE]

Since $\mathbf{H}\mathbf{S}=\mathbf{A}_{r}\mathrm{diag}(\boldsymbol{\beta})\left(\mathbf{A}_{y}\odot\mathbf{A}_{x}\right)^{H}\mathbf{S}$ , by comparing it with (IV-C), we find

[TABLE]

where $\mathbf{\Xi}_{r}=\mathrm{diag}([\xi_{r,1},\cdots,\xi_{r,K}])$ is the scaling ambiguity matrix corresponding to $\mathbf{A}_{r}$ with its $i$ th diagonal entry being

[TABLE]

In the above, the scaling ambiguity corresponding to $\mathbf{A}_{r}$ is estimated as

[TABLE]

where $\hat{\mathbf{a}}_{r,k}$ is constructed using $\hat{\omega}_{r,k}$. The only unknown in (30) is $\hat{\xi}_{y,k}^{*}\hat{\xi}_{x,k}^{*}$. We propose an efficient forward-backward averaging method to calculate it, which only involves element-wise multiplication and division.

IV-C1 Forward Way

First, let us consider the forward way. Let

[TABLE]

be the $(N_{i}/2)$th and $(N_{i}/2-1)$th elements of $[\overline{\mathbf{C}}_{i}]_{:,k}$, respectively.

In the presence of scaling ambiguity, $\hat{\overline{\mathbf{C}}}_{i}$ is expressed as

[TABLE]

where ${\mathbf{\Xi}}_{i}=\mathrm{diag}([\xi_{i,1},\cdots,\xi_{i,K}])$ contains $K$ scaling ambiguities with $\xi_{i,k}$ standing for the ambiguity between $[\hat{\overline{\mathbf{C}}}_{i}]_{:,k}$ and $[{\overline{\mathbf{C}}}_{i}]_{:,k}$ . The estimates of $\overline{e}_{i,k}$ and $\overline{f}_{i,k}$ now become

[TABLE]

and

[TABLE]

It follows that

[TABLE]

where $[\mathbf{a}_{i,k}]_{M_{i}}$ is replaced by its estimate $e^{j(M_{i}-1)\hat{\omega}_{i,k}}$ . Then we have

[TABLE]

where the last equality is due to (16).

IV-C2 Backward Way

To make fuller use of the available information, we may also calculate $\xi_{y,k}^{*}\xi_{x,k}^{*}$ from the $(N_{i}-1)$th and $N_{i}$th rows of $[\underline{\mathbf{C}}_{i}]_{:,k}$. The derivations are mostly the same as in the forward way. The only difference is that $\underline{\mathbf{s}}_{i,k}$ is conjugate flipped from $\overline{\mathbf{s}}_{i,k}$. We have the following relationship

[TABLE]

where $\underline{e}_{i,k}=\underline{\mathbf{s}}_{i,1}^{H}\mathbf{a}_{i,k}$ and $\underline{f}_{i,k}=\underline{\mathbf{s}}_{i,2}^{H}\mathbf{a}_{i,k}$ . Following the analysis in (32)–(35) yields

[TABLE]

Then based on (16), we have

[TABLE]

Finally, we substitute the average of (IV-C1) and (37) into (30) for final path-loss estimation.

Our method contains two main procedures: tensor decomposition and multipath parameter estimation. Both of them exploit the rotational invariance property of the Vandermonde manifold matrices. For this reason, we name our method the rotationally invariant channel estimation (RICE) algorithm. Its detailed steps are summarized in Algorithm 2.

IV-D Identifiability Analysis

The last remaining question is identifiability, i.e., how many paths we can handle given the measurements in (1). Recall that the dimension of $\mathbf{Z}$ in (IV-A) is a function of $P_{r}$ and $Q_{r}$. Since $N_{x}$ and $N_{y}$ are fixed, by tuning $P_{r}$ and $Q_{r}$ we can find an optimal pair $\{P_{r},Q_{r}\}$ that maximizes the number of paths our method is capable of handling. Based on Theorem 1 [31], we have the following result.

Theorem 2

Assume that the DOAs and DODs in different paths are not identical, i.e., $\theta_{r,i}\neq\theta_{r,j},\theta_{t,i}\neq\theta_{t,j},\phi_{t,i}\neq\phi_{t,j},\forall i\neq j$ , and all the path-losses are jointly drawn from an absolutely continuous distribution. Then, given the measurements $\mathbf{Y}$ , all the multipath parameters are uniquely identifiable with probability one if

[TABLE]

where $P_{r}$ and $Q_{r}$ are chosen from

[TABLE]

The proof of Theorem 2 is constructive, following the steps of Algorithm 1, meaning that this bound is achievable. In practice, once $M_{r}$ and $N$ are chosen, we first find the optimal $\{P_{r},Q_{r}\}$ by solving (39) and then cache them in the system to guarantee the identifiability. We note that the minimum $N$ is 16. If we choose $M_{r}=N_{x}=N_{y}=4$ , our method can uniquely identify up to eight paths, while the standard array processing methods can only handle three paths.

IV-E Complexity Analysis

The computational complexity of the proposed method mainly lies in Algorithm 1, where the SVD of $\tilde{\mathbf{Z}}$ in Step 2 costs about $\mathcal{O}(P_{r}^{2}N_{x}^{2}Q_{r}N_{y})$ flops. Since $P_{r}+Q_{r}=M_{r}+1$ and $M_{r}$ is usually no more than four with current technology, we have $P_{r}^{2}Q_{r}\approx 18\approx M_{r}^{2}$. Moreover, since $N=N_{x}N_{y}$, setting $N_{x}=N_{y}=\sqrt{N}$ reduces the complexity of Step 2 to $\mathcal{O}(M_{r}^{2}N^{1.5})$. After the factor matrices are obtained, the estimation of DOAs, DODs and path-losses is very simple. The DOA estimation in (21) requires about $\mathcal{O}(M_{r}K)$ flops. The estimation of azimuth and elevation angles in (24)–(27) costs about $\mathcal{O}(LK)$ flops, where $L=N_{x}/2=\sqrt{N}/2$ since $N_{x}=2L$. In many cases $L=2$ is enough to achieve satisfactory performance. Thus, $\mathcal{O}(LK)\approx\mathcal{O}(K)$. The calculation of $\boldsymbol{\beta}$ costs $\mathcal{O}(M_{r}K+K)$ flops. The overall complexity of the proposed method is $\mathcal{O}(M_{r}^{2}N^{1.5}+2M_{r}K+3K)$, which is quite low compared to sparse regression methods such as orthogonal matching pursuit (OMP), which requires $\mathcal{O}(M_{r}NK2^{21})$ flops when each DOA and DOD angle is quantized with $7$ bits.
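As a rough worked example, the two operation counts above can be evaluated for a small configuration. The sketch simply plugs numbers into the stated big-O expressions with all hidden constants taken as one, so the outputs are indicative orders of magnitude and not runtime predictions. The configuration $M_{r}=4$, $N=16$, $K=4$ is chosen for illustration, and the function names are hypothetical.

```python
def rice_flops(M_r, N, K):
    """Evaluate M_r^2 * N^1.5 + 2*M_r*K + 3*K (constants dropped)."""
    return M_r ** 2 * N ** 1.5 + 2 * M_r * K + 3 * K

def omp_flops(M_r, N, K, bits_per_angle=7, n_angles=3):
    """Evaluate M_r * N * K * 2^(bits * angles); 3 angles x 7 bits gives 2^21."""
    return M_r * N * K * 2 ** (bits_per_angle * n_angles)

M_r, N, K = 4, 16, 4
rice = rice_flops(M_r, N, K)     # 16 * 64 + 32 + 12
omp = omp_flops(M_r, N, K)       # 256 * 2^21 = 2^29
print(rice, omp, omp / rice)
```

The dominant RICE term is the SVD in Algorithm 1, while the OMP count is dominated by the size of the quantized angle dictionary.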

Remark 2

The main advantages of the proposed method are its low complexity and identifiability guarantees. As far as we know, there are very few uniqueness results available for MIMO channel estimation in the case of $N\ll M_{t}$, i.e., when there are fewer training samples than transmit antennas. There is also a serious lack of efficient and reliable channel estimation algorithms for such difficult cases, especially when $M_{r}$ is small and $K$ is relatively large. In [5], we considered similar cases for a special type of MIMO system with dual-polarized antennas, where we analyzed the identifiability and proposed algorithms for channel estimation. Unfortunately, the results in [5] are only valid for dual-polarized MIMO and hence cannot be applied here. Therefore, the results in this paper are much more general, and also timely and meaningful as 5G system trials are beginning to roll out. It is worth highlighting that the proposed method can be generalized to dual-polarized systems, and similar closed-form solutions for multipath parameter estimation can be derived in a straightforward way.

V Exploiting full knowledge of ${\bf S}$

The RICE algorithm relies on the spatial structure of the training sequence for channel estimation, but not on its values. This means that only partial information about the training sequence is used in RICE. For this reason, the complexity of RICE is kept very low, but at the expense of reduced resolution in DOD estimation. In this section, we show that by fully utilizing the information in the training sequence, DODs can be estimated through a root-finding technique, and the performance of RICE can be further improved with a moderate increase in complexity. Toward this end, a joint RICE and Root-finding approach is developed. We name the new algorithm RICER.

In RICER, the estimation of DOA is the same as in RICE, i.e., (21). The difference is in the estimation of DOD and path-loss. First, let us consider the estimation of $\omega_{x,k}$ from $\mathbf{c}_{x,k}=\mathbf{S}_{x}^{H}\mathbf{a}_{x,k}$. Before we proceed, we note that, by the following corollary, $\omega_{x,k}$ is identifiable from $\mathbf{c}_{x,k}$.

Corollary 1

[5] Assume that $M,N\geq 2$, and that $\mathbf{A}\in\mathbb{C}^{M\times F}$ is Vandermonde with distinct generators, i.e., $\omega_{i}\neq\omega_{j}$ for $i\neq j$. Then, $\omega$ can be uniquely identified almost surely from the system $\mathbf{C}=\mathbf{Q}^{H}\mathbf{A}(\omega)\mathrm{diag}(\boldsymbol{\xi})$, where $\boldsymbol{\xi}$ stands for column scaling and the elements in $\mathbf{Q}\in\mathbb{C}^{M\times N}$ are jointly drawn from an absolutely continuous distribution.

We note that in the absence of noise,

[TABLE]

The above is interesting, indicating that although $\mathbf{S}_{x}^{H}$ is fat, its null space does not contain any Vandermonde vector. The above also says that the orthogonal complement of $\mathbf{c}_{x,k}$ is orthogonal to $\mathbf{S}_{x}^{H}\mathbf{a}_{x,k}$ . Thus, we have

[TABLE]

Define

[TABLE]

Eq. (41) is then equivalent to

[TABLE]

In the noisy case, the equality in (43) approximately holds true. We can then estimate $\omega_{x,k}$ via

[TABLE]

which is similar to the cost of the classical MUSIC algorithm [38]. Thus, we may search the phase from $-\pi$ to $\pi$ and report the one that minimizes the objective function in (44). The drawback is the complexity caused by the 1-D angular search, which is approximately $\mathcal{O}(M_{x}^{2}D)$ flops for each $\omega_{x,k}$, where $D$ is the number of bins dividing $[-\pi,\pi]$. We may also derive a gradient descent method to handle (44) [5]. However, due to the non-linearity and non-convexity of (44), optimizing the phase requires careful initialization, which is not easy to obtain efficiently.
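The 1-D search described above can be sketched as follows. Since the display equations are not reproduced in this section, the cost used here is an assumed form of the MUSIC-like criterion: the fraction of the energy of $\mathbf{S}_{x}^{H}\mathbf{a}(\omega)$ that lies outside the span of $\mathbf{c}_{x,k}$. The training matrix is drawn at random (as in Corollary 1), the data are noise free, the true phase is placed on the grid for simplicity, and all names and values are hypothetical.

```python
import cmath
import math
import random

def steering(M, omega):
    """Vandermonde steering vector of length M."""
    return [cmath.exp(1j * m * omega) for m in range(M)]

def apply_SH(S_cols, a):
    """Compute S^H a, with S stored as a list of columns."""
    return [sum(s.conjugate() * x for s, x in zip(col, a)) for col in S_cols]

def cost(S_cols, c, omega):
    """Normalised energy of S^H a(omega) orthogonal to c (assumed criterion)."""
    u = apply_SH(S_cols, steering(len(S_cols[0]), omega))
    energy = sum(abs(x) ** 2 for x in u)
    inner = sum(ci.conjugate() * ui for ci, ui in zip(c, u))
    projected = abs(inner) ** 2 / sum(abs(ci) ** 2 for ci in c)
    return (energy - projected) / energy

rng = random.Random(0)
M_x, N_x, D = 6, 4, 720
S_cols = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M_x)]
          for _ in range(N_x)]

grid = [-math.pi + 2 * math.pi * d / D for d in range(D)]
omega_true = grid[450]
xi = 0.7 - 1.9j                                   # unknown column scaling
c = [xi * x for x in apply_SH(S_cols, steering(M_x, omega_true))]

# Exhaustive search over the D bins.
omega_hat = min(grid, key=lambda w: cost(S_cols, c, w))
print(omega_true, omega_hat)
```

The criterion is invariant to the scaling $\xi$ because it depends only on the subspace spanned by $\mathbf{c}_{x,k}$; the price is one cost evaluation per grid bin.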

Here, we employ a root-finding technique to solve (44). Since $\mathbf{a}_{x,k}$ is Vandermonde, we further express (43) as a polynomial:

[TABLE]

where

[TABLE]

There are $(2M_{x}-2)$ roots in total after solving (45). The symmetry of $\mathbf{P}_{x,k}^{\perp}$ implies that half of the roots lie inside the unit circle and the other half outside, and that they appear in conjugate-reciprocal pairs. In other words, the outer roots are the conjugate reciprocals of the inner roots. We are only interested in the inner roots, i.e., those inside the unit circle. Let us denote them as $\mathcal{Z}=\{z_{i}\mid|z_{i}|\leq 1,i=1,\cdots,M_{x}-1\}$. The next problem is to judiciously select one root from $\mathcal{Z}$ for estimating $\omega_{x,k}$. Following the philosophy of root-MUSIC [11], one may pick the root $z\in\mathcal{Z}$ that is closest to the unit circle. However, when the SNR is low or the sample size is small, the signal subspace $\mathbf{c}_{x,k}$ might be heavily corrupted by a portion of the noise subspace, which causes the subspace leakage problem [39]. Once this happens, the root-MUSIC rule of selecting the root closest to the unit circle becomes problematic: some irrelevant roots from the orthogonal complement of $\mathbf{c}_{x,k}$ may be much closer to the unit circle than the true root. This phenomenon occurs often when we estimate $\omega_{x,k}$, since $\hat{\mathbf{c}}_{x,k}$ is noisy and only $N_{x}\approx 4$ degrees of freedom are available. If the wrong root is selected, the channel cannot be reconstructed correctly. Therefore, it is crucial to design a robust rule for the final root determination.

One way to alleviate the subspace leakage issue is as follows. Note that the RICE method relies on the Vandermonde structure of $\mathbf{v}_{x,k}$ rather than on the subspace $\mathbf{c}_{x,k}$, so RICE is robust to subspace leakage. As a result, we can use the estimate of $\omega_{x,k}$ from RICE for assistance. Specifically, let us calculate the phases of $z_{i}$ as $\psi_{i}=\angle(z_{i}),\forall i=1,\cdots,M_{x}-1$. Let $\tilde{\omega}_{x,k}$ denote the estimate of $\omega_{x,k}$ from RICE, i.e., (24). We select the one from $\{\psi_{i}\}$ that is closest to $\tilde{\omega}_{x,k}$ as the final estimate of $\omega_{x,k}$.
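The selection rule above can be sketched in a few lines. The inner roots and the RICE estimate below are made-up numbers that mimic a leakage scenario in which a spurious root sits closest to the unit circle. Measuring closeness on the circle, so that phases near $\pi$ and $-\pi$ count as neighbours, is an implementation choice assumed here; the function names are hypothetical.

```python
import cmath

def circular_distance(p, q):
    """Smallest absolute angular difference between p and q, in [0, pi]."""
    return abs(cmath.phase(cmath.exp(1j * (p - q))))

def select_root(inner_roots, omega_rice):
    """Return the root phase that is closest to the RICE phase estimate."""
    phases = [cmath.phase(z) for z in inner_roots]
    return min(phases, key=lambda psi: circular_distance(psi, omega_rice))

# Hypothetical inner roots: the first one is spurious but nearly on the circle.
inner_roots = [
    0.99 * cmath.exp(1j * 2.9),
    0.80 * cmath.exp(1j * 1.05),
    0.60 * cmath.exp(-1j * 0.4),
]
omega_rice = 1.0                     # coarse estimate assumed to come from RICE

chosen = select_root(inner_roots, omega_rice)
# For comparison, the root-MUSIC rule picks the largest-modulus inner root.
closest_to_circle = cmath.phase(max(inner_roots, key=abs))
print(chosen, closest_to_circle)
```

In this example the RICE-assisted rule returns the phase near 1.05, whereas the modulus-based rule would return the spurious phase near 2.9.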

Following the same way, we can calculate $\hat{\omega}_{y,k}$ from $\mathbf{c}_{y,k}$ . Finally, we update the path-loss as

[TABLE]

where $\hat{\mathbf{A}}_{i}$ is constructed from $\{\hat{\omega}_{i,1},\cdots,\hat{\omega}_{i,K}\},~{}i=r,x,y$, which are obtained from RICER. The detailed steps of RICER are provided in Algorithm 3.

The identifiability of RICE and RICER is basically the same since both methods rely on Algorithm 1 for tensor decomposition. Their main difference lies in the estimation of DOD and path-loss. RICER uses the aid of RICE for determining $\omega_{x,k}$ and $\omega_{y,k}$ . Thus, its complexity is higher than RICE. We have $2K$ phases in total. The related complexity for solving $2K$ polynomials in (45) is $\mathcal{O}\left(8K(M_{x}^{2}\log(M_{x})+M_{y}^{2}\log(M_{y}))\right)$ flops. The complexity for updating the path-loss using (47) is $\mathcal{O}(M_{r}M_{x}M_{y}NK)$ flops. The total complexity for RICER is $\mathcal{O}\big{(}M_{r}^{2}N^{1.5}+8K(M_{x}^{2}\log(M_{x})+M_{y}^{2}\log(M_{y}))+M_{r}M_{x}M_{y}NK\big{)}$ flops which is much higher than the complexity of RICE. In the next section, we will see that by paying this additional complexity, RICER achieves better performance than RICE.

V-A Special Case: $M_{r}=1$

At this point, the reader might wonder whether the proposed framework can work in the case where only one antenna is available at the mobile end. The answer is affirmative. Let us first take a look at the signal model with a single receive antenna

[TABLE]

which can be reshaped into a matrix as

[TABLE]

We see that the tensor structure is no longer available in the received signal, and therefore uniqueness of the factor matrices seems to fail too. Since the RICE and RICER algorithms require the identification of $\mathbf{S}_{x}^{H}\mathbf{A}_{x}$ and $\mathbf{S}_{y}^{H}\mathbf{A}_{y}$ before performing parameter estimation, neither of them works directly in the single-antenna case. However, some further reflection shows that the RICER method can be modified for channel estimation even with a single receive antenna.

Note that when the training signals are orthogonal, channel estimation from (49) is indeed a 2-D harmonic retrieval problem, which is not our interest. Therefore, we only consider “tall” $\mathbf{S}_{x}$ and $\mathbf{S}_{y}$ , i.e., $N_{x}<M_{x}$ and $N_{y}<M_{y}$ . Let $\tilde{\mathbf{U}}_{s}\in\mathbb{C}^{M_{x}\times K}$ be the signal subspace of $\tilde{\mathbf{Y}}$ . Similar to (40), we have

[TABLE]

Then the following equation holds

[TABLE]

where

[TABLE]

Owing to the Vandermonde structure, we can also employ the root-finding technique to estimate $\omega_{x,k},\forall k=1,\cdots,K$ by solving (45), where $\mathbf{P}_{x,k}^{\perp}$ is replaced by $\tilde{\mathbf{P}}_{x}^{\perp}$ . Then we pick the top $K$ roots inside of the unit circle and estimate $\omega_{x,k}$ as the phase of the $k$ th root. After that we use the estimates $\{\hat{\omega}_{x,k}\}$ to construct $\hat{\mathbf{A}}_{x}$ and calculate

[TABLE]

Following (40)–(45), we can find the estimates of $\{\hat{\omega}_{y,k}\}$ . Finally, estimate the path-losses as $\big{(}\mathbf{S}_{y}^{H}\hat{\mathbf{A}}_{y}\odot\mathbf{S}_{x}^{H}\hat{\mathbf{A}}_{x}\big{)}^{\dagger}\tilde{\mathbf{y}}$ .

Note that the channel parameters can be identified via the above procedures if $\mathbf{A}_{x}$ and $\mathbf{A}_{y}$ are full column rank and $K<\min(N_{x},N_{y})$ .

VI Simulations

In the simulation, we assume that the multipath propagation gains are Rician distributed. All the results are averaged over 500 Monte-Carlo trials using a computer with 3.7 GHz Intel Core i7-8700 and 32 GB RAM. The normalized mean square error (NMSE) of channel estimates is computed from $\mathrm{NMSE}=\frac{1}{500}\sum_{i=1}^{500}\|\hat{\mathbf{H}}_{i}-\mathbf{H}_{i}\|_{F}^{2}/\|\mathbf{H}_{i}\|_{F}^{2}$ where $\hat{\mathbf{H}}_{i}$ denotes the channel that is reconstructed from the estimated multipath parameters from the $i$ th Monte-Carlo trial. The estimation of $K$ is beyond the scope of this paper. So in the simulations, to be fair, we assume that $K$ is known to all the algorithms.

Given the model in (1), array processing methods fail to work when $K\geq M_{r}$. Among existing methods, the class of CS-based methods [8, 15, 16] is a candidate for handling large $K$. We choose the OMP method for performance comparison since it is hyper-parameter free and computationally efficient. To implement OMP, we quantize $\theta_{r}$, $\theta_{t}$ and $\phi_{t}$ using 7 bits each, so the resulting dictionary has size $4N\times 2^{21}$. We only consider “tall” $\mathbf{S}$, i.e., there are fewer samples than transmit antennas. Hence, the LS technique does not work. We are interested in how well the new methods perform with a “tall” $\mathbf{S}$ compared to the LS estimate with an orthogonal square $\mathbf{S}$. The best achievable NMSE of the LS channel estimate from orthogonal training is $10^{-\mathrm{SNR}/10}$, where SNR is in dB. We include this value as a performance benchmark.
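For concreteness, the NMSE metric and the LS benchmark level described above can be computed as in the following sketch. Matrices are stored as nested lists of complex numbers, and the toy channel and its perturbed estimate are made up purely to exercise the functions; the function names are hypothetical.

```python
def frob_sq(A):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(x) ** 2 for row in A for x in row)

def nmse(estimates, truths):
    """Average of ||H_hat - H||_F^2 / ||H||_F^2 over all trials."""
    total = 0.0
    for H_hat, H in zip(estimates, truths):
        diff = [[x - y for x, y in zip(r_hat, r)] for r_hat, r in zip(H_hat, H)]
        total += frob_sq(diff) / frob_sq(H)
    return total / len(truths)

def ls_benchmark(snr_db):
    """Best achievable NMSE of LS with orthogonal training: 10^(-SNR/10)."""
    return 10 ** (-snr_db / 10)

# Toy example: one 2 x 2 channel and an estimate that is off by 0.1 per entry.
H = [[1 + 0j, 1j], [0j, 2 + 0j]]
H_hat = [[x + 0.1 for x in row] for row in H]

value = nmse([H_hat], [H])           # 4 * 0.01 / 6
print(value, ls_benchmark(20))
```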

We begin by examining the identifiability of our methods. Let us consider the following parameter setting: $M_{r}=3$, $M_{x}=M_{y}=10$, $N_{x}=N_{y}=4$ and SNR $=20$ dB. We set the number of paths $K$ to the maximum number of identifiable paths calculated from Theorem 2. Under this setting, we have $K=4$. The phases $\{\omega_{r,k},\omega_{x,k},\omega_{y,k}\}$ are chosen as

[TABLE]

Fig. 2(a) plots the location of each scatterer based on its DOA and DOD. It shows that RICE and RICER are able to resolve all the paths. Next we choose $M_{r}=4$ and SNR $=30$ dB. According to Theorem 2, our methods can deal with $K=8$ paths in theory. To verify this, we set

[TABLE]

The results are shown in Fig. 2(b), where our algorithms still work well. However, some angle estimates of RICE slightly disperse around the true angles, while those of RICER are more concentrated.

Now let us study the NMSE performance of the proposed methods. We first compare the NMSE performance by varying the SNR from 0 to 20 dB. We set $M_{r}=4$, $M_{x}=M_{y}=10$, $K=4$ and $N_{x}=N_{y}=4$. Simulation results are provided in Fig. 3, where both RICE and RICER outperform OMP and the benchmark throughout the range of SNRs considered. Note that the benchmark here uses a training signal of length 100 (vs. 16 for RICE and RICER) but does not exploit the DOA-DOD path parametrization; this is why RICE can beat this benchmark. Also note that OMP, which leverages the DOA-DOD path parametrization, does not work well due to the lack of enough samples and the coherence in the dictionary. RICER has better accuracy than RICE, but Fig. 4 shows that this comes at the expense of four times higher complexity. The additional calculation time is caused by the root-finding procedure for DOD estimation and the LS step for path-loss estimation. Notably, RICE is 62 times faster than OMP and RICER is 12 times faster.

Similar results can also be found in Fig. 5, where the number of paths varies from 1 to 6 and the SNR is fixed at 10 dB. $\{\omega_{r,k}\}$ are generated by uniformly dividing the range $(0.4\pi,1.6\pi)$ into $K$ intervals. $\{\omega_{x,k}\}$ and $\{\omega_{y,k}\}$ are generated in the same way but from the ranges $(0.4\pi,1.8\pi)$ and $(0.2\pi,1.6\pi)$, respectively. Again, OMP does not work, likely owing to the coherence of the “flat” dictionary matrix whose dimension is $64\times 2^{21}$. RICE and RICER offer satisfactory performance for small $K$. However, when $K$ exceeds 5, RICE becomes slightly inferior to the benchmark, although its accuracy is still acceptable for such a difficult setting (recovering 5 paths from 16 training samples). This indicates that using full knowledge of $\mathbf{S}$ helps in achieving better estimation accuracy, but at the expense of higher complexity. We also point out that according to Theorem 2, with the parameter settings of this example, our methods can resolve up to $K=6$ distinct paths. Therefore, the performance loss of RICE and RICER is due to the fact that the number of paths reaches the upper bound that they can handle.

Next we evaluate the performance as a function of $M_{r}$. We set $M_{x}=M_{y}=10$, $N_{x}=N_{y}=6$, SNR $=10$ dB, and vary $M_{r}$ from 2 to 7. At the same time, we increase the number of paths with $M_{r}$: for $M_{r}\in\{2,3,4,5,6,7\}$ we set $K\in\{5,6,7,8,9,10\}$, respectively. That is, $K=5$ if $M_{r}=2$, $K=6$ if $M_{r}=3$, and so forth. DOAs, DODs and path-losses are generated in the same way as in Fig. 5. Fig. 6 shows the result, from which we can see that RICE and RICER work well. We find that in most cases, OMP can successfully resolve several paths but not all of them. It frequently mis-estimates one or two paths, which ultimately leads to unsatisfactory overall performance. Note that when $M_{r}=3$, the maximum number of paths that the proposed methods can deal with is 6, which equals the number of paths in this simulation. This validates the correctness of the identifiability analysis in Theorem 2.

In the last example, we examine the channel estimation performance by evaluating the bit error rate (BER) versus SNR. In the simulation, we first estimate the channel and then feed it back to the BS. Then we transmit QPSK symbols precoded using the zero-forcing precoding technique. We pass the coded signal through a white Gaussian channel and decode it at the MS. To simplify the analysis, we do not consider quantization error at the feedback step. The parameters are set as $M_{r}=3$, $N_{x}=N_{y}=4$ and $M_{x}=M_{y}=10$. According to Theorem 2, we can uniquely identify up to $4$ paths with this setting. We consider two cases: $K=3$ and $K=4$. The results are plotted in Fig. 7. Note that the benchmark curve is based on the least squares channel estimator with orthogonal pilots. We see that when $K=3$, RICE and RICER outperform the benchmark and OMP. However, the RICE algorithm performs worse than the benchmark when $K=4$. We note that the benchmark achieves this performance at the expense of huge training and feedback overhead: the downlink training uses a $100\times 100$ training signal and the uplink feedback consists of 600 real-valued numbers. In contrast, the overhead of RICE and RICER is much lighter: they spend only 16% of the training overhead of the benchmark and approximately 3.3% of the feedback overhead.
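The overhead percentages quoted above can be reproduced with simple arithmetic. The training count of 16 samples is taken from the 16% figure relative to the 100-sample benchmark. The per-path feedback budget is an assumption made here to match the quoted 3.3%: three real phases plus one complex path-loss (two reals) per path, for $K=4$ paths. The benchmark feedback of 600 reals corresponds to the real and imaginary parts of a $3\times 100$ channel matrix.

```python
M_r, M_x, M_y, K = 3, 10, 10, 4
M_t = M_x * M_y                      # 100 transmit antennas

# Training overhead: orthogonal square training needs M_t samples.
N_benchmark = M_t
N_rice = 16                          # implied by the quoted 16% figure
training_ratio = N_rice / N_benchmark

# Feedback overhead of the benchmark: every complex channel entry is sent.
feedback_benchmark = 2 * M_r * M_t   # 600 real numbers

# Assumed parametric feedback: 3 phases + complex path-loss per path.
reals_per_path = 3 + 2
feedback_rice = K * reals_per_path   # 20 real numbers
feedback_ratio = feedback_rice / feedback_benchmark

print(training_ratio, feedback_benchmark, feedback_ratio)
```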

VII Conclusion

In this work, we designed a new non-orthogonal training sequence and proposed a novel tensor factorization framework to tackle the DL channel estimation problem for FDD massive MIMO from ‘frugal’ training. We showed that with the devised training sequence, the channel can be estimated accurately from a very small amount of training. Meanwhile, two computationally efficient algebraic methods were proposed for multipath parameter estimation. Compared to the existing approaches, the proposed methods have several advantages in terms of channel identification guarantees, estimation accuracy and computational complexity. Extensive simulations showed the effectiveness of the proposed methods. The most important take-away point is that RICE achieves similar or better performance than orthogonal training with a much shorter training sequence and using a computationally very attractive algebraic channel identification algorithm.

Bibliography

[1] C. Qian, X. Fu, and N. D. Sidiropoulos, “A simple algebraic channel estimation method for FDD massive MIMO systems,” in IEEE Int. Workshop on Signal Process. Advances in Wireless Commun. (SPAWC), Cannes, France, July 2019, accepted.
[2] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, 2014.
[3] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, 2016.
[4] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, 2014.
[5] C. Qian, X. Fu, N. D. Sidiropoulos, and Y. Yang, “Tensor-based channel estimation for dual-polarized massive MIMO systems,” IEEE Trans. Signal Process., vol. 66, no. 24, pp. 6390–6403, Dec 2018.
[6] A. Kammoun, H. Khanfir, Z. Altman, M. Debbah, and M. Kamoun, “Preliminary results on 3D channel modeling: From theory to standardization,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1219–1229, 2014.
[7] H. Xie, F. Gao, and S. Jin, “An overview of low-rank channel estimation for massive MIMO systems,” IEEE Access, vol. 4, pp. 7313–7321, 2016.
[8] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proc. IEEE, vol. 98, no. 6, pp. 1058–1076, 2010.
