Time-Varying Massive MIMO Channel Estimation: Capturing, Reconstruction and Restoration

Muye Li; Shun Zhang; Nan Zhao; Weile Zhang; Xianbin Wang

arXiv:1905.02371·cs.IT·May 8, 2019


TL;DR

This paper introduces a novel DL channel tracking scheme for massive MIMO systems that leverages UL channel models, angle reciprocity, and Bayesian filtering to reduce overhead and improve accuracy in time-varying environments.

Contribution

The paper proposes a new DL channel tracking method using UL models, angle reciprocity, and Bayesian Kalman filtering, reducing the need for covariance acquisition overhead.

Findings

01

The proposed scheme effectively tracks DL channels with high accuracy.

02

Numerical results demonstrate improved performance over traditional methods.

03

The method reduces overhead in massive MIMO channel estimation.

Abstract

On the time-varying channel estimation, the traditional downlink (DL) channel restoration schemes usually require the reconstruction for the covariance of downlink process noise vector, which is dependent on DL channel covariance matrix (CCM). However, the acquisition of the CCM leads to unacceptable overhead in massive MIMO systems. To tackle this problem, in this paper, we propose a novel scheme for the DL channel tracking. First, with the help of virtual channel representation (VCR), we build a dynamic uplink (UL) massive MIMO channel model with the consideration of off-grid refinement. Then, a coordinate-wise maximization based expectation maximization (EM) algorithm is adopted for capturing the model parameters, including the spatial signatures, the time-correlation factors, the off-grid bias, the channel power, and the noise power. Thanks to the angle reciprocity, the spatial…

Tables

Table I: Simulation Parameters

Parameter	Value
Number of BS antennas $N_{t}$	128
Number of users per group $\tau$	4
Angle spread ranges	$[-49^{\circ},-43^{\circ}]$, $[-26^{\circ},-20^{\circ}]$, $[20^{\circ},26^{\circ}]$, $[43^{\circ},49^{\circ}]$
Length of training sequences $L_{t}$	4
Channel coherence interval $L_{c}$	160
Symbol period	1 μs
Carrier frequency	2 GHz
BS antenna spacing	$\lambda/2$

Equations

\mathbf{a}(\theta_{k,l,m})=\Big[1,e^{\jmath\frac{2\pi d}{\lambda}\sin(\theta_{k,l,m})},\ldots,e^{\jmath(N_{t}-1)\frac{2\pi d}{\lambda}\sin(\theta_{k,l,m})}\Big]^{T},

\mathbf{h}_{k,m}=\int_{-\infty}^{+\infty}\sum_{l=1}^{L}\mathbf{a}(\theta_{k,l,m})e^{\jmath 2\pi\nu mL_{c}T_{s}}\hbar_{k}(\theta_{k,l,m},\nu)\,d\nu,

\tilde{\mathbf{h}}_{k,m}=\mathbf{F}_{N_{t}}\mathbf{h}_{k,m},

\begin{cases}\tilde{\mathbf{h}}_{k,m}=\text{diag}(\mathbf{c}_{k})\mathbf{r}_{k,m},\\ \mathbf{r}_{k,m}=\alpha_{k}\mathbf{r}_{k,m-1}+\boldsymbol{\upsilon}_{k,m},\end{cases}

\mathcal{Q}_{k}=\left\{p\,\Big|\left\lfloor N_{t}\frac{d}{\lambda}\sin(\theta_{k}^{\min})\right\rfloor\leq p\leq\left\lfloor N_{t}\frac{d}{\lambda}\sin(\theta_{k}^{\max})\right\rfloor,p\in\mathbb{Z}\right\},

\mathcal{Q}_{k}=\left\{p\,\Big|\,p+\rho_{p}=N_{t}\frac{d}{\lambda}\sin(\theta_{k,l,m}),p\in\mathbb{Z}\right\},\rho_{k,l}\in[-0.5,0.5]

\mathbf{h}_{k,m}=[\mathbf{A}^{H}+\mathbf{B}^{H}\text{diag}(\boldsymbol{\rho}_{k})]_{:,\mathcal{Q}_{k}}[\tilde{\mathbf{h}}_{k,m}]_{\mathcal{Q}_{k}}=[\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})]_{:,\mathcal{Q}_{k}}^{H}[\tilde{\mathbf{h}}_{k,m}]_{\mathcal{Q}_{k}},

\begin{cases}\mathbf{h}_{k,m}=\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H}\text{diag}(\mathbf{c}_{k})\mathbf{r}_{k,m},\\ \mathbf{r}_{k,m}=\alpha_{k}\mathbf{r}_{k,m-1}+\boldsymbol{\upsilon}_{k,m},\end{cases}

\mathbf{Y}_{m}=\sum_{k=1}^{\tau}\mathbf{h}_{k,m}\mathbf{s}_{k}^{T}+\mathbf{N}_{m}=\sum_{k=1}^{\tau}\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H}\text{diag}(\mathbf{c}_{k})\mathbf{r}_{k,m}\mathbf{s}_{k}^{T}+\mathbf{N}_{m},

\mathbf{y}_{m}=\sum_{k=1}^{\tau}\underbrace{(\mathbf{s}_{k}\otimes\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H})\text{diag}(\mathbf{c}_{k})}_{\mathbf{J}_{k}}\mathbf{r}_{k,m}+\mathbf{n}_{m}=\mathbf{J}\mathbf{r}_{m}+\mathbf{n}_{m},

\hat{\boldsymbol{\Xi}}=

Q\big(\boldsymbol{\alpha},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)

Q\big(\boldsymbol{\Lambda},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)

Q\big(\mathbf{c},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)

Q\big(\boldsymbol{\rho},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)

Q\big(\sigma_{n}^{2},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)

\hat{\boldsymbol{\alpha}}^{(l)}=\arg\max_{\boldsymbol{\alpha}}Q\big(\boldsymbol{\alpha},\hat{\boldsymbol{\Xi}}^{(l-1)}\big).

\hat{\boldsymbol{\Lambda}}^{(l)}=\arg\max_{\boldsymbol{\Lambda}}Q\big(\boldsymbol{\Lambda},\hat{\boldsymbol{\Xi}}^{(l-1)}\big).

\hat{\mathbf{c}}^{(l)}=\arg\max_{\mathbf{c}}Q\big(\mathbf{c},\hat{\boldsymbol{\Xi}}^{(l-1)}\big).

\hat{\boldsymbol{\rho}}^{(l)}=\arg\max_{\boldsymbol{\rho}}Q\big(\boldsymbol{\rho},\hat{\boldsymbol{\Xi}}^{(l-1)}\big).

\big(\hat{\sigma}_{n}^{2}\big)^{(l)}=\arg\max_{\sigma_{n}^{2}}Q\big(\sigma_{n}^{2},\hat{\boldsymbol{\Xi}}^{(l-1)}\big).

Q\big(\boldsymbol{\alpha},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)=\mathbb{E}_{\mathbf{r}|\mathbf{y};\hat{\boldsymbol{\Xi}}^{(l-1)}}\big\{\ln p(\mathbf{y}|\mathbf{r};\boldsymbol{\alpha},\boldsymbol{\Xi}^{(l-1)}\setminus\boldsymbol{\alpha}^{(l-1)})\big\}+\mathbb{E}_{\mathbf{r}|\mathbf{y};\hat{\boldsymbol{\Xi}}^{(l-1)}}\big\{\ln p(\mathbf{r};\boldsymbol{\alpha},\boldsymbol{\Xi}^{(l-1)}\setminus\boldsymbol{\alpha}^{(l-1)})\big\}.

p(\mathbf{y}_{m}|\mathbf{r}_{m};\boldsymbol{\alpha},\boldsymbol{\Xi}^{(l-1)}\setminus\boldsymbol{\alpha}^{(l-1)})\sim\mathcal{CN}\left(\sum_{k=1}^{\tau}\mathbf{J}_{k}\mathbf{r}_{k,m},\sigma_{n}^{2}\mathbf{I}_{N_{t}L_{s}}\right).

\ln p(\mathbf{r};\boldsymbol{\alpha},\boldsymbol{\Xi}^{(l-1)}\setminus\boldsymbol{\alpha}^{(l-1)})

p\Big(\mathbf{r}_{k,m}|\mathbf{r}_{k,m-1};\boldsymbol{\alpha},\boldsymbol{\Xi}^{(l-1)}\setminus\boldsymbol{\alpha}^{(l-1)}\Big)=\frac{\exp\left(-(\mathbf{r}_{k,m}-\alpha_{k}\mathbf{r}_{k,m-1})^{H}(\hat{\boldsymbol{\Lambda}}_{k}^{(l-1)})^{-1}(\mathbf{r}_{k,m}-\alpha_{k}\mathbf{r}_{k,m-1})\right)}{\pi^{N}|\hat{\boldsymbol{\Lambda}}_{k}^{(l-1)}|}.

Q\big(\alpha_{k},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)=

Q\big(\boldsymbol{\Lambda}_{k},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)=

+2\hat{\alpha}_{k}^{(l-1)}\Re\Big\{\mathrm{tr}\big(\boldsymbol{\Lambda}_{k}^{-1}\boldsymbol{\Pi}_{k,m-1,m}^{(l-1)}\big)\Big\}\Big)+C_{2},

Q\big(\mathbf{c}_{k},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)=

-\frac{\|\mathbf{s}_{k}\|^{2}}{\big(\hat{\sigma}_{n}^{(l-1)}\big)^{2}}\mathbf{c}_{k}^{T}\Big(\sum\limits_{m=1}^{M}\Big([\boldsymbol{\Phi}^{H}(\hat{\boldsymbol{\rho}}_{k}^{(l-1)})\boldsymbol{\Phi}(\hat{\boldsymbol{\rho}}_{k}^{(l-1)})]\odot\boldsymbol{\Theta}_{k,m}^{(l-1)}\odot\mathbf{I}\Big)\Big)\mathbf{c}_{k}+C_{3},

Q\big(\boldsymbol{\rho}_{k},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)=

-\frac{2\|\mathbf{s}_{k}\|^{2}}{\big(\hat{\sigma}_{n}^{(l-1)}\big)^{2}}\Big(\sum\limits_{m=1}^{M}(\hat{\mathbf{c}}_{k}^{(l-1)})^{T}\left(\Re\Big\{\mathbf{A}\mathbf{B}^{H}\odot\boldsymbol{\Theta}_{k,m}^{(l-1)}\Big\}\right)\odot\mathbf{I}\Big)\boldsymbol{\rho}_{k}

-\frac{\|\mathbf{s}_{k}\|^{2}}{\big(\hat{\sigma}_{n}^{(l-1)}\big)^{2}}\boldsymbol{\rho}_{k}^{T}\left(\sum\limits_{m=1}^{M}\left([\text{diag}(\hat{\mathbf{c}}_{k}^{(l-1)})\mathbf{B}\mathbf{B}^{H}\text{diag}(\hat{\mathbf{c}}_{k}^{(l-1)})]\odot\boldsymbol{\Theta}_{k,m}\right)\odot\mathbf{I}\right)\boldsymbol{\rho}_{k}+C_{4},

Q\big(\sigma_{n}^{2},\hat{\boldsymbol{\Xi}}^{(l-1)}\big)=

+\frac{2}{\sigma_{n}^{2}}\sum\limits_{m=1}^{M}\Re\Big\{\mathbf{y}_{m}^{H}\sum\limits_{k=1}^{\tau}\hat{\mathbf{J}}_{k}^{(l-1)}\hat{\mathbf{r}}_{k,m}^{(l-1)}\Big\}-N_{t}L_{s}\sum_{m=1}^{M}\ln\pi\sigma_{n}^{2}-\frac{1}{\sigma_{n}^{2}}\sum_{m=1}^{M}\mathbf{y}_{m}^{H}\mathbf{y}_{m}+C_{5}.

\mathbf{r}_{m}

\mathbf{y}_{m}

\mathbf{X}^{(l-1)}=\text{diag}\big(\hat{\alpha}_{1}^{(l-1)},\hat{\alpha}_{2}^{(l-1)},\dots,\hat{\alpha}_{\tau}^{(l-1)}\big)\otimes\mathbf{I}_{N_{t}},


Taxonomy

Topics: Advanced MIMO Systems Optimization · Millimeter-Wave Propagation and Modeling · Advanced Wireless Communication Techniques

Full text

Time-Varying Massive MIMO Channel Estimation: Capturing, Reconstruction and Restoration

Muye Li, Shun Zhang, Member, IEEE, Nan Zhao, Senior Member, IEEE, Weile Zhang, Member, IEEE, Xianbin Wang, Fellow, IEEE

M. Li and S. Zhang are with the State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, P. R. China (Email: [email protected]; [email protected]). N. Zhao is with the School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, P. R. China ([email protected]). W. Zhang is with the MOE Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, P. R. China (Email: [email protected]). X. Wang is with the Department of Electrical and Computer Engineering, Western University, London, Ontario, Canada (Email: [email protected]).

Abstract

On the time-varying channel estimation, the traditional downlink (DL) channel restoration schemes usually require the reconstruction for the covariance of downlink process noise vector, which is dependent on DL channel covariance matrix (CCM). However, the acquisition of the CCM leads to unacceptable overhead in massive MIMO systems. To tackle this problem, in this paper, we propose a novel scheme for the DL channel tracking. First, with the help of virtual channel representation (VCR), we build a dynamic uplink (UL) massive MIMO channel model with the consideration of off-grid refinement. Then, a coordinate-wise maximization based expectation maximization (EM) algorithm is adopted for capturing the model parameters, including the spatial signatures, the time-correlation factors, the off-grid bias, the channel power, and the noise power. Thanks to the angle reciprocity, the spatial signatures, time-correlation factors and off-grid bias of the DL channel model can be reconstructed with the knowledge of UL ones. However, the other two kinds of model parameters are closely related with the carrier frequency, which cannot be perfectly inferred from the UL ones. Instead of relearning the DL model parameters with dedicated training, we resort to the optimal Bayesian Kalman filter (OBKF) method to accurately track the DL channel with the partially prior knowledge. At the same time, the model parameters will be gradually restored. Specially, the factor-graph and the Metropolis Hastings MCMC are utilized within the OBKF framework. Finally, numerical results are provided to demonstrate the efficiency of our proposed scheme.

Index Terms:

Massive MIMO, sparse Bayesian learning, time-varying channels, optimal Bayesian Kalman filter, factor graph

I Introduction

Owing to its tremendous improvement in spectral and energy efficiency [1], massive multiple-input multiple-output (MIMO) has become a promising technology for the 5th generation (5G) cellular networks to meet future capacity requirements [2, 3, 4, 5]. To exploit the advantages of massive MIMO, accurate channel state information (CSI) is indispensable at the base station (BS). In time-division duplex (TDD) systems, because the uplink (UL) and downlink (DL) channels are reciprocal, the CSI at the BS side can be obtained through UL training [6, 7]. In frequency-division duplex (FDD) systems, however, the CSI at the BS side must be obtained through UL training, DL training, and CSI feedback, which causes unaffordable overhead together with pilot contamination [8, 9, 10, 11].

Recently, to reduce the overhead of channel training and CSI feedback, a set of new transmission strategies has been proposed that reduce the dimensions of the effective channels by fully exploiting the low-rank property of the massive MIMO channel covariance matrix [12, 13, 14, 15, 16, 17]. In [12], a joint spatial division multiplexing (JSDM) scheme was proposed that projects the eigenspace of the channel covariance matrix of the desired user into the nullspace of the eigenspaces of all other users, forcing the inter-user interference to zero. Nam et al. extended the results of [12] and designed a low-cost opportunistic user selection and prebeamforming algorithm to achieve the optimal sum rate [13]. In [14], Adhikary et al. improved the JSDM scheme to decrease its computational complexity. Sun et al. proposed a complete transmission scheme named beam division multiplex for FDD massive MIMO systems under a two-stage precoding framework [15], where only statistical CSI is used for the optimal downlink transmission. Xie et al. proposed a channel estimation scheme for TDD/FDD massive MIMO systems [16] in which the UL/DL channel covariance matrices (CCMs) are reconstructed. In that work, the authors extracted the angle parameters and the power angular spectrum (PAS) of the channel from the instantaneous uplink CSI, reconstructed the UL CCM, and used it to improve UL channel estimation without any additional training cost. The scheme does not need long-term acquisition of the uplink CCMs and can handle more practical propagation environments with a larger angle spread (AS). All of the above methods use spatial information to implement orthogonal transmission to different users. Theoretically, this spatial information can be derived from the channel covariance matrix. Thus, low-complexity and effective methods for acquiring the channel covariance matrices are significant to the above works [12, 13, 14, 15, 16, 17].

Nevertheless, it is quite difficult to acquire channel covariances in massive MIMO systems [18], owing to the singular value decomposition (SVD) of high-dimensional matrices. To overcome this bottleneck, Xie et al. built a low-rank model for the instantaneous massive MIMO channel from antenna array theory [19] and proposed a spatial basis expansion model (SBEM) as an alternative for channel acquisition without the channel covariances. Tang et al. proposed an off-grid channel estimation algorithm for UL millimeter-wave massive MIMO systems [20]. The authors exploited the physical structure of the CSI and proposed an improved sparse Bayesian learning (ISBL) algorithm that achieves high estimation accuracy. In [21], a channel tracking method for massive MIMO systems was proposed for time-varying scenarios: the extended Kalman filter (KF) method was used to blindly track the central angle, and the Taylor series expansion of the steering vectors was adopted to obtain the angular spread. In our previous work [22], we proposed a channel estimation scheme for time-varying massive MIMO networks. We developed an EM-based SBL framework to learn the temporal correlation factor, the spatial signatures, and the channel powers, adopting the KF and the Rauch-Tung-Striebel smoother (RTSS), and then applied a reduced-dimension KF for UL/DL virtual channel tracking. However, the channel powers are closely related to the carrier frequency and cannot be perfectly inferred from the UL ones. Moreover, given the randomness of the directions of arrival (DOAs) of the impinging signals, the existing channel estimation scheme [22] inevitably suffers a performance loss due to the power leakage caused by spatial sampling mismatch. Finally, in [22], we did not incorporate the noise covariance into the model parameters.

Different from the aforementioned works, this paper focuses on DL channel restoration for time-varying massive MIMO networks, where both the TDD and FDD modes are considered. To exploit the low-rank property of the spatially correlated massive MIMO channel, we directly learn the parameters of the UL channel model instead of acquiring and analyzing the channel covariance matrices. First, a time-varying off-grid massive MIMO channel model is constructed using the Taylor series and the virtual channel representation (VCR) [23]. Then, a novel sparse Bayesian learning (SBL) framework is designed to estimate the spatial signatures, the off-grid bias, and the temporal varying characteristics of the sparse virtual channel model, as well as the observation noise covariance. To avoid unacceptable complexity, we apply a coordinate-wise maximization based expectation maximization (EM) algorithm to capture the parameters listed above. Next, according to the spatial signatures, we use a unified low-dimensional Kalman filter (KF) for virtual channel tracking. Thanks to the reciprocity between the UL DOAs and the DL directions of departure (DODs) of the scattering rays, the DL spatial signatures, time-correlation factors, and off-grid bias can be obtained directly from the UL ones. Unfortunately, the other two kinds of model parameters cannot be perfectly inferred from the UL ones, as they are closely related to the carrier frequency. Although the UL learning method could also be used to capture the DL model parameters, this would inevitably incur tremendous overhead. To avoid this obstacle, we resort to the optimal Bayesian Kalman filter (OBKF) method to accurately track the DL channel with partial prior knowledge. We first present the recursive equations of the DL virtual channel restoration. Then, we employ a Markov chain Monte Carlo (MCMC) method to approximate some posterior effective statistics. Finally, to obtain the posterior distribution of the noise second-order statistics, i.e., the covariance matrix, a factor graph based sum-product algorithm is introduced.

The rest of this paper is organized as follows. Section II gives the system model and describes the virtual channel model. Section III presents the main ideas of the coordinate-wise maximization based EM algorithm for model parameter learning, together with a concise description of UL channel tracking. Section IV presents the recovery of the DL model parameters and the DL virtual channel tracking by the factor graph based OBKF method. The simulation results are given in Section V, and Section VI concludes the paper.

Notations: We use lowercase (uppercase) boldface to denote a vector (matrix). $(\cdot)^{T}$ and $(\cdot)^{H}$ represent the transpose and the Hermitian transpose, respectively. $\mathbf{I}_{N}$ represents an $N\times N$ identity matrix. $\delta(\cdot)$ is the Dirac delta function. $\mathbb{E}\{\cdot\}$ is the expectation operator. $\text{tr}\{\cdot\}$ and $|\cdot|$ denote the trace and the determinant of a matrix, respectively. We use $[\mathbf{A}]_{i,j}$ and $\mathbf{A}_{:,\mathcal{Q}}$ (or $\mathbf{A}_{\mathcal{Q},:}$) to represent the $(i,j)$-th entry of $\mathbf{A}$ and the submatrix of $\mathbf{A}$ that contains the columns (or rows) with the index set $\mathcal{Q}$, respectively. $\mathbf{x}_{\mathcal{Q}}$ is the subvector of $\mathbf{x}$ formed by the entries with the index set $\mathcal{Q}$. $\mathbf{v}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{N})$ means that $\mathbf{v}$ follows the complex circularly-symmetric Gaussian distribution with zero mean and covariance $\mathbf{I}_{N}$. $\lfloor p\rfloor$ denotes the largest integer no greater than $p$. ${\boldsymbol{\Xi}}^{(l-1)}\setminus{\boldsymbol{\alpha}}^{(l-1)}$ denotes the set ${\boldsymbol{\Xi}}^{(l-1)}$ except the element ${\boldsymbol{\alpha}}^{(l-1)}$. The real component of $x$ is expressed as $\Re(x)$. $\text{diag}(\mathbf{x})$ is a diagonal matrix whose diagonal elements are formed by the elements of $\mathbf{x}$, while $\text{blkdiag}(\mathbf{X}_{1},\mathbf{X}_{2},\dots)$ is a block diagonal matrix formed by $\mathbf{X}_{1},\mathbf{X}_{2},\dots$.

II System Model and Channel Characteristics

We consider an uplink multiuser massive MIMO system, where the BS is equipped with $N_{t}\gg 1$ antennas in the form of a uniform linear array (ULA), and $K$ single-antenna users are randomly distributed in its coverage area. We adopt a geometric channel model with $L$ scatterers around the $k$-th user, and each scatterer is assumed to contribute a single propagation path. Denote by $\theta_{k,l,m}$ the DOA of the $l$-th path of the $k$-th user during the $m$-th time block. The spatial steering vector of the BS antenna array can then be defined as:

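$$\mathbf{a}(\theta_{k,l,m})=\Big[1,e^{\jmath\frac{2\pi d}{\lambda}\sin(\theta_{k,l,m})},\ldots,e^{\jmath(N_{t}-1)\frac{2\pi d}{\lambda}\sin(\theta_{k,l,m})}\Big]^{T},\tag{1}$$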

where $d\leq\lambda/2$ is antenna spacing of the BS; $\lambda$ is the signal carrier wavelength.
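As a minimal sketch of the steering vector defined above, the following NumPy snippet builds $\mathbf{a}(\theta)$ for a ULA. The function name `steering_vector`, the parameter `d_over_lambda`, and the example angle of 23 degrees are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def steering_vector(theta_rad, n_antennas, d_over_lambda=0.5):
    """Return the ULA steering vector a(theta) as a length-n_antennas array.

    Entry n equals exp(j * 2*pi * (d/lambda) * n * sin(theta)), n = 0..N_t-1.
    """
    n = np.arange(n_antennas)
    return np.exp(1j * 2.0 * np.pi * d_over_lambda * n * np.sin(theta_rad))


# Example (assumed values): N_t = 128 antennas, d = lambda/2, DOA of 23 degrees.
a = steering_vector(np.deg2rad(23.0), n_antennas=128)
print(a.shape, np.abs(a[:3]))
```

Every entry has unit modulus, and at broadside ($\theta=0$) the vector reduces to all ones.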

It is assumed that the direction of arrival (DOA) of each path is quasi-static during a block of $L_{c}$ channel uses and changes from block to block. The system sampling rate is $\frac{1}{T_{s}}$. Then, the uplink channel $\mathbf{h}_{k,m}\in\mathbb{C}^{N_{t}\times 1}$ from user $k$ to the BS during the $m$-th block can be expressed as [26, 27, 28]

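$$\mathbf{h}_{k,m}=\int_{-\infty}^{+\infty}\sum_{l=1}^{L}\mathbf{a}(\theta_{k,l,m})e^{\jmath 2\pi\nu mL_{c}T_{s}}\hbar_{k}(\theta_{k,l,m},\nu)\,d\nu,\tag{2}$$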

where $\hbar_{k}(\theta_{k,l,m},\nu)$ is the joint angle-Doppler channel gain function of the user $k$ corresponding to the direction of arrival (DOA) $\theta_{k,l,m}$ and the Doppler frequency $\nu$ . The channels from the BS to different users are assumed to be statistically independent.

As in [29], the VCR can be utilized to exploit the sparsity of $\mathbf{h}_{k,m}$ as

$$\tilde{\mathbf{h}}_{k,m}=\mathbf{F}_{N_{t}}\mathbf{h}_{k,m},\tag{3}$$

where $\mathbf{\tilde{h}}_{k,m}$ is the virtual channel of $\mathbf{h}_{k,m}$, and $\mathbf{F}_{N_{t}}$ is the $N_{t}\times N_{t}$ normalized discrete Fourier transform (DFT) matrix with $(p,q)$-th entry $[\mathbf{F}_{N_{t}}]_{p,q}=\frac{1}{\sqrt{N_{t}}}e^{-\jmath\frac{2\pi pq}{N_{t}}}$. Furthermore, we can adopt the simultaneously sparse signal model to depict the dynamics of $\mathbf{\tilde{h}}_{k,m}$ through the first-order autoregressive (AR) model [31] as

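$$\begin{cases}\tilde{\mathbf{h}}_{k,m}=\text{diag}(\mathbf{c}_{k})\mathbf{r}_{k,m},\\ \mathbf{r}_{k,m}=\alpha_{k}\mathbf{r}_{k,m-1}+\boldsymbol{\upsilon}_{k,m},\end{cases}\tag{4}$$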

where the time-varying process $\mathbf{r}_{k,m}$ is a Gauss-Markov random process, $\alpha_{k}$ is the transmission factor, $\boldsymbol{\upsilon}_{k,m}\sim\mathcal{CN}(\mathbf{0},\boldsymbol{\Lambda}_{k})$ is the process noise vector with $\boldsymbol{\Lambda}_{k}=\text{diag}([\lambda_{k,1}^{2},\lambda_{k,2}^{2},\cdots,\lambda_{k,N_{t}}^{2}])$, and the spatial signature [32] vector $\mathbf{c}_{k}$ is determined by the set

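$$\mathcal{Q}_{k}=\left\{p\,\Big|\left\lfloor N_{t}\frac{d}{\lambda}\sin(\theta_{k}^{\min})\right\rfloor\leq p\leq\left\lfloor N_{t}\frac{d}{\lambda}\sin(\theta_{k}^{\max})\right\rfloor,p\in\mathbb{Z}\right\},\tag{5}$$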

with $[\mathbf{c}_{k}]_{p}=1$ when $p\in\mathcal{Q}_{k}$ and $[\mathbf{c}_{k}]_{p}=0$ otherwise.

It can be checked that the locations of the non-zero elements of $\mathbf{c}_{k}$ depend on the angle spread (AS) information of user $k$, i.e., $[\theta_{k}^{\min},\theta_{k}^{\max}]$. Theoretically, the AS information does not change drastically over thousands of channel coherence times $L_{c}T_{s}$, which means that $\mathcal{Q}_{k}$ remains time-invariant over a much longer period. Furthermore, in the massive MIMO scenario, especially in the millimeter-wave and terahertz bands, the AS is limited to a narrow region, and the number of non-zero elements in $\mathbf{c}_{k}$, i.e., $|\mathcal{Q}_{k}|$, is much smaller than $N_{t}$. Thus, the virtual channel $\mathbf{\tilde{h}}_{k,m}$ can be treated as a sparse signal.
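The relation between the angle spread, the index set $\mathcal{Q}_{k}$, and the sparsity of the virtual channel can be sketched numerically. The snippet below is only an illustration under stated assumptions: the helper names `support_set` and `virtual_channel` are hypothetical, negative indices are wrapped modulo $N_{t}$ (a convention chosen here, not stated in the text), and the channel is a single unit-gain path at 23 degrees.

```python
import numpy as np


def support_set(theta_min_deg, theta_max_deg, n_antennas, d_over_lambda=0.5):
    """Index set Q_k implied by the angle spread [theta_min, theta_max]."""
    scale = n_antennas * d_over_lambda
    lo = int(np.floor(scale * np.sin(np.deg2rad(theta_min_deg))))
    hi = int(np.floor(scale * np.sin(np.deg2rad(theta_max_deg))))
    # Assumption of this sketch: negative indices wrap around modulo N_t.
    return np.arange(lo, hi + 1) % n_antennas


def virtual_channel(h):
    """Apply the normalized DFT matrix F_{N_t} to the channel vector h."""
    return np.fft.fft(h) / np.sqrt(h.size)


n_t = 128
q_k = support_set(20.0, 26.0, n_t)

# Single path at 23 degrees with unit gain and d = lambda/2 (assumed values).
n = np.arange(n_t)
h = np.exp(1j * np.pi * n * np.sin(np.deg2rad(23.0)))
h_tilde = virtual_channel(h)

# Fraction of the channel energy captured by the entries indexed by Q_k.
energy_in_q = np.sum(np.abs(h_tilde[q_k]) ** 2) / np.sum(np.abs(h_tilde) ** 2)
print(q_k, int(np.argmax(np.abs(h_tilde))), energy_in_q)
```

For this nearly on-grid path, the strongest DFT bin falls inside $\mathcal{Q}_{k}$ and almost all of the energy is concentrated there, which is the sparsity the text relies on.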

Next, we take the off-grid model into consideration. The DFT basis in (3) samples the impinging signals on a fixed spatial grid that discretely covers the entire angle domain. In real transmission, however, the DOAs do not fall exactly on those grid points, and direction mismatch occurs. To handle this, we define the bias vector $\boldsymbol{\rho}_{k}$ and introduce a bias-added DFT matrix whose index $p$ is shifted by the corresponding bias, i.e., $p^{*}=p+[\boldsymbol{\rho}_{k}]_{p}$. Correspondingly, the set $\mathcal{Q}_{k}$ can be redefined as:

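$$\mathcal{Q}_{k}=\left\{p\,\Big|\,p+\rho_{p}=N_{t}\frac{d}{\lambda}\sin(\theta_{k,l,m}),p\in\mathbb{Z}\right\},\quad\rho_{k,l}\in[-0.5,0.5].\tag{6}$$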

A simple example is illustrated in Figure 1. It intuitively explains the relationship between the spatial parameters and the virtual channel vector.

Before proceeding, we use $\mathbf{A}$ to represent $\mathbf{F}_{N_{t}}$ for simplicity. Inspired by the above observation, the channel vector $\mathbf{h}_{k,m}$ can be approximated with the first-order Taylor series expansion as

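$$\mathbf{h}_{k,m}=[\mathbf{A}^{H}+\mathbf{B}^{H}\text{diag}(\boldsymbol{\rho}_{k})]_{:,\mathcal{Q}_{k}}[\tilde{\mathbf{h}}_{k,m}]_{\mathcal{Q}_{k}}=[\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})]_{:,\mathcal{Q}_{k}}^{H}[\tilde{\mathbf{h}}_{k,m}]_{\mathcal{Q}_{k}},\tag{7}$$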

where $[\mathbf{B}^{H}]_{:,p}$ is obtained by taking the derivative of $[\mathbf{A}^{H}]_{:,p}$ with respect to $p$, and every element of $\boldsymbol{\rho}_{k}$ is the bias added to the corresponding predefined grid point.
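The effect of the first-order correction can be checked numerically. The sketch below builds $\mathbf{A}^{H}$ and $\mathbf{B}^{H}$ from the DFT definition and compares one column of $\mathbf{A}^{H}+\mathbf{B}^{H}\text{diag}(\boldsymbol{\rho})$ with the exact off-grid column. The helper names and the values $N_{t}=64$, $p=10$, $\rho=0.05$ are assumptions chosen for illustration; the comparison is made for a small bias only.

```python
import numpy as np


def dft_dictionaries(n):
    """Return A^H and B^H, where A is the normalized n x n DFT matrix."""
    idx = np.arange(n)
    A = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
    A_H = A.conj().T
    # Column p of A^H has entries exp(j*2*pi*q*p/n)/sqrt(n); differentiating
    # with respect to p multiplies row q by j*2*pi*q/n.
    B_H = (2j * np.pi * idx / n)[:, None] * A_H
    return A_H, B_H


def exact_offgrid_column(n, p, rho):
    """Column of the bias-added inverse DFT basis at index p + rho."""
    q = np.arange(n)
    return np.exp(2j * np.pi * q * (p + rho) / n) / np.sqrt(n)


n, p, rho = 64, 10, 0.05  # illustrative values
A_H, B_H = dft_dictionaries(n)

exact = exact_offgrid_column(n, p, rho)
on_grid = A_H[:, p]                        # zeroth-order (on-grid) column
first_order = A_H[:, p] + rho * B_H[:, p]  # column p of A^H + B^H diag(rho)

err0 = np.linalg.norm(exact - on_grid)
err1 = np.linalg.norm(exact - first_order)
print(err0, err1)
```

With this small bias, the first-order column is noticeably closer to the exact off-grid column than the on-grid one.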

We can rewrite the AR model of the practical channel as

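$$\begin{cases}\mathbf{h}_{k,m}=\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H}\text{diag}(\mathbf{c}_{k})\mathbf{r}_{k,m},\\ \mathbf{r}_{k,m}=\alpha_{k}\mathbf{r}_{k,m-1}+\boldsymbol{\upsilon}_{k,m},\end{cases}\tag{8}$$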

where the definitions of $\mathbf{r}_{k,m}$, $\alpha_{k}$, and $\boldsymbol{\upsilon}_{k,m}$ are the same as in (4).

Notice that, in (8), $\mathbf{c}_{k}$ and $\boldsymbol{\rho}_{k}$ characterize the spatial signatures and the AOA bias of user $k$, while $\mathbf{\Lambda}_{k}$ and $\alpha_{k}$ together depict the temporal varying characteristics of the virtual channel. After the construction of the AR model, learning the channel statistical characteristics is equivalent to capturing the model parameters $\boldsymbol{\rho}_{k}$, $\alpha_{k}$, $\mathbf{c}_{k}$, and $\mathbf{\Lambda}_{k}$. Moreover, the characteristics of the AR model for one specific user change so slowly that the parameter set $\mathbf{\Xi}_{k}$ is constant over a large number of consecutive channel coherence blocks.
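To make the state equation concrete, the following sketch simulates the first-order AR recursion for $\mathbf{r}_{k,m}$ and masks it with a binary spatial signature to obtain the virtual channel. The function name `simulate_ar1`, the stationary initialization of $\mathbf{r}_{k,1}$, and all numerical values are assumptions made for illustration; the text does not specify how the process is initialized.

```python
import numpy as np


def simulate_ar1(alpha, lam2, n_blocks, rng):
    """Simulate r_m = alpha * r_{m-1} + v_m with v_m ~ CN(0, diag(lam2)).

    Returns an array of shape (n_blocks, N_t), one row per block.
    """
    n = lam2.size
    r = np.zeros((n_blocks, n), dtype=complex)
    # Assumption: start from the stationary variance lam2 / (1 - alpha^2).
    std0 = np.sqrt(lam2 / (1.0 - alpha ** 2) / 2.0)
    r[0] = std0 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    for m in range(1, n_blocks):
        v = np.sqrt(lam2 / 2.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
        r[m] = alpha * r[m - 1] + v
    return r


rng = np.random.default_rng(0)
n_t = 16
c = np.zeros(n_t)
c[3:7] = 1.0                 # spatial signature: ones on an assumed Q_k
lam2 = np.ones(n_t)          # process-noise variances lambda_{k,p}^2

r = simulate_ar1(alpha=0.99, lam2=lam2, n_blocks=50, rng=rng)
h_tilde = r * c              # diag(c) r_m for every block (row-wise broadcast)
print(h_tilde.shape, np.count_nonzero(h_tilde[0]))
```

Only the entries selected by the spatial signature are non-zero in every block, while their values evolve slowly when $\alpha_{k}$ is close to one.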

III Model Parameter Capturing via Uplink Training and Uplink Channel Tracking

Without loss of generality, we assume that the current cell is allocated with $\tau\leq K$ orthogonal training sequences of length $L_{s}\leq L_{c}$ . Denote the orthogonal training set as $\mathbf{S}=[\mathbf{s}_{1},\mathbf{s}_{2},\ldots,\mathbf{s}_{\tau}]$ with $\mathbf{s}_{i}^{H}\mathbf{s}_{j}=L_{s}\sigma_{p}^{2}\delta(i-j)$ , where $\sigma_{p}^{2}$ is the pilot power. For the ease of illustration, we assume that $K=C\tau$ , where $C$ is an integer no less than $1$ .
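One possible way to obtain training sequences satisfying the orthogonality condition above is to take scaled columns of a DFT matrix. This is only a sketch of one valid construction, not necessarily the one used by the authors; the function name `dft_pilots` and the example sizes are assumptions.

```python
import numpy as np


def dft_pilots(n_seq, length, pilot_power=1.0):
    """Return a length x n_seq matrix S whose columns satisfy
    s_i^H s_j = length * pilot_power * delta(i - j)."""
    if n_seq > length:
        raise ValueError("need n_seq <= length for orthogonal sequences")
    n = np.arange(length)[:, None]
    k = np.arange(n_seq)[None, :]
    return np.sqrt(pilot_power) * np.exp(-2j * np.pi * n * k / length)


# Example with tau = 4 sequences of length 4 and unit pilot power.
S = dft_pilots(n_seq=4, length=4, pilot_power=1.0)
gram = S.conj().T @ S
print(np.round(gram.real, 6))
```

The Gram matrix $\mathbf{S}^{H}\mathbf{S}$ equals $L_{s}\sigma_{p}^{2}\mathbf{I}$, which is the stated orthogonality condition.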

Following most standards, there exists one long UL training period called preamble at the very beginning of each transmission. We can use the preamble to obtain the model parameters. Since we do not assume any prior spatial information, we have to divide $K$ users into $C$ groups, each containing $\tau$ users such that $\tau$ orthogonal training sequences are sufficient for each group.

For simplicity of illustration, we take the first group as an example and use $M$ channel blocks to learn the channel model parameters $\boldsymbol{\Xi}_{k}$. The received training signal during the $m$-th block can be written as

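$$\mathbf{Y}_{m}=\sum_{k=1}^{\tau}\mathbf{h}_{k,m}\mathbf{s}_{k}^{T}+\mathbf{N}_{m}=\sum_{k=1}^{\tau}\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H}\text{diag}(\mathbf{c}_{k})\mathbf{r}_{k,m}\mathbf{s}_{k}^{T}+\mathbf{N}_{m},\tag{9}$$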

where $\mathbf{N}_{m}$ denotes the independent additive white Gaussian noise matrix with elements distributed as i.i.d. $\mathcal{CN}(0,\sigma_{n}^{2})$ ; $\sigma_{n}^{2}$ is assumed to be unknown. Moreover, we define the $N_{t}L_{s}\times 1$ vector $\mathbf{y}_{m}=\text{vec}(\mathbf{Y}_{m})$ and $N_{t}L_{s}\times 1$ vector $\mathbf{n}_{m}=\text{vec}(\mathbf{N}_{m})$ . Then (9) can be rearranged as

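$$\mathbf{y}_{m}=\sum_{k=1}^{\tau}\underbrace{(\mathbf{s}_{k}\otimes\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H})\text{diag}(\mathbf{c}_{k})}_{\mathbf{J}_{k}}\mathbf{r}_{k,m}+\mathbf{n}_{m}=\mathbf{J}\mathbf{r}_{m}+\mathbf{n}_{m},\tag{10}$$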

where $\mathbf{n}_{m}\sim\mathcal{CN}(\mathbf{0},\sigma_{n}^{2}\mathbf{I}_{N_{t}L_{s}})$, $\mathbf{J}=[\mathbf{J}_{1},\mathbf{J}_{2},\ldots,\mathbf{J}_{\tau}]\in\mathbb{C}^{N_{t}L_{s}\times N\tau}$, and $\mathbf{r}_{m}=[\mathbf{r}_{1,m}^{T},\mathbf{r}_{2,m}^{T},\ldots,\mathbf{r}_{\tau,m}^{T}]^{T}\in\mathbb{C}^{N\tau\times 1}$, with $N=N_{t}$. For further use, we define the $N_{t}L_{s}M\times 1$ vector $\mathbf{y}=[\mathbf{y}_{1}^{T},\mathbf{y}_{2}^{T},\ldots,\mathbf{y}_{M}^{T}]^{T}$, the $N\tau M\times 1$ vector $\mathbf{r}=[\mathbf{r}_{1}^{T},\mathbf{r}_{2}^{T},\ldots,\mathbf{r}_{M}^{T}]^{T}$, the $N\tau\times 1$ vector $\boldsymbol{\rho}=[\boldsymbol{\rho}_{1}^{T},\boldsymbol{\rho}_{2}^{T},\ldots,\boldsymbol{\rho}_{\tau}^{T}]^{T}$, the $\tau\times 1$ vector $\boldsymbol{\alpha}=[\alpha_{1},\alpha_{2},\ldots,\alpha_{\tau}]^{T}$, the $N\tau\times 1$ vector $\mathbf{c}=[\mathbf{c}_{1}^{T},\mathbf{c}_{2}^{T},\ldots,\mathbf{c}_{\tau}^{T}]^{T}$, and the $\tau N\times\tau N$ matrix $\boldsymbol{\Lambda}=\text{blkdiag}\{\boldsymbol{\Lambda}_{1},\boldsymbol{\Lambda}_{2},\ldots,\boldsymbol{\Lambda}_{\tau}\}$.
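The rearrangement of the matrix observation into the vector model rests on the identity $\text{vec}(\mathbf{h}\mathbf{s}^{T})=(\mathbf{s}\otimes\mathbf{I})\mathbf{h}$ with column-wise stacking. The sketch below checks it numerically for a single user in the noiseless case. For brevity it uses the on-grid basis ($\boldsymbol{\rho}_{k}=\mathbf{0}$, so $\boldsymbol{\Phi}^{H}=\mathbf{A}^{H}$) and random illustrative values for $\mathbf{c}_{k}$, $\mathbf{r}_{k,m}$, and $\mathbf{s}_{k}$; these choices are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, l_s = 8, 4

# On-grid basis Phi(0)^H = A^H (inverse of the normalized DFT matrix).
idx = np.arange(n_t)
phi_h = np.exp(2j * np.pi * np.outer(idx, idx) / n_t) / np.sqrt(n_t)

c = (rng.random(n_t) < 0.5).astype(float)                      # spatial signature
r = rng.standard_normal(n_t) + 1j * rng.standard_normal(n_t)   # state vector
s = np.exp(2j * np.pi * rng.random(l_s))                       # training sequence

# Matrix form: Y = h s^T with h = Phi^H diag(c) r (noise omitted).
Y = np.outer(phi_h @ (c * r), s)
y_vec = Y.flatten(order="F")        # vec(.) stacks the columns of Y

# Vector form: y = J_k r with J_k = (s kron Phi^H) diag(c).
J_k = np.kron(s[:, None], phi_h) @ np.diag(c)
y_model = J_k @ r

print(J_k.shape, np.allclose(y_vec, y_model))
```

The matrix $\mathbf{J}_{k}$ has $N_{t}L_{s}$ rows and $N_{t}$ columns, matching the dimensions given above for each block of $\mathbf{J}$.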

Here, the task of the preamble is to capture the parameter set $\mathbf{\Xi}=\{\boldsymbol{\rho},\mathbf{c},\boldsymbol{\alpha},\boldsymbol{\Lambda},\sigma_{n}^{2}\}$ with the observation model (10) and the state equation (8).

III-A Problem Formulation

The objective of the learning is to estimate the best-fitting parameter set $\boldsymbol{\Xi}$ given the observation vector $\mathbf{y}$. Theoretically, the ML estimator for $\boldsymbol{\Xi}$ can be formulated as

[TABLE]

Such an estimator involves all possible combinations of the hidden vector $\mathbf{r}$, so the ML solution cannot be obtained directly because of the high-dimensional search. One alternative is to search for the solution iteratively via the EM algorithm [33]. Furthermore, to achieve faster convergence with correspondingly lower complexity, we adopt the Gauss-Seidel scheme and perform the coordinate-wise maximization based EM algorithm in the following.

III-B Coordinate-wise Maximization based EM to Accomplish Simultaneously Sparse Signal Learning

Similar to the classical EM algorithm, the coordinate-wise maximization based EM algorithm iteratively produces a sequence of ${\boldsymbol{\Xi}}^{(l)},l=1,2,\ldots$ , and each iteration is divided into two steps:

$\bullet$ **Expectation step (E-step)**

[TABLE]

$\bullet$ **Maximization step (M-step)**

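$$\hat{\boldsymbol{\alpha}}^{(l)}=\arg\max_{\boldsymbol{\alpha}}Q\big(\boldsymbol{\alpha},\hat{\boldsymbol{\Xi}}^{(l-1)}\big),\tag{17}$$

$$\hat{\boldsymbol{\Lambda}}^{(l)}=\arg\max_{\boldsymbol{\Lambda}}Q\big(\boldsymbol{\Lambda},\hat{\boldsymbol{\Xi}}^{(l-1)}\big),\tag{18}$$

$$\hat{\mathbf{c}}^{(l)}=\arg\max_{\mathbf{c}}Q\big(\mathbf{c},\hat{\boldsymbol{\Xi}}^{(l-1)}\big),\tag{19}$$

$$\hat{\boldsymbol{\rho}}^{(l)}=\arg\max_{\boldsymbol{\rho}}Q\big(\boldsymbol{\rho},\hat{\boldsymbol{\Xi}}^{(l-1)}\big),\tag{20}$$

$$\big(\hat{\sigma}_{n}^{2}\big)^{(l)}=\arg\max_{\sigma_{n}^{2}}Q\big(\sigma_{n}^{2},\hat{\boldsymbol{\Xi}}^{(l-1)}\big).\tag{21}$$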

In the $l$-th iteration, the E-step derives the objective functions as the expectation of the logarithm of the probability density function (PDF) $p(\mathbf{y},\mathbf{r};\boldsymbol{\Xi})$ over the hidden variable $\mathbf{r}$, with the posterior of $\mathbf{r}$ evaluated at the model parameters $\hat{\boldsymbol{\Xi}}^{(l-1)}$ estimated in the previous iteration; the M-step finds the new estimate $\hat{\boldsymbol{\Xi}}^{(l)}$ by maximizing them. It has been proved that the sequence $\{\boldsymbol{\hat{\Xi}}^{(l)}\}$ converges to a stationary point of the likelihood function [34].
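The Gauss-Seidel idea behind the coordinate-wise M-step, updating one parameter at a time while holding the others at their latest values, can be illustrated on a toy problem. The sketch below maximizes a concave quadratic one coordinate at a time; it is not the paper's objective function, and the matrix `P`, the vector `b`, and the function name are hypothetical.

```python
import numpy as np


def coordinate_wise_maximize(P, b, n_sweeps=100):
    """Maximize Q(x) = -0.5 * x^T P x + b^T x one coordinate at a time.

    P is assumed symmetric positive definite, so each one-dimensional
    subproblem has the closed-form maximizer used below.
    """
    x = np.zeros_like(b, dtype=float)
    for _ in range(n_sweeps):
        for i in range(b.size):
            # Contribution of the other coordinates, using their latest values.
            rest = P[i] @ x - P[i, i] * x[i]
            x[i] = (b[i] - rest) / P[i, i]
    return x


# Toy example (assumed values): a small symmetric positive definite P.
P = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x_hat = coordinate_wise_maximize(P, b)
print(x_hat, np.linalg.solve(P, b))
```

In this toy case the coordinate-wise sweeps reach the joint maximizer $\mathbf{P}^{-1}\mathbf{b}$. In the EM setting each "coordinate" is one block of $\boldsymbol{\Xi}$, and the general guarantee is convergence to a stationary point, as stated above.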

III-C Expectation step

In this subsection, we derive the objective functions in (12)-(15). We first examine $Q\left({\boldsymbol{\alpha}},\hat{\boldsymbol{\Xi}}^{(l-1)}\right)$ . Since the received samples $\mathbf{y}$ are known, the objective function $Q\left({\boldsymbol{\alpha}},\hat{\boldsymbol{\Xi}}^{(l-1)}\right)$ can be expressed as

[TABLE]

From (10), we get the conditional PDF as:

[TABLE]

Meanwhile, we have

[TABLE]

where the conditional PDF $p\Big{(}\mathbf{r}_{k,m}|\mathbf{r}_{k,m-1};{\boldsymbol{\alpha}},{\boldsymbol{\Xi}}^{(l-1)}\!\!\setminus\!\!{\boldsymbol{\alpha}}^{(l-1)}\Big{)}$ can be written as

[TABLE]

Before proceeding, we define three posterior statistics of $\mathbf{r}_{m}$ , i.e., $\hat{\mathbf{r}}_{k,m}^{(l-1)}\triangleq\mathbb{E}\big\{\mathbf{r}_{k,m}|\mathbf{y},\hat{\boldsymbol{\Xi}}^{(l-1)}\big\}$ , $\boldsymbol{\Theta}_{k,m}^{(l-1)}\triangleq\mathbb{E}\big\{\mathbf{r}_{k,m}\mathbf{r}_{k,m}^{H}|\mathbf{y},\hat{\boldsymbol{\Xi}}^{(l-1)}\big\}$ , and $\boldsymbol{\Pi}_{k,m-1,m}^{(l-1)}\triangleq\mathbb{E}\big\{\mathbf{r}_{k,m-1}\mathbf{r}_{k,m}^{H}|\mathbf{y},\hat{\boldsymbol{\Xi}}^{(l-1)}\big\}$ . Then, plugging (23)-(25) into (22) and rearranging, we obtain

[TABLE]

where $C_{1}$ collects the terms that do not depend on $\alpha_{k}$ .

Following steps similar to (23)-(26), we can derive the other objective functions as follows.

[TABLE]

where $\widehat{\mathbf{J}}_{k}^{(l-1)}=\big(\mathbf{s}_{k}\otimes\mathbf{\Phi}(\widehat{\boldsymbol{\rho}}_{k}^{(l-1)})^{H}\big)\text{diag}(\widehat{\mathbf{c}}_{k}^{(l-1)})$ , $\boldsymbol{\Psi}(\boldsymbol{\rho}_{k},\mathbf{c}_{k})=\text{diag}(\mathbf{c}_{k})\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})\boldsymbol{\Phi}(\boldsymbol{\rho}_{k})^{H}\text{diag}(\mathbf{c}_{k})$ , and $C_{2}$ , $C_{3}$ , $C_{4}$ , $C_{5}$ are constants that do not depend on the parameter being optimized in the corresponding objective function.

From (26)-(30), these objective functions depend on $\mathbf{\widehat{r}}_{k,m}^{(l-1)}$ , $\boldsymbol{\Theta}_{k,m}^{(l-1)}$ , and $\boldsymbol{\Pi}_{k,m-1,m}^{(l-1)}$ . Similar to [22], given $\mathbf{y}$ and $\boldsymbol{\hat{\Xi}}^{(l-1)}$ , these three terms can be obtained from the following state-space model:

[TABLE]

where $\boldsymbol{\upsilon}_{m}=[\boldsymbol{\upsilon}_{1,m}^{T},\boldsymbol{\upsilon}_{2,m}^{T},\cdots,\boldsymbol{\upsilon}_{\tau,m}^{T}]^{T}\sim\mathcal{CN}(\mathbf{0},\widehat{\boldsymbol{\Lambda}}^{(l-1)})$ and $\mathbf{n}_{m}\sim\mathcal{CN}\left(\mathbf{0},(\widehat{\sigma}_{n}^{(l-1)})^{2}\mathbf{I}_{N_{t}\tau}\right)$ .

[TABLE]

III-D Maximization step

In this step, we derive $\boldsymbol{\hat{\Xi}}^{(l)}$ by maximizing the objective functions one at a time. As shown in (26)-(30), the parameter sets ${\boldsymbol{\Xi}_{k}}$ of different users are decoupled, which means that the parameters of each user’s dynamic virtual channel can be estimated independently of the other users. Therefore, we solve the maximization problems sequentially, one parameter at a time, and separately for each user.

III-D1 Searching $\boldsymbol{\hat{\mathbf{c}}}_{k}^{(l)}$

It can be checked that $[\mathbf{\Lambda}_{k}]_{j,j}$ is nearly 0 when $j\notin\mathcal{Q}_{k}$ . Based on this observation, we use a simple search algorithm to find a solution for $\boldsymbol{\hat{\mathbf{c}}}_{k}^{(l)}$ .

Specifically, ${\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}$ has only a few contiguous non-zero elements on its diagonal, while its other diagonal elements are nearly zero. Figure 2 sketches the diagonal elements of ${\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}$ . Once the positions of these non-zero elements are known, the optimal solution follows directly. A simple approach is to locate, through a forward search, the position of the largest increment and the position of the largest decrement. However, since the non-zero elements may be subject to unpredictable perturbations, this approach may lead to biased positions.

Thus, we adopt a flattening approach to reduce the influence of such perturbations, as shown in Algorithm 1. First, we set all entries of $\mathbf{c}_{k}$ to zero. For each position $j$ , denote $s1=[{\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}]_{j,j}+[{\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}]_{j+1,j+1}+[{\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}]_{j+2,j+2}$ and $s2=[{\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}]_{j+3,j+3}+[{\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}]_{j+4,j+4}+[{\widehat{\mathbf{\Lambda}}}_{k}^{(l-1)}]_{j+5,j+5}$ , and compare the two values through the logarithmic ratio $[\mathbf{d}]_{j}=\ln(\frac{s2}{s1})$ , which we track as $j$ increases. When $[\mathbf{d}]_{j}$ reaches its highest value, we set the current $j+3$ as the starting point $p_{st}$ . We continue tracking until $[\mathbf{d}]_{j}$ reaches its lowest value, and set the current $j+3$ as the ending point $p_{en}$ . Then, all the elements between $[\mathbf{c}_{k}]_{p_{st}}$ and $[\mathbf{c}_{k}]_{p_{en}}$ are set to $1$ . Figure 2 shows the position searching part of Algorithm 1.
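The window-based search described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than a transcription of Algorithm 1: the function name `find_support`, the small constant `eps` that guards the logarithm, the 0-based indexing, and the choice to treat the support as the half-open range from $p_{st}$ up to (but excluding) $p_{en}$ are all assumptions made for illustration.

```python
import numpy as np


def find_support(lam_diag, win=3, eps=1e-12):
    """Locate the contiguous non-zero run on the diagonal of Lambda_k.

    lam_diag : 1-D array with the diagonal entries (0-based indexing)
    win      : window length (3 in the text: s1 and s2 each sum 3 entries)
    Returns (c, p_st, p_en) with c[p_st:p_en] == 1 and zeros elsewhere.
    """
    lam = np.asarray(lam_diag, dtype=float)
    n = lam.size
    d = np.empty(n - 2 * win + 1)
    for j in range(d.size):
        s1 = lam[j:j + win].sum()              # window ending just before j + win
        s2 = lam[j + win:j + 2 * win].sum()    # window starting at j + win
        d[j] = np.log((s2 + eps) / (s1 + eps))
    j_peak = int(np.argmax(d))                 # sharpest rise: run starts at j + win
    p_st = j_peak + win
    j_dip = j_peak + int(np.argmin(d[j_peak:]))  # sharpest fall after the rise
    p_en = j_dip + win
    c = np.zeros(n, dtype=int)
    c[p_st:p_en] = 1
    return c, p_st, p_en


# Example: a perturbed run of six non-zero entries at positions 10..15.
lam_example = np.full(32, 1e-3)
lam_example[10:16] = [1.0, 0.6, 1.3, 0.8, 1.1, 0.9]
c_example, p_st_example, p_en_example = find_support(lam_example)
```

Summing three neighbouring entries before taking the ratio is what makes the detected edges less sensitive to fluctuations of individual diagonal entries.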

III-D2 Computing $\hat{\boldsymbol{\rho}_{k}}^{(l)}$

Taking the derivatives of (29) with respect to $[\boldsymbol{\rho}_{k}]_{j}$ , we have

[TABLE]

Then, $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}$ can be obtained by setting the derivative to zero, i.e., $\frac{\partial Q\big{(}\boldsymbol{\rho}_{k},\hat{\boldsymbol{\Xi}}^{(l-1)}\big{)}}{\partial[\boldsymbol{\rho}_{k}]_{j}}=0$ , and the unconstrained solution $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}^{*}$ can be computed as:

[TABLE]

Since $[{\boldsymbol{\rho}}_{k}]_{j}$ is constrained to $[-\frac{1}{2},\frac{1}{2}]$ , the unconstrained solution is clipped to this interval: if $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}^{*}\geq\frac{1}{2}$ , we set $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}=\frac{1}{2}$ ; if $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}^{*}\leq-\frac{1}{2}$ , we set $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}=-\frac{1}{2}$ ; otherwise, $[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}=[\hat{\boldsymbol{\rho}}_{k}^{(l)}]_{j}^{*}$ .

III-D3 Computing $\widehat{\alpha}_{k}^{(l)}$ , ${\widehat{\mathbf{\Lambda}}}_{k}^{(l)}$ , and $(\widehat{\sigma}_{n}^{(l)})^{2}$

The computation of these three parameters is much simpler than that of the preceding ones. After some calculations, we can obtain $\widehat{\alpha}_{k}^{(l)}$ , ${\widehat{\mathbf{\Lambda}}}_{k}^{(l)}$ , and $(\widehat{\sigma}_{n}^{{(l)}})^{2}$ as:

[TABLE]

III-E UL virtual channel tracking

Once the parameters of the virtual channel model $\boldsymbol{\Xi}_{k}=\{\alpha_{k},\boldsymbol{\Lambda}_{k},\mathbf{c}_{k},\boldsymbol{\rho}_{k},\sigma_{n}^{2}\}$ have been captured in the learning phase, the users can be divided into different groups according to their spatial signatures, which removes the pilot contamination and allows different users to be trained simultaneously with fewer orthogonal training sequences. Specifically, users are allocated to the same group if their spatial signatures do not overlap, i.e.,

[TABLE]
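The paper states the grouping condition but not a specific grouping procedure, so the following is only one possible realization: a greedy first-fit pass, assumed here for illustration. The function name `group_users` and the representation of each spatial signature $\mathcal{Q}_{k}$ as a Python set of grid indices are likewise assumptions.

```python
def group_users(signatures):
    """Greedy first-fit grouping of users with non-overlapping signatures.

    signatures : list of iterables, signatures[k] = grid indices in Q_k
    Returns a list of groups, each a list of user indices.
    """
    groups = []    # user indices per group
    occupied = []  # union of the spatial signatures already in each group
    for k, q in enumerate(signatures):
        q = set(q)
        for g, used in enumerate(occupied):
            if used.isdisjoint(q):  # no overlap with anyone in group g
                groups[g].append(k)
                used |= q
                break
        else:
            # Overlaps with every existing group: open a new one.
            groups.append([k])
            occupied.append(set(q))
    return groups


# Users 0 and 1 share grid index 3, users 1 and 3 share grid index 4.
example_groups = group_users([{1, 2, 3}, {3, 4}, {10, 11}, {4, 5}])
```

A first-fit pass does not necessarily minimize the number of groups $G$ ; it only guarantees that signatures within a group are pairwise disjoint.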

Assume that all users are divided into $G$ groups according to (41), and let the set $\mathcal{G}_{g}$ collect the indices of the users in the $g$ -th group. Since the users in the same group are separated by their different spatial signatures, we can assign the same training sequence to all users in one group to estimate the virtual channel $\tilde{\mathbf{h}}_{k,m}$ . Different user groups, however, utilize orthogonal training sequences. Therefore, we can construct a $G\times G$ matrix $\mathbf{S}_{G}$ with $\mathbf{S}_{G}^{H}\mathbf{S}_{G}=G\sigma_{p}^{2}\mathbf{I}_{G}$ . Then, $\mathbf{s}_{g}=[\mathbf{S}_{G}]_{:,g}$ is allocated to the group $g$ , and all $K$ users send their training sequences simultaneously. Thus, the received signals at the BS can be expressed as

[TABLE]

Notice that the users in the same group have different spatial signatures. Since $\mathbf{s}_{g}$ is orthogonal to $\mathbf{s}_{g^{\prime}}$ for $g\neq g^{\prime}$ , the signals for the group $g$ can be extracted as

[TABLE]

where $\mathbf{D}_{\mathcal{Q}}=\left[[\mathbf{\Phi}(\boldsymbol{\rho}_{1})]_{:,\mathcal{Q}_{1}},[\mathbf{\Phi}(\boldsymbol{\rho}_{2})]_{:,\mathcal{Q}_{2}},\ldots\right]$ , $\mathbf{r}_{m_{\mathcal{Q}}}=\left[[\mathbf{r}_{1,m}]_{\mathcal{Q}_{1}}^{H},[\mathbf{r}_{2,m}]_{\mathcal{Q}_{2}}^{H},\ldots\right]^{H}$ , and $\tilde{\mathbf{n}}_{m}=\frac{1}{G\sigma_{p}^{2}}\mathbf{N}_{m}\mathbf{s}_{g}$ is the equivalent Gaussian white noise vector.
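The orthogonality argument can be checked numerically with a small sketch. Two assumptions are made here because the received-signal equation is not reproduced in this text: the training matrix is taken to be a scaled DFT matrix (any matrix with $\mathbf{S}_{G}^{H}\mathbf{S}_{G}=G\sigma_{p}^{2}\mathbf{I}_{G}$ would do), and the noise-free received block is modelled as $\mathbf{Y}=\sum_{g}\mathbf{x}_{g}\mathbf{s}_{g}^{H}$ , where $\mathbf{x}_{g}$ denotes the superimposed channel contribution of group $g$ . The names `pilot_matrix` and `extract_group` are hypothetical.

```python
import numpy as np


def pilot_matrix(G, sigma_p2=1.0):
    """Scaled DFT matrix S with S^H S = G * sigma_p2 * I (one valid choice)."""
    idx = np.arange(G)
    dft = np.exp(-2j * np.pi * np.outer(idx, idx) / G)
    return np.sqrt(sigma_p2) * dft


def extract_group(Y, S, g, sigma_p2=1.0):
    """Project the received block Y onto s_g to isolate group g."""
    G = S.shape[0]
    return Y @ S[:, g] / (G * sigma_p2)


# Noise-free check with 4 groups and 8 BS antennas (assumed sizes).
S_demo = pilot_matrix(4, sigma_p2=2.0)
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.standard_normal((8, 4)) + 1j * rng_demo.standard_normal((8, 4))
Y_demo = X_demo @ S_demo.conj().T          # sum over g of x_g s_g^H
x2_demo = extract_group(Y_demo, S_demo, 2, sigma_p2=2.0)
```

With noise added to `Y_demo`, the same projection yields the group signal plus the equivalent noise term $\tilde{\mathbf{n}}_{m}$ defined above.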

Define $\boldsymbol{\alpha}^{*}=\text{blkdiag}\{\text{diag}(\underbrace{\alpha_{1},\alpha_{1},\ldots}_{\mathcal{Q}_{1}}),\text{diag}(\underbrace{\alpha_{2},\alpha_{2},\ldots}_{\mathcal{Q}_{2}}),\ldots\}$ and $\mathbf{\Lambda}^{*}=\text{blkdiag}\{[\mathbf{\Lambda}_{1}]_{\mathcal{Q}_{1},\mathcal{Q}_{1}},[\mathbf{\Lambda}_{2}]_{\mathcal{Q}_{2},\mathcal{Q}_{2}},\ldots\}$ . Then, we can derive the following reduced-dimension state-space model to track $\mathbf{r}_{m_{\mathcal{Q}_{k}}}$ , from which we can obtain the estimate of $\mathbf{h}_{k,m}$ .

[TABLE]

Since the model consists of a state equation and an observation equation, we can again apply the KF to track the channel.
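For reference, one predict/update step of the textbook Kalman filter for complex-valued states is sketched below. It is written for a generic linear model $\mathbf{r}_{m}=\mathbf{A}\mathbf{r}_{m-1}+\boldsymbol{\upsilon}_{m}$ , $\mathbf{y}_{m}=\mathbf{D}\mathbf{r}_{m}+\mathbf{n}_{m}$ ; identifying $\mathbf{A}$ with $\boldsymbol{\alpha}^{*}$ , the process noise covariance with $\mathbf{\Lambda}^{*}$ , and $\mathbf{D}$ with $\mathbf{D}_{\mathcal{Q}}$ is an assumption about the reduced model, whose exact form is not reproduced in this text. The function name `kf_step` is hypothetical.

```python
import numpy as np


def kf_step(r_prev, P_prev, y, A, Q, D, R):
    """One Kalman predict/update step for a complex linear-Gaussian model.

    r_prev, P_prev : previous state estimate and its error covariance
    y              : current observation
    A, Q           : state transition matrix and process noise covariance
    D, R           : observation matrix and observation noise covariance
    """
    # Prediction
    r_pred = A @ r_prev
    P_pred = A @ P_prev @ A.conj().T + Q
    # Update
    S = D @ P_pred @ D.conj().T + R              # innovation covariance
    K = P_pred @ D.conj().T @ np.linalg.inv(S)   # Kalman gain
    r_new = r_pred + K @ (y - D @ r_pred)
    P_new = (np.eye(r_prev.size) - K @ D) @ P_pred
    return r_new, P_new


# Sanity check: with almost noise-free, full-rank observations the
# updated state should essentially equal the observation.
n_demo = 3
y_demo = np.array([1 + 1j, -2j, 0.5])
r_demo, P_demo = kf_step(
    np.zeros(n_demo, dtype=complex), np.eye(n_demo), y_demo,
    A=0.9 * np.eye(n_demo), Q=0.19 * np.eye(n_demo),
    D=np.eye(n_demo), R=1e-9 * np.eye(n_demo),
)
```

Calling `kf_step` once per time block, with the output of one call fed into the next, gives the tracking recursion.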

IV Downlink Channel Model Reconstruction and Channel Restoration

Similar to (2), the physical DL channel from the BS to the user $k$ during time block $m$ can be written as:

[TABLE]

where $\varphi$ is the direction of departure (DOD) of the propagation path, and $\mathbf{a}(\varphi)$ is the BS antenna array spatial steering vector defined in (1), but with a different DL carrier wavelength ${\lambda}^{\prime}$ if the FDD mode is selected. Similar to (7), the DL channel $\mathbf{g}_{k,m}$ can also be approximated by the sparse virtual channel model with spatial signatures $\mathcal{Q}_{k}^{\prime}$ , i.e.,

[TABLE]

IV-A DL channel model parameters reconstruction

In the FDD mode, since the UL and DL channel covariance matrices are not reciprocal, the DL model parameters $\boldsymbol{\Xi}_{k}^{\prime}=\{\boldsymbol{\rho}_{k}^{\prime},\mathbf{c}_{k}^{\prime},\alpha_{k}^{\prime},\boldsymbol{\Lambda}_{k}^{\prime},\sigma_{n,k}^{2\prime}\}$ are different from the UL ones. Thanks to the angle reciprocity, we can reconstruct some of the parameters in $\boldsymbol{\Xi}_{k}^{\prime}$ . However, $\boldsymbol{\Lambda}_{k}^{\prime}$ and $\sigma_{n,k}^{2\prime}$ are closely related to the carrier frequency and cannot be perfectly inferred from the UL. A straightforward alternative is to learn these parameters again through DL training, but this requires dedicated training and wastes system bandwidth. Thus, we resort to Bayesian Kalman filtering to implement both effective channel tracking and the restoration of the model parameters. We will see that this method does not require a dedicated training period and ensures real-time channel updating. In the following, we first introduce the reconstruction of $\boldsymbol{\rho}_{k}^{\prime}$ , $\mathbf{c}_{k}^{\prime}$ , and $\alpha_{k}^{\prime}$ . Then, in the next subsection, the optimal Bayesian Kalman filtering is presented.

IV-A1 $\alpha_{k}^{\prime}$

For a specific user, the moving velocity is the same for the UL and the DL. Since the maximum Doppler frequency equals the velocity divided by the carrier wavelength, the Doppler frequency $\nu_{k}^{\max\prime}$ along the DL can be derived from the known parameters $\lambda$ , $\lambda^{\prime}$ and $\nu_{k}^{\max}$ as $\nu_{k}^{\max\prime}=\frac{\lambda}{\lambda^{\prime}}\nu_{k}^{\max}$ . Then, $\alpha_{k}^{\prime}$ is given by

[TABLE]

IV-A2 $\mathcal{Q}_{k}^{\prime}$ and $\boldsymbol{\rho}_{k}^{\prime}$

Because the propagation paths of the radio waves are reciprocal, only the DL signal waves that reverse the UL paths can reach the user in the DL transmission period [36, 37]. Hence, the DODs of the DL scattering rays are the same as the DOAs of the UL radio waves at the BS. Therefore, we can recover $\mathcal{Q}_{k}^{\prime}$ as well as $\boldsymbol{\rho}_{k}^{\prime}$ from $\mathcal{Q}_{k}$ and $\boldsymbol{\rho}_{k}$ . Similar to (5), we have

[TABLE]

Then, it can be obtained that

[TABLE]

where

[TABLE]

and $\mathcal{Q}_{k}^{\prime}$ includes all the $p^{\prime}$ that satisfy (50).

Notice that different $p\in\mathcal{Q}_{k}$ may be mapped onto the same grid point in the DL virtual channel. If two rays in the UL are mapped onto the same $p^{\prime}$ with different biases $\rho^{\prime}$ , our scheme treats them as one ray and adopts the average of their biases. For example, if the biases of two such rays are $0.1$ and $0.3$ , respectively, we regard them as a single ray with the bias $0.2$ . Furthermore, the corresponding $\mathbf{c}_{k}^{\prime}$ is determined by $\mathcal{Q}_{k}^{\prime}$ , as $[\mathbf{c}_{k}^{\prime}]_{i}=1$ when $i\in\mathcal{Q}_{k}^{\prime}$ and $[\mathbf{c}_{k}^{\prime}]_{i}=0$ otherwise.
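The merging rule can be illustrated with a short sketch. It assumes that the UL-to-DL mapping in (49)-(50) has already been applied, so the input is a list of (DL grid index, DL bias) pairs; the function name `merge_dl_rays` and this input format are assumptions made for illustration.

```python
from collections import defaultdict


def merge_dl_rays(mapped_rays, n_grid):
    """Merge rays that fall on the same DL grid point by averaging their bias.

    mapped_rays : list of (p_dl, rho_dl) pairs, one per UL ray
    n_grid      : number of grid points in the DL virtual channel
    Returns (q_dl, rho_dl, c_dl): sorted DL signature, bias per grid index,
    and the 0/1 indicator vector c'.
    """
    buckets = defaultdict(list)
    for p_dl, rho_dl in mapped_rays:
        buckets[p_dl].append(rho_dl)
    q_dl = sorted(buckets)
    rho_dl = {p: sum(v) / len(v) for p, v in buckets.items()}
    c_dl = [1 if i in buckets else 0 for i in range(n_grid)]
    return q_dl, rho_dl, c_dl


# Two rays land on grid point 5 with biases 0.1 and 0.3 (example from the text).
q_demo, rho_demo, c_demo = merge_dl_rays([(5, 0.1), (5, 0.3), (6, -0.2)], n_grid=8)
```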

IV-B DL channel restoration by optimal Bayesian Kalman filtering

Now, we start to track $[\tilde{\mathbf{g}}_{k,m}]_{\mathcal{Q}_{k}^{\prime}}$ with the partial knowledge about $\boldsymbol{\Xi}_{k}^{\prime}$ reconstructed in the previous subsection. Similar to (41), the $K$ users are divided into $G^{\prime}$ groups such that the DL spatial signatures of the users in the same group do not overlap, i.e.,

[TABLE]

Then, the user indices of the group $g$ are collected into the set $\mathcal{G}_{g}^{\prime}$ . In order to avoid the inter-group interference, the DL channels for each group are estimated separately. The training sequences can be reused by the users in the same group due to the separation of their spatial signatures. For each user, $|\mathcal{Q}^{\prime}_{k}|$ orthogonal training sequences are required to estimate its $|\mathcal{Q}^{\prime}_{k}|$ channel coefficients. We therefore build an $M_{g}\times M_{g}$ matrix $\mathbf{T}_{g}$ with $\mathbf{T}_{g}\mathbf{T}_{g}^{H}=M_{g}\sigma_{p}^{2}\mathbf{I}_{M_{g}}$ , where $M_{g}=\max\limits_{k\in\mathcal{G}^{\prime}_{g}}|\mathcal{Q}^{\prime}_{k}|$ , and select $|\mathcal{Q}^{\prime}_{k}|$ rows of $\mathbf{T}_{g}$ as the training sequences for user $k$ , i.e., $\mathbf{S}_{k}=[\mathbf{T}_{g}]_{1:|{{\mathcal{Q}}_{k}^{\prime}}|,:}$ . Then, $\mathbf{S}_{k}$ is transmitted on the beam $[\mathbf{\Phi}(\boldsymbol{\rho}_{k}^{\prime})^{H}]_{:,\mathcal{Q}_{k}^{\prime}}$ . Since the BS simultaneously transmits the training sequences for all users in the same group, the transmitted signal during DL channel estimation for group $g$ is given by $\mathbf{\Gamma}_{g}=\sum_{k\in\mathcal{G}_{g}^{\prime}}[\mathbf{\Phi}(\boldsymbol{\rho}_{k}^{\prime})^{H}]_{:,\mathcal{Q}_{k}^{\prime}}\mathbf{S}_{k}$ .

As a result, the received signal at the user $k$ of the group $g$ can be expressed as

[TABLE]

To eliminate the inter-group interference, we can further derive that

[TABLE]

where the equivalent Gaussian white noise vector $\tilde{\mathbf{n}}_{k,m}^{\prime}=\frac{1}{M_{g}\sigma_{p}^{2}}\mathbf{S}_{k}\mathbf{n}_{k,m}^{\prime}\sim\mathcal{CN}(\mathbf{0},\frac{\sigma_{n}^{2\prime}}{\sigma_{p}^{2}}\mathbf{I}_{|\mathcal{Q}_{k}^{\prime}|})$ . Here, the variance $\sigma_{n}^{2\prime}$ of the original noise is unknown.

Then we can obtain the following state-space model as

[TABLE]

where $[{\mathbf{v}}_{k,m}^{\prime}]_{\mathcal{Q}_{k}^{\prime}}\sim\mathcal{CN}(0,[\mathbf{\Lambda}_{k}^{\prime}]_{\mathcal{Q}_{k}^{\prime}})$ . As mentioned in the previous subsection, we can reconstruct partial knowledge about the model parameters in (55). However, the statistics of the noise in both the observation and the state equations are unknown. Thus, the DL channel cannot be tracked with the classical KF method, whose performance is very sensitive to the accuracy of the noise statistics. Nonetheless, there are many robust KF methods to handle this problem, such as the IBF KF in [24]. In order to fully utilize the additional information in the observed signal, the optimal Bayesian Kalman filter (OBKF) method is adopted for our DL channel tracking process. The process is divided into three parts: the OBKF process, the sum-product algorithm for the posterior noise statistics, and the MCMC computation.

IV-B1 OBKF for DL channel tracking

For one specific user, we denote $\boldsymbol{\vartheta}=\left\{\sigma_{n}^{2\prime},[\mathbf{\Lambda}^{\prime}]_{j,j},j\in\mathcal{Q}^{\prime}\right\}$ as the set of all the unknown parameters in both the process noise and the observation noise vectors, and use the superscript $\boldsymbol{\vartheta}$ to indicate that a quantity depends partly or wholly on it. Then, the state-space model (55) can be re-expressed as

[TABLE]

Since each user can track its channel and restore its model parameters independently, we omit the subscript $k$ in the following for simplicity.

Thus, under the OBKF framework, the following equations can be utilized to effectively track the DL virtual channel $[\tilde{\mathbf{g}}_{k,m}]_{\mathcal{Q}_{k}^{\prime}}$ as

[TABLE]

where $\tilde{\mathbf{y}}^{\prime}(m)=\left[\tilde{\mathbf{y}}_{1}^{\prime H},\tilde{\mathbf{y}}_{2}^{\prime H},\ldots,\tilde{\mathbf{y}}_{m}^{\prime H}\right]^{H}$ , and $\mathbf{P}_{m}^{\boldsymbol{\vartheta}}=\mathbb{E}\left\{([{\tilde{\mathbf{g}}}_{m}]_{\mathcal{Q}^{\prime}}^{\boldsymbol{\vartheta}}-[\widehat{\tilde{\mathbf{g}}}_{m}]_{\mathcal{Q}^{\prime}}^{\boldsymbol{\vartheta}})([{\tilde{\mathbf{g}}}_{m}]_{\mathcal{Q}^{\prime}}^{\boldsymbol{\vartheta}}-[\widehat{\tilde{\mathbf{g}}}_{m}]_{\mathcal{Q}^{\prime}}^{\boldsymbol{\vartheta}})^{H}\right\}$ is the covariance matrix of the channel estimation error relative to ${\boldsymbol{\vartheta}}$ at time $m$ .

To reduce the computational complexity, we adopt the approximation $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\mathbf{P}_{m}^{\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m)\right\}\approx\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\mathbf{P}_{m}^{\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m-1)\right\}$ , i.e., we replace $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\mathbf{P}_{m}^{\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ in (60) with $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\mathbf{P}_{m}^{\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m-1)\right\}$ from the previous iteration [25]. This is computationally more efficient because the recursions in (57)-(60) need not all be repeated at each time block $m$ .

From (57)-(60), we find that two conditional expectations, $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\frac{\sigma_{n}^{2\prime}}{\sigma_{p}^{2}}\mathbf{I}_{|\mathcal{Q}^{\prime}|}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ and $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{[\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ , should be evaluated with respect to the posterior distribution $p({\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m))\propto p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})p({\boldsymbol{\vartheta}})$ , where $p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})$ is the likelihood function of ${\boldsymbol{\vartheta}}$ given the observation sequence $\tilde{\mathbf{y}}^{\prime}(m)$ . Since $p({\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m))$ may have no closed-form expression for many prior distributions, we employ the MCMC method to generate samples from the posterior distribution $p({\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m))$ and approximate $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\frac{\sigma_{n}^{2\prime}}{\sigma_{p}^{2}}\mathbf{I}_{|\mathcal{Q}^{\prime}|}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ and $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{[\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ by the sample means of the generated MCMC samples. By the Bayes rule, the likelihood function $p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})$ must be calculated in order to determine $p({\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m))$ .

With (56) and the Markov property of the model, we can obtain

[TABLE]

where $\mathbf{x}^{\prime}(m)=[[\tilde{\mathbf{g}}_{1}]_{\mathcal{Q}^{\prime}}^{H},[\tilde{\mathbf{g}}_{2}]_{\mathcal{Q}^{\prime}}^{H},\ldots,[\tilde{\mathbf{g}}_{m}]_{\mathcal{Q}^{\prime}}^{H}]^{H}$ stacks the virtual channel vectors $[\tilde{\mathbf{g}}_{i}]_{\mathcal{Q}^{\prime}}$ of the first $m$ time blocks.

With (61) and (62), the marginalization of $p(\tilde{\mathbf{y}}^{\prime}(m),\mathbf{x}^{\prime}(m)|{\boldsymbol{\vartheta}})$ over $\mathbf{x}^{\prime}(m)$ can be factorized as

[TABLE]

Then, $p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})$ can be represented by a factor graph, as shown in Figure 3, where the factors of the above factorization are represented by “function nodes”, drawn as blue and red boxes, and the corresponding random variables are represented by “variable nodes”, drawn as green circles. A variable node $\boldsymbol{x}$ is connected to every function node $f$ whose arguments contain $\boldsymbol{x}$ . Furthermore, we resort to belief propagation (BP), also known as sum-product message passing, to implement the message passing on the factor graph in Figure 3. BP passes real-valued messages along the edges between nodes in the factor graph. Specifically, for a function node $f$ and a variable node $\boldsymbol{x}$ , the messages from $f$ to $\boldsymbol{x}$ and from $\boldsymbol{x}$ to $f$ are denoted by $\Omega_{f\rightarrow\boldsymbol{x}}(\boldsymbol{x})$ and $\Omega_{\boldsymbol{x}\rightarrow f}(\boldsymbol{x})$ , respectively, both with argument $\boldsymbol{x}$ . With the BP theory, we can obtain

[TABLE]

where the set $\mathcal{N}(\boldsymbol{x})$ collects all the neighbouring nodes of the given node $\boldsymbol{x}$ in the factor graph, and the notation $\sim\boldsymbol{x}$ has the same meaning as in [38].

IV-B2 Sum-Product Algorithm for posterior noise statistics

A node in the factor graph operates once it has received all the messages from its neighbouring nodes. The first step in running the factor graph is that each leaf function node sends its message to its neighbouring nodes. For notational simplicity, we define the factor nodes and variable nodes in Figure 3 as

[TABLE]

and $f_{A,1}=p(\mathbf{w}_{1})=\mathcal{CN}\left(\mathbf{w}_{1};\mathbf{0},[\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}}\right)$ .

It can be seen from Figure 3 that there are three kinds of messages in the factor graph, i.e., $\Omega_{f_{A,i}\to\mathbf{w}_{i}}$ , $\Omega_{f_{B,i}\to\mathbf{w}_{i}}$ , and $\Omega_{\mathbf{w}_{i}\to f_{A,i}}$ . Since we only need to consider the forward-passing messages, the expression of $\Omega_{\mathbf{w}_{i}\to f_{A,i}}$ is omitted here. With (61), (62), and (64), it can be readily checked from Figure 3 that

[TABLE]

With respect to the term $\Omega_{f_{A,i}\to\mathbf{w}_{i}}$ , we have the following lemma.

Lemma 1

For all $1\leq i\leq m-1$ , the message $\Omega_{f_{A,i+1}\to\mathbf{w}_{i+1}}$ in Figure 3 can be expressed as:

[TABLE]

where

[TABLE]

where $\mathbf{\Gamma}_{i}$ and $\bm{\nu}_{i}$ are defined in the following proof. Furthermore, at every step $i$ , the parameters $\omega_{i+1}$ , $\boldsymbol{\mu}_{i+1}$ , and $\mathbf{\Sigma}_{i+1}$ in $\Omega_{f_{A,i+1}\to\mathbf{w}_{i+1}}$ depend only on those in $\Omega_{f_{A,i}\to\mathbf{w}_{i}}$ . In addition, it can be checked that $\Omega_{f_{A,1}\to\mathbf{w}_{1}}=\omega_{1}\mathcal{CN}\left(\mathbf{w}_{1};\boldsymbol{\mu}_{1},\mathbf{\Sigma}_{1}\right)$ .

Proof:

Before proceeding, we give the following property:

[TABLE]

With the above equation, if $\Omega_{f_{A,i}\to\mathbf{w}_{i}}=\omega_{i}\mathcal{CN}\left(\mathbf{w}_{i};\boldsymbol{\mu}_{i},\boldsymbol{\Sigma}_{i}\right)$ holds for $i\geq 1$ , we can derive

[TABLE]

where

[TABLE]

and (88) in the Appendix is utilized in the above derivations.

Furthermore, with respect to the term $\frac{\mathcal{CN}\left(0;\alpha^{\prime-1}\mathbf{w}_{i+1},[\alpha^{\prime-2}\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}}\right)}{\mathcal{CN}\left(\mathbf{0};\bm{\Gamma}_{i}\left(\alpha^{\prime}[\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}}^{-1}\mathbf{w}_{i+1}+\bm{\nu}_{i}\right),\bm{\Gamma}_{i}\right)}$ in (72), we can obtain

[TABLE]

where

[TABLE]

Notice that (88) and the product rule for complex Gaussian PDFs in the Appendix, together with the following properties, are utilized in the above derivations.

[TABLE]

Combining (72) with the result above, $\Omega_{f_{A,i+1}\to\mathbf{w}_{i+1}}$ can be re-expressed as:

[TABLE]

where

[TABLE]

With Lemma 1, we finally obtain the message $\Omega_{f_{A,m}\to\mathbf{w}_{m}}=\omega_{m}\mathcal{CN}(\mathbf{w}_{m};\boldsymbol{\mu}_{m},\mathbf{\Sigma}_{m})$ . Then, the factorization of $p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})$ derived in Section IV-B1 can be rewritten as:

[TABLE]

where

[TABLE]

Hence, using the sum-product algorithm on the factor graph, the likelihood function $p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})$ can be obtained from the expression above, where all the required parameters are computed through the preceding recursions.
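A numerically convenient way to evaluate the same likelihood is sketched below. Note that this is a swapped-in, equivalent technique and not the $\omega_{i}$ , $\boldsymbol{\mu}_{i}$ , $\mathbf{\Sigma}_{i}$ recursion of Lemma 1: for a linear-Gaussian chain, the marginal likelihood can be accumulated from the Kalman innovations (prediction-error decomposition). The sketch assumes the state model $\mathbf{w}_{i+1}=\alpha^{\prime}\mathbf{w}_{i}+\mathbf{v}_{i+1}$ with diagonal process noise covariance, the prior $\mathbf{w}_{1}\sim\mathcal{CN}(\mathbf{0},[\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}})$ , and the observation $\tilde{\mathbf{y}}^{\prime}_{i}=\mathbf{w}_{i}+\tilde{\mathbf{n}}^{\prime}_{i}$ with white noise of variance `noise_var`; the function name and argument layout are hypothetical.

```python
import numpy as np


def log_likelihood(ys, alpha, lam, noise_var):
    """ln p(y_1..y_m | theta) for a diagonal complex linear-Gaussian chain.

    ys        : (m, n) complex array, one observation vector per time block
    alpha     : real time-correlation factor
    lam       : length-n diagonal of the process noise covariance
    noise_var : variance of the equivalent observation noise
    """
    ys = np.atleast_2d(np.asarray(ys, dtype=complex))
    lam = np.asarray(lam, dtype=float)
    mean = np.zeros(ys.shape[1], dtype=complex)  # predicted mean of w_1
    var = lam.copy()                             # predicted variance of w_1
    ll = 0.0
    for y in ys:
        s = var + noise_var                      # innovation variance
        e = y - mean                             # innovation
        # log of a circular complex Gaussian density with diagonal covariance
        ll += -np.sum(np.log(np.pi * s)) - np.sum(np.abs(e) ** 2 / s)
        gain = var / s
        post_mean = mean + gain * e
        post_var = (1.0 - gain) * var
        mean = alpha * post_mean                 # predict the next block
        var = alpha ** 2 * post_var + lam
    return float(ll)


ys_demo = np.array([[0.3 + 0.1j], [-0.2 + 0.5j], [0.7 - 0.4j]])
ll_demo = log_likelihood(ys_demo, alpha=0.9, lam=[0.5], noise_var=0.1)
```

Working in the log domain avoids the underflow that products of many small densities would otherwise cause when $m$ grows.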

IV-B3 MCMC computation

As the two posterior effective noise statistics $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{\frac{\sigma_{n}^{2\prime}}{\sigma_{p}^{2}}\mathbf{I}_{|\mathcal{Q}^{\prime}|}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ and $\mathbb{E}_{\boldsymbol{\vartheta}}\left\{[\mathbf{\Lambda}^{\prime}]_{\mathcal{Q}^{\prime}}|\tilde{\mathbf{y}}^{\prime}(m)\right\}$ are unknown, we employ the Metropolis-Hastings MCMC algorithm to estimate them. This algorithm also covers the case where the proposal distribution is not a symmetric function of its arguments [39]. At the $j$ -th iteration, let ${\boldsymbol{\vartheta}}^{(j)}$ denote the last accepted MCMC sample in the generated sequence of samples. A candidate MCMC sample $\tilde{\boldsymbol{\vartheta}}$ is generated according to a proposal distribution $p(\tilde{\boldsymbol{\vartheta}}|{\boldsymbol{\vartheta}}^{(j)})$ . As the specific choice of the proposal distribution can have a prominent effect on the performance of the algorithm, we choose a Gaussian distribution centred on the current state $\boldsymbol{\vartheta}^{(j)}$ . The candidate MCMC sample $\tilde{\boldsymbol{\vartheta}}$ is either accepted or rejected according to an acceptance ratio $r$ defined as

[TABLE]

where the second formula is used when the proposal distribution is symmetric, i.e., $p(\tilde{\boldsymbol{\vartheta}}|{\boldsymbol{\vartheta}}^{(j)})=p({\boldsymbol{\vartheta}}^{(j)}|\tilde{\boldsymbol{\vartheta}})$ . The $(j+1)$ -th MCMC sample is

[TABLE]

By iterating the acceptance test above together with the update (84), we obtain a sequence of MCMC samples. The positivity of the proposal distribution $p(\tilde{\boldsymbol{\vartheta}}|{\boldsymbol{\vartheta}}^{(j)})$ for any ${\boldsymbol{\vartheta}}^{(j)}$ is a sufficient condition for an ergodic Markov chain of MCMC samples, whose steady-state distribution is the target distribution $p({\boldsymbol{\vartheta}}|\tilde{\mathbf{y}}^{\prime}(m))$ . After generating enough MCMC samples, the posterior effective noise statistics can be approximated by computing the sample mean of the accepted MCMC samples.
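The sampling loop can be sketched as follows. This is a minimal random-walk Metropolis-Hastings sketch under stated assumptions, not the authors' implementation: the function name `metropolis_hastings`, the step size, the burn-in length, and the standard normal target used in the usage part are all assumptions. Because the Gaussian proposal centred on the current state is symmetric, the simplified form of the acceptance ratio applies.

```python
import numpy as np


def metropolis_hastings(log_target, theta0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_target : callable returning the log of the (unnormalized) target,
                 e.g. log-likelihood plus log-prior of theta
    theta0     : starting point with finite log_target value
    Returns an (n_samples, dim) array holding the chain.
    """
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_p = log_target(theta)
    chain = np.empty((n_samples, theta.size))
    for j in range(n_samples):
        cand = theta + step * rng.standard_normal(theta.size)
        log_p_cand = log_target(cand)
        # Accept with probability min(1, p(cand) / p(theta)).
        if np.log(rng.uniform()) < log_p_cand - log_p:
            theta, log_p = cand, log_p_cand
        chain[j] = theta  # on rejection the current state is repeated
    return chain


# Usage on an assumed toy target: a standard normal distribution.
chain_demo = metropolis_hastings(
    lambda t: -0.5 * float(np.sum(t ** 2)), 0.0, 20000, step=1.0, seed=1
)
posterior_mean_demo = chain_demo[2000:].mean(axis=0)  # discard burn-in
```

For the noise statistics, `log_target` would return $\ln p(\tilde{\mathbf{y}}^{\prime}(m)|{\boldsymbol{\vartheta}})+\ln p({\boldsymbol{\vartheta}})$ and should return `-np.inf` for non-positive variances, so that such candidates are always rejected.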

The steps of the whole procedure for the DL channel reconstruction and restoration are summarized in Algorithm 2. To describe the relationship among the different parts of our proposed scheme intuitively, the overall block diagram of the proposed scheme is illustrated in Figure 4.

V Simulations Results

In this section, we evaluate the performance of our proposed tracking scheme through numerical simulations. We consider a massive MIMO network where the BS is equipped with $N_{t}=128$ antennas. There are $K=32$ users, which are divided into $8$ groups. We take the first group as an example to illustrate the performance. The simulation parameters are summarized in Table I.

Correspondingly, the preamble is also divided into $8$ segments. Only the $4$ users of one group are active in each segment, so there is no inter-group interference. As a result, a training length of 4 is sufficient. During the virtual channel tracking stage, the $K$ users are regrouped such that the spatial signatures of the users in the same group do not overlap. The signal-to-noise ratio (SNR) is defined as SNR $=\sigma_{p}^{2}/\sigma_{n}^{2}$ . The performance metrics are the average MSEs of the model parameters $\alpha$ , $\mathbf{c}$ and $\mathbf{\Lambda}$ and those of the virtual channels $\tilde{\mathbf{g}}$ and $\tilde{\mathbf{h}}$ , i.e.,

[TABLE]

We first investigate the convergence of the UL EM process. Figure 5 shows the MSEs curves versus the number of iteration. $M_{u}=15$ channel blocks are used to learn model parameters. We can see from Figure 5 that after 5 iterations, all the parameters have arrived at their steady states, which shows that the algorithm has a fast convergence speed.

Figure 6 presents the MSE performance of the model parameters learning as a function of SNR, with EM algorithm running 5 iterations for each SNR case. With the increase of the SNR, we can see that the MSE curves of all parameters decrease almost linearly. Moreover, we show the performance of the estimation for the off-grid bias and spatial signature performance in Figure 6, with SNR = 20dB. They are also estimated very accurately.

After UL learning of all parameters and DL reconstruction of partial parameters, the next step is to track the DL channel by adopt OBKF, with the known parameters, meanwhile restore the unreconstructed parameters for later tracking. To decrease the computation complexity, we will adopt OBKF for a limited number of time-blocks, and then use classical KF to continue tracking the channel.

First, we studies the MSE of the two unknown DL channel model parameters at the last OBKF time-block versus the number of time-block using OBKF, with different SNR, and the MSE versus SNR with different number of OBKF time-block. In Figure 7, we can see that the MSE of the two unknown DL channel model parameters decreases in each SNR case, and almost arrive at their convergence point when OBKF time-block $M_{d}=15$ . As SNR goes higher, the convergence point can be arrived when OBKF time-block $M_{d}=10$ . We can also see that the curves linearly decrease with the increase of the SNR, in Figure 8.

We can find that the performance are better when SNR is higher. we can explain the above phenomenon that OBKF is not only restoring the virtual channel, but also restoring the unknown parameters. And after a scale of restoring time, the parameters will be very closed to the true one, so the estimated virtual channel will have a good performance.

Then we studied the MSE of virtual channel for each time-block, including both the OBKF time-blocks and the later classical KF time-blocks, with different SNR, as shown in Figure 9. The figure also shows the performance of classical KF with perfect parameters as well as classical KF with weak parameters. We set the number of OBKF time-block $M_{d}=10$ , with which we can obtain almost the best performance. From Figure 9 we can obtain that the MSE of tracked virtual channel decreases when OBKF runs. We can see that the performance to be steady and is very close to the performance of classical KF with perfect parameters at $m_{d}=6$ , which shows the accuracy of our method.

To further illustrate the performance of the method, Figure 10 shows the relationship between MSE of virtual channel and SNR, together with classical KF of the above two situations. from Figure 10 we can see that the performance of KF with weak parameters is far away from the precisely one, while our OBKF method has a wonderful performance. Moreover, with the SNR increasing, the gap between our method and perfect KF decreases very fast. At SNR = 30, for example, the two performance is very nearly equal. Notice that the gap between our method and weak KF is also decreasing. This can be explained as follows. At low SNR, our method obtains a huge gain by utilizing the correlation of time-varying channel. But with SNR increasing, the performance is mostly decided on SNR, meanwhile the effect of correlation is diminishing.

Furthermore, we show the MSE of the two unknown DL channel model parameters versus SNR for different velocities at the last OBKF time-block, with $M_{d}=15$. Figure 11 shows that the performance is better at lower velocity, while at higher velocity it is only slightly worse and remains acceptable.

VI Conclusion

In this paper, we proposed a scheme for DL channel tracking. First, with the help of VCR, a dynamic uplink (UL) massive MIMO channel model was built with consideration of off-grid refinement. Then, a coordinate-wise maximization based expectation maximization (EM) algorithm was adopted in the model parameter learning period. Thanks to the angle reciprocity, we recovered some of the DL channel model parameters from the knowledge of the UL channel model parameters. Since some parameters remained that could not be perfectly inferred from the UL ones, we resorted to the OBKF method to accurately track the DL channel. Within this method, the factor graph and Metropolis-Hastings MCMC were applied to track the expectation of the posterior statistics. Numerical results showed that the proposed scheme exhibits both strong convergence and a very low estimation MSE.

VII Appendix

The product of $N$-dimensional complex Gaussian PDFs

For the $N$-dimensional complex Gaussian distribution $p(\boldsymbol{x})=\mathcal{CN}\left(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$, we can obtain its canonical notation as

[TABLE]

Then, for the PDFs $p_{i}(\boldsymbol{x})=\mathcal{CN}\left(\boldsymbol{x};\boldsymbol{\mu}_{i},\boldsymbol{\Sigma}_{i}\right)$, $i=1,2,\ldots,L$, we can derive

[TABLE]

where $\zeta_{i}=-N\ln\pi-\ln|\boldsymbol{\Sigma}_{i}|-\boldsymbol{\mu}_{i}^{H}\boldsymbol{\Sigma}^{-1}_{i}\boldsymbol{\mu}_{i}$. Before proceeding, let us define $\boldsymbol{\bar{\Sigma}}_{L}=\left(\sum_{i=1}^{L}\boldsymbol{\Sigma}_{i}^{-1}\right)^{-1}$ and $\boldsymbol{\bar{\mu}}_{L}=\boldsymbol{\bar{\Sigma}}_{L}\left(\sum\limits_{i=1}^{L}\boldsymbol{\Sigma}^{-1}_{i}\boldsymbol{\mu}_{i}\right)$. Hence, the above equation can be re-expressed as

[TABLE]

where $\bar{\zeta}_{L}=-N\ln\pi-\ln|\boldsymbol{\bar{\Sigma}}_{L}|-\boldsymbol{\bar{\mu}}_{L}^{H}\boldsymbol{\bar{\Sigma}}_{L}^{-1}\boldsymbol{\bar{\mu}}_{L}$ .
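For readers who want the intermediate steps, the chain below is a reconstruction from the definitions of $\zeta_{i}$, $\boldsymbol{\bar{\Sigma}}_{L}$, $\boldsymbol{\bar{\mu}}_{L}$ and $\bar{\zeta}_{L}$ given above. It is a worked sketch and not a transcription of the original displayed equations. It assumes Hermitian positive-definite $\boldsymbol{\Sigma}_{i}$, and $\zeta$ without subscript denotes the same expression as $\zeta_{i}$ evaluated at $(\boldsymbol{\mu},\boldsymbol{\Sigma})$.

```latex
\begin{align}
\mathcal{CN}(\boldsymbol{x};\boldsymbol{\mu},\boldsymbol{\Sigma})
 &= \pi^{-N}|\boldsymbol{\Sigma}|^{-1}
    \exp\left\{-(\boldsymbol{x}-\boldsymbol{\mu})^{H}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right\} \nonumber\\
 &= \exp\left\{\zeta+\boldsymbol{\mu}^{H}\boldsymbol{\Sigma}^{-1}\boldsymbol{x}
    +\boldsymbol{x}^{H}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}
    -\boldsymbol{x}^{H}\boldsymbol{\Sigma}^{-1}\boldsymbol{x}\right\}, \\
\prod_{i=1}^{L}p_{i}(\boldsymbol{x})
 &= \exp\left\{\sum_{i=1}^{L}\zeta_{i}
    +\Big(\sum_{i=1}^{L}\boldsymbol{\Sigma}_{i}^{-1}\boldsymbol{\mu}_{i}\Big)^{H}\boldsymbol{x}
    +\boldsymbol{x}^{H}\Big(\sum_{i=1}^{L}\boldsymbol{\Sigma}_{i}^{-1}\boldsymbol{\mu}_{i}\Big)
    -\boldsymbol{x}^{H}\Big(\sum_{i=1}^{L}\boldsymbol{\Sigma}_{i}^{-1}\Big)\boldsymbol{x}\right\} \nonumber\\
 &= \exp\left\{\sum_{i=1}^{L}\zeta_{i}
    +\boldsymbol{\bar{\mu}}_{L}^{H}\boldsymbol{\bar{\Sigma}}_{L}^{-1}\boldsymbol{x}
    +\boldsymbol{x}^{H}\boldsymbol{\bar{\Sigma}}_{L}^{-1}\boldsymbol{\bar{\mu}}_{L}
    -\boldsymbol{x}^{H}\boldsymbol{\bar{\Sigma}}_{L}^{-1}\boldsymbol{x}\right\} \nonumber\\
 &= \exp\left\{\sum_{i=1}^{L}\zeta_{i}-\bar{\zeta}_{L}\right\}
    \mathcal{CN}\left(\boldsymbol{x};\boldsymbol{\bar{\mu}}_{L},\boldsymbol{\bar{\Sigma}}_{L}\right).
\end{align}
```

The last step adds and subtracts $\bar{\zeta}_{L}$ so that the exponent takes the canonical form of a single Gaussian PDF, leaving a scaling factor that does not depend on $\boldsymbol{x}$.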

In particular, for $L=2$, it can be obtained that

[TABLE]

Moreover, if the terms $\boldsymbol{\mu}_{1}$, $\boldsymbol{\Sigma}_{1}$, $\boldsymbol{\bar{\mu}}_{2}$, and $\boldsymbol{\bar{\Sigma}}_{2}$ are given, we can derive the quotient of two $N$-dimensional complex Gaussian PDFs as

[TABLE]
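The product and quotient rules can be checked numerically. The following is a minimal sketch assuming NumPy and randomly drawn Hermitian positive-definite covariances; the function names are hypothetical. The quotient parameters are obtained by inverting the definitions of $\boldsymbol{\bar{\Sigma}}_{2}$ and $\boldsymbol{\bar{\mu}}_{2}$, which is an assumption about the form of the omitted equation rather than a copy of it.

```python
import numpy as np


def cn_logpdf(x, mu, Sigma):
    """Log of CN(x; mu, Sigma) for an N-dimensional complex vector x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.real(d.conj() @ np.linalg.solve(Sigma, d))
    return -x.shape[0] * np.log(np.pi) - logdet - quad


def zeta(mu, Sigma):
    """zeta = -N ln(pi) - ln|Sigma| - mu^H Sigma^{-1} mu, as defined in the text."""
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.real(mu.conj() @ np.linalg.solve(Sigma, mu))
    return -mu.shape[0] * np.log(np.pi) - logdet - quad


def product_params(mus, Sigmas):
    """Mean and covariance of the Gaussian proportional to prod_i CN(x; mu_i, Sigma_i)."""
    precisions = [np.linalg.inv(S) for S in Sigmas]
    Sigma_bar = np.linalg.inv(sum(precisions))
    mu_bar = Sigma_bar @ sum(P @ m for P, m in zip(precisions, mus))
    return mu_bar, Sigma_bar


def quotient_params(mu_bar, Sigma_bar, mu1, Sigma1):
    """Recover (mu_2, Sigma_2) from the product (mu_bar, Sigma_bar) and one factor.

    Valid only when Sigma_bar^{-1} - Sigma1^{-1} is positive definite,
    which holds if (mu_bar, Sigma_bar) really came from a product.
    """
    P_bar, P1 = np.linalg.inv(Sigma_bar), np.linalg.inv(Sigma1)
    Sigma2 = np.linalg.inv(P_bar - P1)
    mu2 = Sigma2 @ (P_bar @ mu_bar - P1 @ mu1)
    return mu2, Sigma2


rng = np.random.default_rng(0)
N, L = 3, 2


def rand_vec():
    return rng.standard_normal(N) + 1j * rng.standard_normal(N)


def rand_cov():
    # A A^H + N I is Hermitian positive definite and well conditioned.
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return A @ A.conj().T + N * np.eye(N)


mus = [rand_vec() for _ in range(L)]
Sigmas = [rand_cov() for _ in range(L)]
x = rand_vec()

# Product rule: sum_i ln p_i(x) = sum_i zeta_i - zeta_bar + ln CN(x; mu_bar, Sigma_bar).
mu_bar, Sigma_bar = product_params(mus, Sigmas)
lhs = sum(cn_logpdf(x, m, S) for m, S in zip(mus, Sigmas))
rhs = (sum(zeta(m, S) for m, S in zip(mus, Sigmas))
       - zeta(mu_bar, Sigma_bar) + cn_logpdf(x, mu_bar, Sigma_bar))

# Quotient rule: dividing out the first factor returns the second one.
mu2_rec, Sigma2_rec = quotient_params(mu_bar, Sigma_bar, mus[0], Sigmas[0])
print(np.isclose(lhs, rhs), np.allclose(mu2_rec, mus[1]), np.allclose(Sigma2_rec, Sigmas[1]))
```

Working with precision matrices $\boldsymbol{\Sigma}_{i}^{-1}$ keeps both rules additive: the product sums precisions and precision-weighted means, and the quotient subtracts them.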

