Massive MIMO Channel Estimation with an Untrained Deep Neural Network
Eren Balevi, Akash Doshi, Jeffrey G. Andrews

TL;DR
This paper introduces a novel untrained deep neural network-based channel estimator for massive MIMO systems that approaches MMSE performance without training, complex inversions, or covariance knowledge, and effectively mitigates pilot contamination.
Contribution
It presents an untrained deep neural network approach for massive MIMO channel estimation that achieves near-MMSE performance and robustness to pilot contamination without requiring training.
Findings
Approaches MMSE performance with 64 antennas and subcarriers.
Does not require training or channel covariance knowledge.
Effectively eliminates pilot contamination under certain conditions.
| Channels per layer | Epochs | Total Weight Count |
|---|---|---|
| 8 | 2000 | 496 |
| 16 | 1300 | 1760 |
| 32 | 900 | 6592 |
| 64 | 250 | 25472 |

| Channels per layer | Epochs | Total Weight Count |
|---|---|---|
| 8 | 4000 | 1504 |
| 16 | 1970 | 3776 |
| 32 | 1800 | 10624 |
| 64 | 1000 | 33536 |
The authors are with the University of Texas at Austin, TX, USA. This work has been supported in part by Intel.
Abstract
This paper proposes a deep learning-based channel estimation method for multi-cell interference-limited massive MIMO systems, in which base stations equipped with a large number of antennas serve multiple single-antenna users. The proposed estimator employs a specially designed deep neural network (DNN) to first denoise the received signal, followed by a conventional least-squares (LS) estimation. We analytically prove that our LS-type deep channel estimator can approach minimum mean square error (MMSE) estimator performance for high-dimensional signals, while avoiding MMSE’s requirement for complex channel inversions and knowledge of the channel covariance matrix. This analytical result, while asymptotic, is observed in simulations to be operational for just 64 antennas and 64 subcarriers per OFDM symbol. The proposed method also does not require any training and utilizes several orders of magnitude fewer parameters than conventional DNNs. The proposed deep channel estimator is also robust to pilot contamination and can even completely eliminate it under certain conditions.
Index Terms:
Deep learning, channel estimation, massive MIMO, OFDM.
I Introduction
In multi-antenna systems, obtaining accurate channel state information (CSI) is a central activity both for precoding the spatial streams before transmission and for coherently combining the received signals from each antenna. This is particularly true for massive multi-input multi-output (MIMO) base stations, which are by definition equipped with a very large number of antennas that transmit to many users at the same time and on the same frequency band [1]. Channel estimation is nevertheless quite challenging for multicell massive MIMO cellular networks. This is fundamentally due to pilot contamination – which is the interference of pilot symbols utilized by the users in neighboring cells – and noise, but also because operations such as matrix inversion and singular value decomposition (SVD) are impractically complex for large channel matrices. A low overhead, low complexity, and scalable (in terms of the number of antennas) channel estimator is very desirable for massive MIMO and current solutions have nontrivial drawbacks. This paper leverages recent developments in deep learning to design a novel deep massive MIMO channel estimator that achieves these desirable properties.
I-A Related Work and Motivation
Conventional DNNs are fairly complex and typically require a large number of parameters to be trained with large datasets [2]. Thus, they are not suitable for channel estimation in wireless systems, where channels change quite rapidly. A recent special DNN design called a deep image prior [3] does not require training, and thus avoids the need for a training dataset. It was proposed to solve inverse problems in image processing such as denoising and inpainting, and is analogous to reducing noise and pilot contamination, which are two key impediments in the channel estimation process. We modify and optimize this architecture for massive MIMO channel estimation so as to have a moderate number of parameters. One of the salient features of our deep channel estimator lies in not requiring any statistical knowledge about the channel except what can be directly obtained from the received signal. This not only eliminates the need to know or learn the channel statistics, but also makes the estimator applicable to any kind of channel including Gaussian or non-Gaussian, line-of-sight (LOS) or non-LOS (NLOS), and limited or rich scattering.
The seminal paper on massive MIMO uses a least-squares (LS) estimator [1]. Despite its low complexity, the LS estimator achieves significantly less accurate channel estimation than minimum mean square error (MMSE) estimation [4], which has been used in subsequent massive MIMO studies [5], [6], [7]. Although the impact of the channel estimation quality is profound in massive MIMO [6], employing an MMSE estimator is undesirable for two main reasons: (i) it requires an accurate estimate of the channel correlation matrix between the base station and each user, the estimation of which requires a very large number of samples in proportion to the number of antennas and has to be repeated frequently due to mobility; and (ii) it requires a large matrix inversion, so the complexity grows as the cube of the number of antennas. For both reasons, MMSE estimation scales very poorly with the base station array size.
A key challenge for massive MIMO is pilot contamination, which is a fundamental limiting factor, since small-scale fading and noise vanish as the law of large numbers kicks in [1]. There are many papers that attempt reliable channel estimation for massive MIMO under pilot contamination. The key idea is usually to exploit the differences among the channel covariance matrices of different users. Specifically, [8] partitions users into groups according to the similarity of their covariance matrices, and serves them accordingly. A similar idea was utilized in [9], which developed a covariance-aware pilot assignment algorithm with some coordination among base stations. A special pilot scheduling algorithm was developed for sparse massive MIMO channels in [10]. The sparsity of massive MIMO channels was also used for channel estimation with complex iterative approximate message passing and expectation-maximization (EM) algorithms in [11]. Another method based on channel statistics was presented in [12]. In addition to these, there are blind channel estimators relying on channel second-order statistics to reduce the number of pilots [13], [14]. These works require estimating large covariance matrices or assume they are somehow available for free. Furthermore, their applicability is limited to NLOS zero-mean Gaussian channels, and some of them additionally need sparsity, which exists only for low angle-delay spread.
Developing an improved LS-type estimator, which like LS does not require knowing the channel correlation/covariance matrices and does not involve matrix inversions, is of significant interest but remains an open problem. The performance gap to close is large: we show that the average spectral efficiency decreases by about 50% for normal LS estimation versus MMSE estimation. The introduction of techniques from deep learning points to a potential remedy, since these techniques have been recently used for other challenging communication theory problems without closed-form solutions [15], [16], [17], [18], [19], [20].
I-B Contributions
The main contribution of this paper is to propose a novel low complexity massive MIMO channel estimation technique that is robust to pilot contamination. The novelty is the use of a deep neural network (DNN) for denoising prior to a conventional LS-type operation, which is trivially simple. The proposed denoising is done via a specially designed DNN similarly to the deep image prior proposed recently for image processing applications [3], [21]. We optimize the number of parameters and epochs to ensure low complexity, eventually reducing the number of parameters from the order of millions to hundreds or a few thousand.
We mathematically prove that this proposed deep channel estimator approaches and ultimately achieves the MMSE performance as the product of the number of base station antennas, subcarriers and coherence time interval (in terms of OFDM symbols) becomes large. The simulation results appear to confirm this for moderate dimensionality, namely a signal block, i.e. the number of antennas, subcarriers, and OFDM symbols are all . Pilot contamination is reduced in the proposed estimator by learning a prior from the interference-free region of the OFDM grid and patching this prior into the pilot-contaminated areas. Additionally, we do not assume that the base stations are perfectly synchronized, so the base stations spread pilots randomly and allocate them to users orthogonally over the time-frequency grid for one coherence time interval. Our results reveal that under some conditions (e.g., when 5% of the OFDM grid is contaminated by neighboring cells with 4-fold weaker interference power relative to the target signal in the low noise regime) the deep channel estimator can completely remove the interference even if the eigenspaces of the desired user and the interferer fully overlap. Initial results for the proposed deep channel estimator were presented in [22], essentially for single-antenna OFDM communication, without a theoretical analysis or architecture optimization. Additionally, [22] does not consider co-channel interference.
The paper is organized as follows. The system model and problem statement are given in Section II and Section III. The deep channel estimator is explained in detail in Section IV, an analysis of which appears in Section V. The performance results are illustrated with extensive simulations in Section VI. The paper concludes with Section VII.
II System Model
We consider a cellular network that has base stations with a large number of antennas and single-antenna users. Specifically, base stations comprise antennas and serve users such that . We assume that OFDM symbols with subcarriers are transmitted in a time division duplex (TDD) frame structure. To estimate the reciprocal uplink and downlink channels, users in the same cell send orthogonal pilot sequences with length . For the target base station the received signal in the frequency domain can be expressed as
[TABLE]
where is the transmit power, is the channel between the target base station and its user, is the pilot sequence used for channel estimation such that and denotes the Kronecker product. The notation is the same for the second term in the right-hand side (RHS) of (1), which represents the users in other cells, and
[TABLE]
The last term denotes the Gaussian noise matrix whose independent and identically distributed (i.i.d.) elements are zero-mean Gaussian random variables with variance .
The user signal in the base station is obtained by
[TABLE]
such that . Due to the mixed-product property of the Kronecker product
[TABLE]
it is straightforward to express (3) as
[TABLE]
where
[TABLE]
As can be observed in (2), users in other cells can use the same pilot sequences as the user in the target cell. This is because pilots are limited by the time-frequency resources, so it is not possible to allocate orthogonal pilots to all users in all cells, at least not without greatly degrading the ability to transmit information-bearing symbols. The resulting interference is known as pilot contamination.
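The mixed-product property invoked in this section is the standard identity (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), valid whenever the products AC and BD are defined. The following is a small numerical check of that identity only. The matrix shapes are arbitrary and chosen for illustration; they are not the dimensions used in (3) to (5).

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(rows, cols):
    """Complex Gaussian matrix with arbitrary (illustrative) shape."""
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

# Only the inner dimensions must agree: A(3x4) C(4x2) and B(2x5) D(5x3).
A, B, C, D = crandn(3, 4), crandn(2, 5), crandn(4, 2), crandn(5, 3)

lhs = np.kron(A, B) @ np.kron(C, D)   # (A kron B)(C kron D)
rhs = np.kron(A @ C, B @ D)           # (AC) kron (BD)
mixed_product_holds = np.allclose(lhs, rhs)
```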
III Problem Statement
To have more compact expressions, the matrices are defined as vectors by concatenating the columns, which are given by
[TABLE]
where . The same notation is utilized for , and . Substituting (5) with these yields
[TABLE]
To estimate the channel between the user and the target base station, (8) is multiplied with a linear matrix such that
[TABLE]
where
[TABLE]
and
[TABLE]
in which
[TABLE]
As is clear from (11), LS estimation has very low complexity, whereas MMSE estimation requires not only the autocorrelation matrices of all users that use the same pilot sequence but also a matrix inversion, the complexity of which scales as . Hence, the MMSE estimator is not a viable option for systems with a large number of antennas and/or subcarriers [11]. Despite the appeal of the LS estimator in terms of low complexity, it provides much less accurate estimation. To illustrate this, we consider the average spectral efficiency
[TABLE]
where is the coherence time interval. The average sum spectral efficiency based on (13) for LS and MMSE estimators is depicted for different combiners, namely for maximum ratio (MR), zero-forcing (ZF), and MMSE combiners in Fig. 1. As can be seen, there is a considerable decrease in the average sum spectral efficiency due to LS channel estimation, in particular for MMSE and ZF combiners. A channel estimation technique that exhibits MMSE estimator performance with LS estimator complexity is highly desirable.
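To make the LS versus MMSE trade-off concrete, the sketch below compares the two on a deliberately simplified model: a single user, no pilot contamination, and an observation y = h + n that stands in for the signal after pilot despreading. The exponential covariance, the correlation value 0.9, and the 0 dB SNR are assumptions made for illustration and are not the paper's settings. The MMSE filter needs both the covariance R and a matrix inverse, which is exactly the cost the text describes.

```python
import numpy as np

def compare_ls_mmse(n_ant=16, rho=0.9, snr_db=0.0, trials=2000, seed=0):
    """Return (NMSE of LS, NMSE of MMSE) for y = h + n with h ~ CN(0, R)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_ant)
    R = rho ** np.abs(idx[:, None] - idx[None, :])   # assumed channel covariance
    sigma2 = 10.0 ** (-snr_db / 10.0)                 # noise variance, unit-power channel

    def cn(shape):
        # Circularly symmetric complex Gaussian samples with unit variance.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    h = np.linalg.cholesky(R) @ cn((n_ant, trials))   # correlated channel draws
    y = h + np.sqrt(sigma2) * cn((n_ant, trials))     # noisy observation

    h_ls = y                                          # LS: no statistics, no inversion
    W = R @ np.linalg.inv(R + sigma2 * np.eye(n_ant)) # MMSE: needs R and an inverse
    h_mmse = W @ y

    def nmse(est):
        return float(np.sum(np.abs(h - est) ** 2) / np.sum(np.abs(h) ** 2))

    return nmse(h_ls), nmse(h_mmse)

ls_nmse, mmse_nmse = compare_ls_mmse()
```

In this toy setting the LS error stays close to the noise variance, while the MMSE filter exploits the spatial correlation to reduce it.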
We consider deep learning as a remedy; however, the high dimensionality of the signals is a challenge. This is because the higher the signal dimension is, the larger the number of necessary parameters in the DNN model, which needs to be trained with a dataset whose size is proportional to the number of parameters. To illustrate, a fully connected neural network for an antenna OFDM system requires input neurons. If there are layers in this DNN, each of which has units for , this leads to parameters, where due to the real and imaginary parts of the signal. This can easily yield millions of parameters, and thus requires a very large training dataset. To illustrate, if and , this yields approximately parameters for layers when for . Although convolutional neural networks can considerably decrease the number of parameters, a large training dataset is still necessary. This is obviously an impediment to using neural networks for real-time channel estimation, where only a very limited number of pilots (i.e., labels)¹ can be used.
¹ There can be some unsupervised or semi-supervised learning models that perform channel estimation with no labels or with very limited labels. However, there is not yet any generic known channel estimation model for this approach, and this subject remains mostly open.
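The scaling argument above can be checked with a few lines of arithmetic. The sketch below counts weights and biases of a fully connected network from its layer widths. The example sizes (64 antennas, 64 subcarriers, three equal-width layers) are hypothetical and chosen only to show the order of magnitude; the specific values used in the paper's illustration are not reproduced here.

```python
def count_fc_params(layer_sizes, bias=True):
    """Number of trainable parameters in a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out          # weight matrix between two layers
        if bias:
            total += n_out             # one bias per output unit
    return total

# Hypothetical example: 64 antennas x 64 subcarriers, real and imaginary parts.
n_inputs = 2 * 64 * 64
example_total = count_fc_params([n_inputs, n_inputs, n_inputs, n_inputs])
```

Even this modest hypothetical configuration lands in the hundreds of millions of parameters, which is the regime the paragraph warns about.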
In this paper, we propose a new DNN based channel estimation method that does not require training. Our main idea is to denoise the received signal via the DNN and then use that denoised signal for LS channel estimation instead of the raw received signal. Since the proposed estimator does not require training, there is no complexity increase due to training. This also prevents the inevitable performance loss for estimators that are trained for some channel realizations but then used in others. The details of the proposed method are elaborated next.
IV Deep Channel Estimator Model
Training overhead is the primary obstacle to making state-of-the-art DNNs practically implementable for high-dimensional channel estimation. In the context of image processing, a recent paper shows that training is not necessary for a special DNN design, which is known as Deep Image Prior (DIP), for solving the inverse problems of denoising and inpainting [3]. The main idea behind this untrained DNN or DIP model is to fit the parameters of a neural network for each image on the fly without training them on large datasets beforehand. This model was later optimized to reduce the number of required parameters [21]. Both [3] and [21] observed very efficient denoising and inpainting performance thanks to the specifically designed DNN architecture, which has low impedance for natural images and high impedance against noise.
For massive MIMO-OFDM channel estimation, denoising and inpainting are analogous to eliminating noise and pilot contamination, respectively, and adapting the DIP model to the channel estimation problem is promising. This is because (i) in communication systems, there is a limited number of pilots (or labeled data), and thus architectures that rely on large training datasets are not feasible; (ii) in conventional DNNs, training and testing have to be done for the same channel realization to obtain better performance, which brings heavy training overhead; and (iii) noise and interference are the main impediments to reliable channel estimation for massive MIMO. Motivated by these factors, the specifically designed DNN architecture of the DIP model is leveraged for channel estimation. In particular, we modify the input and output layers of one variant of the DIP architecture [21], and use it as a baseline, which we term a deep channel estimator.
The proposed deep channel estimator is composed of two stages. In the first stage, a less noisy signal is generated from the received signal through the specially designed DNN architecture mentioned above, and some prior information is obtained to mitigate interference. In the second stage, the generated or filtered signal is multiplied by the Hermitian of the pilot sequence for channel estimation. In other words, we propose an LS-type channel estimator with the only difference being that the signal generated by the DNN is used instead of the received signal. In this way, the low complexity of the LS estimator is combined with the noise reduction capability of the DNN to obtain near-MMSE estimation performance. The price paid for the proposed deep channel estimator is the need to fit the parameters of the DNN periodically for each OFDM grid, with a period determined by the channel coherence time (or equivalently the maximum Doppler spread). However, the complexity increase is quite reasonable thanks to the low number of parameters, as will be explained.
The received signal in (1) can be equivalently written in 3-dimensional form as
[TABLE]
where is the received signal of the target base station in the antenna for the subcarrier in the OFDM symbol. Notice that (14) is expressed in terms of the length of the coherence time instead of the number of pilots, which contains OFDM symbols. This is because the parameters of the DNN have to be fitted periodically with the coherence time. The real and imaginary parts of (14) are separated into independent channels in our architecture, since tensors do not support complex operations. This tensor representation of is denoted as . Specifically, , where the dimensions are for the spatial, frequency, time, and complex domains. In our architecture, we stack the spatial and complex domains, which leads to , where .
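The stacking of the spatial and complex domains can be sketched as follows. The function names and the ordering (all real parts first, then all imaginary parts along the channel axis) are assumptions for illustration; the text only states that the two domains are stacked, so the doubled channel count is the part taken from the source.

```python
import numpy as np

def to_real_tensor(y):
    """Complex (n_ant, n_sub, n_sym) array -> real (2 * n_ant, n_sub, n_sym) array.

    Assumed layout: real parts occupy the first n_ant channels,
    imaginary parts the last n_ant channels.
    """
    return np.concatenate([y.real, y.imag], axis=0)

def to_complex_tensor(x):
    """Inverse of to_real_tensor under the same assumed layout."""
    n_ant = x.shape[0] // 2
    return x[:n_ant] + 1j * x[n_ant:]

rng = np.random.default_rng(0)
y = rng.standard_normal((4, 8, 8)) + 1j * rng.standard_normal((4, 8, 8))
x = to_real_tensor(y)
```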
The working principle of the deep channel estimator is to generate from a randomly chosen input tensor , which can be considered as an input filled with uniform noise, through hidden layers, whose weights are also randomly initialized, and then optimized via gradient descent. The overall DNN model that depicts the input, output and hidden layers for a -dimensional communication signal is given in Fig. 2.
The key components of the aforementioned DNN model are the hidden layers, each of which is composed of four parts. These are: (i) a convolution, (ii) an upsampler, (iii) a rectified linear unit (ReLU) activation function, and (iv) a batch normalization. A convolution means that each element in the time-frequency grid is processed with the same parameters through the spatial domain, which changes the dimension. More precisely, an data vector in the hidden layer is multiplied element-wise with an kernel and summed. There are different kernels, which are shared for each slot in the time-frequency axes. Hence, the spatial dimension becomes . This can be equivalently considered as each vector in the time-frequency slot being multiplied with the same (shared) matrix. Next, upsampling is performed to exploit the couplings among neighboring elements in the time and frequency grid. More precisely, the time-frequency signal is upsampled with a factor of via bilinear interpolation. Then, the ReLU activation function is used to make the model more expressive for nonlinearities. The last component of a hidden layer applies batch normalization for a batch size of to avoid vanishing gradients. This structure of a hidden layer is portrayed in Fig. 3. All the hidden layers have the same structure except for the last hidden layer, which does not have an upsampler.
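A minimal NumPy sketch of one hidden layer, following the order listed above (shared per-slot convolution, bilinear upsampling, ReLU, normalization), is given below. Several details are assumptions: the upsampling factor is taken as 2 (consistent with the later remark about scaling by 4 across the time and frequency axes), the interpolation uses an endpoint-aligned convention, the normalization is computed per channel over the time-frequency grid for a single sample, and the learnable scale and shift of batch normalization are omitted. It is meant to show the data flow and shapes, not to replicate the authors' PyTorch implementation.

```python
import numpy as np

def conv1x1(x, w):
    """Apply the same (c_out, c_in) matrix to the spatial vector of every slot.

    x has shape (c_in, n_freq, n_time); the result has shape (c_out, n_freq, n_time).
    """
    return np.einsum("oc,cft->oft", w, x)

def upsample2_linear(x, axis):
    """Double the length of one axis by linear interpolation (endpoints aligned)."""
    n = x.shape[axis]
    pos = np.linspace(0, n - 1, 2 * n)        # sample positions on the old grid
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    shape = [1] * x.ndim
    shape[axis] = -1
    frac = (pos - lo).reshape(shape)          # broadcast along the chosen axis
    return np.take(x, lo, axis=axis) * (1 - frac) + np.take(x, hi, axis=axis) * frac

def hidden_layer(x, w, upsample=True, eps=1e-5):
    z = conv1x1(x, w)
    if upsample:
        # Linear interpolation along frequency, then time, gives bilinear upsampling.
        z = upsample2_linear(upsample2_linear(z, axis=1), axis=2)
    z = np.maximum(z, 0.0)                    # ReLU
    mean = z.mean(axis=(1, 2), keepdims=True)
    var = z.var(axis=(1, 2), keepdims=True)
    return (z - mean) / np.sqrt(var + eps)    # per-channel normalization

rng = np.random.default_rng(0)
x_in = rng.standard_normal((8, 4, 4))
w_in = rng.standard_normal((8, 8))
out = hidden_layer(x_in, w_in)
```

Passing `upsample=False` corresponds to the last hidden layer, which keeps the time-frequency size unchanged.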
The mathematical representation of the aforementioned architecture is given next. Accordingly, the tensor is parameterized for the layer as
[TABLE]
where the input has a dimension of in the spatial, frequency and time domain, respectively. These dimensions are determined according to the number of hidden layers and the output dimension, in which , , . The layers from [math] to are counted as a hidden layer, and for
[TABLE]
where is the input of the hidden layer, are the parameters, and represents the so-called “convolution” operator, which actually refers to cross-correlation in signal processing. More precisely, a convolution is utilized as a cross-correlator, which means that the spatial vector for each element of the time-frequency grid is multiplied with the same shared parameter matrix to obtain the new spatial vector for the next hidden layer. The last hidden layer is
[TABLE]
and the output layer is
[TABLE]
All the parameters can be represented as
[TABLE]
which are optimized according to the square of -norm
[TABLE]
The output of the DNN for the optimized parameters is
[TABLE]
where . After generating the denoised signal in (21) from a random input , an LS channel estimator is employed by multiplying (21) with the Hermitian transpose of the pilot sequence.
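The final LS step can be sketched for a single user as below. The function name, the pilot normalization (division by the pilot energy), and the use of unit-modulus pilot symbols are assumptions for illustration; the text only specifies that the denoised signal is multiplied by the Hermitian transpose of the pilot sequence. A noise-free signal is used here as a stand-in for a perfectly denoised DNN output.

```python
import numpy as np

def ls_estimate(y_denoised, pilot):
    """LS-type estimate from a (n_ant, tau) signal block and a length-tau pilot.

    Correlates each antenna's samples with the conjugate pilot and divides
    by the pilot energy (an assumed normalization).
    """
    return y_denoised @ pilot.conj() / np.vdot(pilot, pilot).real

rng = np.random.default_rng(1)
n_ant, tau = 8, 4
h = (rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant)) / np.sqrt(2)
pilot = np.exp(2j * np.pi * rng.integers(0, 4, tau) / 4)   # unit-modulus symbols

y = np.outer(h, pilot)        # noise-free stand-in for the denoised DNN output
h_hat = ls_estimate(y, pilot)
```

No covariance matrix and no matrix inversion appear in this step, which is why the overall estimator keeps LS-type complexity once the denoised signal is available.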
V Theoretical Analysis
The denoising capability of the proposed LS-type deep channel estimator determines how close it can approach the MMSE estimation performance. Next, we prove that our architecture can filter all the noise for high-dimensional signals, e.g., for massive MIMO-OFDM, and can achieve the MMSE estimator performance.
Theorem 1.
The proposed LS-type deep channel estimator achieves the MMSE estimator performance as the product of the number of base station antennas , number of subcarriers and coherence time interval goes to infinity, assuming there is no pilot contamination. That is,
[TABLE]
where and are the channel estimation errors for the proposed deep channel estimator and conventional MMSE estimator, respectively.
Proof.
The proof is composed of three parts. In the first part, we generalize the noise suppression level of the architecture [21] for the deep channel estimator as
[TABLE]
where n is the noise vector in the received signal, and shows the fitted amount of noise at the output of the deep channel estimator such that means that all noise is cancelled, and is a numerical constant. Although (23) is satisfied with probability at least for in [21], this probability goes to 1 for our case due to the high-dimensional massive MIMO-OFDM signal model, i.e.,
[TABLE]
since . Hence, with the optimum parameters , (23) can be expressed in terms of the maximum noise suppression level
[TABLE]
with probability 1.
In the second part of the proof, we make use of deep learning theory regarding overparameterization. We observe that the denominator of the second term in the right-hand side of (25) scales with the dimension of the received signal, since , , , whereas the dimension of the hidden layers does not. In particular,
[TABLE]
due to (16) and (17), in which the denominator of the right-hand side of (26) is scaled by 4 due to the upsampling by 2 in the time and frequency axes. Since ,
[TABLE]
Now, we proceed to see how the spatial dimension of the hidden layers scales with the number of antennas. Accordingly, the objective function in (20) is written in terms of energy minimization [3]
[TABLE]
It is standard to express (28) in terms of a function approximator and a regularizer as
[TABLE]
For (29), increasing the width of the last hidden layer while keeping the dimension of the other hidden layers fixed is sufficient to fit the received signal [23]. Using (29) by defining
[TABLE]
where is a random positive semidefinite matrix with arbitrarily small Frobenius norm, and writing the last layer of the deep channel estimator for a time-frequency slot as
[TABLE]
it is sufficient to increase the spatial dimension as [24]
[TABLE]
where is the rank of the channel that shows the number of independent received samples. That is, it increases sublinearly with the increasing number of antennas. Due to (26) and (32),
[TABLE]
In the third part, we derive the asymptotic channel estimation errors in view of the first two parts. The channel estimation error for MMSE estimator can be expressed in terms of covariance matrix
[TABLE]
where
[TABLE]
In terms of the eigenvalues of the correlation matrix , (34) can be written using (35) as
[TABLE]
Thus,
[TABLE]
since uncorrelated noise vanishes in massive MIMO. In the case of LS estimator, the error is equal to
[TABLE]
and
[TABLE]
due to (33). This completes the proof of (22). ∎
Notice that if the spatial dimensions of all hidden layers are increased equally instead of only increasing the last hidden layer spatial dimension, this would result in a smaller increase than (32). This means that the spatial dimension increases at worst with the square root of the rank of the channel². Another important point regarding this theorem is that the deep channel estimator can ultimately achieve zero estimation error without increasing the transmission power, just by increasing the number of antennas and subcarriers.
² Our empirical results support this argument. To illustrate, for single antenna, whereas for 64 antennas.
Even if there is pilot contamination in the environment, the proposed estimator can inherently resist (and even completely eliminate) interference up to some point. However, this holds only if the interference exists in some limited region of the OFDM grid of the desired signal. We associate this behavior with the inpainting capability of the DIP architecture [3]. This implies that our LS-type estimator under pilot contamination can match the MMSE estimation performance of single-cell massive MIMO even in the multicell case, if the pilot contamination is sufficiently localized in time and frequency. This success can be attributed to learning prior information from some interference-free regions and then patching this prior information into the interference regions. In this sense, it is similar to dictionary learning [25]. The comparison of various dictionary learning methods with our estimator, as well as integrating our model into one of the dictionary learning methods for enhanced interference mitigation, are left as future work.
VI Simulations
The proposed deep channel estimator is compared with the traditional LS and MMSE channel estimators given in (11) using the “LTE-Extended Pedestrian A Model (EPA)” and “Kronecker” channel models. The performance metric is the normalized mean square error (NMSE), which is defined as
[TABLE]
where and are the column vectors that specify the actual and the estimated channel taps in the frequency domain over all antennas, respectively. In this section, we first state the experimental details, then provide the simulation results and discuss the complexity of the estimator.
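For reference, a commonly used form of the NMSE between an actual and an estimated channel vector is sketched below. Whether (40) averages the ratio over realizations or takes a ratio of averages is not recoverable from this copy of the text, so the per-realization ratio used here should be read as an assumption.

```python
import numpy as np

def nmse(h_true, h_est):
    """Squared error energy divided by the energy of the true channel."""
    h_true = np.asarray(h_true).ravel()
    h_est = np.asarray(h_est).ravel()
    return float(np.sum(np.abs(h_true - h_est) ** 2) / np.sum(np.abs(h_true) ** 2))

def nmse_db(h_true, h_est):
    """NMSE expressed in decibels."""
    return 10.0 * np.log10(nmse(h_true, h_est))
```

For example, `nmse([2, 0], [1, 0])` evaluates to 0.25, since the error energy is 1 and the channel energy is 4.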
VI-A Experimental Details
The deep channel estimator is implemented in PyTorch [26] with hidden layers, i.e., as described in Section IV. Without any loss of generality, the spatial dimension of the hidden layers is taken as for . Then, the number of parameters (or equivalently the value of ) is optimized using two Nvidia GeForce GTX 2070 GPUs for acceleration³. The performance of our estimator is evaluated for two channel models, namely the LTE-EPA and Kronecker channel models, the latter of which is commonly used to model MIMO channels. However, we present most of our results only for the LTE-EPA model, because our empirical results show that there is no significant performance difference between these two channel models. To generate a channel realization for the LTE-EPA model, we use the MATLAB® LTE Toolbox, and obtain an (antennas subcarriers symbols) channel matrix assuming that the coherence time interval is larger than or equal to OFDM symbols. For the Kronecker channel model, we assume an exponential spatial correlation matrix at the base station with correlation coefficient without any loss of generality [27].
³ Note that the dimensions of the time and frequency axes of the hidden layers are not tunable, since these are determined by the size of the received signal matrix and the number of hidden layers.
As is the case for multi-cell massive MIMO, users in the same cell are assigned orthogonal pilot signals, which can be non-orthogonal to those of the users in the neighboring cell. Our estimator does not put any constraint on the pilot arrangement, since pilots are not used while fitting the parameters of the DNN. Specifically, pilots are only used to perform LS estimation after the received signal is filtered via the DNN. To be more practical in the sense of not requiring any tight synchronization among base stations, a random pilot allocation is utilized for multi-cell massive MIMO such that each base station randomly and orthogonally spreads the pilot tones for its users throughout the OFDM grid per coherence time interval⁴. This can be easily done if the coherence bandwidth is equal to one subcarrier; otherwise more careful allocations have to be done so as to have a single pilot per coherence bandwidth. Random pilot arrangement is similar to tone hopping (see [28]), which is proposed to attain ergodic capacity bounds and diminish the impact of “deep fades”.
⁴ Note that a block type pilot arrangement, in which the pilots are sent at the beginning of each coherence time interval, gives the same performance as the same amount of pilot tones spread randomly throughout the OFDM grid in one coherence time interval.
To simulate pilot contamination, we have assumed that a selected number of resource elements in the OFDM grid have a single dominant interferer outside the cell of the desired user signal. This dominant interferer is chosen to transmit random QPSK symbols multiplied with its complex channel matrix, which is simply another realization generated by the LTE-EPA model. This is indeed the worst-case scenario, because the covariance matrices of the user and the interferer fully overlap. In the simulations, we also consider another pilot contamination scenario in which there is contiguous interference in both the time and frequency domains over some number of resource elements in the OFDM grid.
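A sketch of the exponential spatial correlation used in the Kronecker model is given below. It assumes a real correlation coefficient and single-antenna users, so that only the base station side is correlated. The value 0.5 is a placeholder, since the coefficient used in the experiments is not recoverable from this copy of the text, and the function names are illustrative.

```python
import numpy as np

def exponential_correlation(n_ant, rho):
    """Correlation matrix with entries rho ** |i - j| (real rho assumed)."""
    idx = np.arange(n_ant)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def correlated_channel(n_ant, n_cols, rho, rng):
    """Draw n_cols channel vectors with covariance exponential_correlation(n_ant, rho)."""
    R = exponential_correlation(n_ant, rho)
    eigval, eigvec = np.linalg.eigh(R)
    # Hermitian square root of R from its eigen-decomposition.
    R_half = (eigvec * np.sqrt(np.clip(eigval, 0.0, None))) @ eigvec.conj().T
    g = (rng.standard_normal((n_ant, n_cols)) + 1j * rng.standard_normal((n_ant, n_cols))) / np.sqrt(2)
    return R_half @ g

rng = np.random.default_rng(0)
R_example = exponential_correlation(4, 0.5)       # 0.5 is a placeholder value
H_example = correlated_channel(64, 16, 0.5, rng)  # 64 antennas, 16 independent draws
```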
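The random, orthogonal pilot placement described above can be sketched as a draw without replacement over the resource elements of one coherence interval. The function name, the grid size, and the number of pilots per user are hypothetical. The sketch also ignores the coherence-bandwidth constraint mentioned in the text, so it corresponds to the simple case where the coherence bandwidth equals one subcarrier.

```python
import numpy as np

def allocate_pilots(n_sub, n_sym, n_users, pilots_per_user, rng):
    """Assign disjoint random resource elements to each user's pilots.

    Returns an (n_sub, n_sym) integer grid holding the user index at
    pilot positions and -1 at data positions.
    """
    n_re = n_sub * n_sym
    needed = n_users * pilots_per_user
    if needed > n_re:
        raise ValueError("not enough resource elements for orthogonal pilots")
    chosen = rng.permutation(n_re)[:needed]      # distinct positions, random order
    grid = np.full(n_re, -1, dtype=int)
    grid[chosen] = np.repeat(np.arange(n_users), pilots_per_user)
    return grid.reshape(n_sub, n_sym)

rng = np.random.default_rng(0)
pilot_grid = allocate_pilots(n_sub=64, n_sym=64, n_users=4, pilots_per_user=16, rng=rng)
```

Because positions are drawn without replacement, no resource element carries pilots of two users in the same cell, which is the orthogonality the text requires.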
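The first contamination scenario (a single dominant interferer on a selected set of resource elements) might be emulated as in the sketch below. The function name and arguments are hypothetical, and the contaminated positions are drawn at random here, although the text does not say how they are selected. The interferer's channel is passed in as an array, whereas the experiments draw it from the LTE-EPA model.

```python
import numpy as np

def add_interference(y, h_int, fraction, rel_power_db, rng):
    """Add a QPSK interferer on a random fraction of the resource elements.

    y and h_int are complex arrays of shape (n_ant, n_sub, n_sym).
    Returns the contaminated signal and the boolean (n_sub, n_sym) mask.
    """
    n_ant, n_sub, n_sym = y.shape
    n_hit = int(round(fraction * n_sub * n_sym))
    hit = rng.choice(n_sub * n_sym, size=n_hit, replace=False)
    mask = np.zeros(n_sub * n_sym, dtype=bool)
    mask[hit] = True
    mask = mask.reshape(n_sub, n_sym)
    # Unit-power QPSK symbols, one per resource element.
    qpsk = (rng.choice([-1, 1], (n_sub, n_sym)) + 1j * rng.choice([-1, 1], (n_sub, n_sym))) / np.sqrt(2)
    gain = 10.0 ** (rel_power_db / 20.0)          # amplitude scaling of the interferer
    return y + gain * h_int * (qpsk * mask)[None, :, :], mask

rng = np.random.default_rng(0)
y_clean = np.zeros((2, 8, 8), dtype=complex)
h_interferer = np.ones((2, 8, 8), dtype=complex)
y_cont, hit_mask = add_interference(y_clean, h_interferer, 0.25, -6.0, rng)
```

A relative power of about -6 dB corresponds to the 4-fold weaker interference mentioned in the contributions. The contiguous scenario would replace the random mask with a rectangular block of the grid.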
VI-B Results
The performance of the proposed channel estimator is first observed for traditional single antenna communication such that single antenna users transmit/receive OFDM symbols to/from a single antenna base station, i.e., . This enables us to quantify how the number of parameters scales with the number of antennas. We then proceed to the case of single cell massive MIMO. In other words, we consider the hypothetical scenario where all users in the cell are assumed to have orthogonal pilots and there is no pilot contamination at the base station. We finally demonstrate the robustness of our estimator in a multi-cell massive MIMO system, where pilot contamination occurs at the base station due to non-orthogonal pilot sequences employed by users in the neighboring cells.
VI-B1 Single antenna OFDM Communication
We first highlight our results for the case of one antenna at the base station, in the presence and absence of co-channel interference. For this case, the received signal, or equivalently the output of the deep channel estimator, is chosen to be a 2 × 64 × 64 matrix, where 2 represents the real and imaginary parts, the first 64 is the number of subcarriers, and the second 64 is the number of OFDM symbols. More precisely, our architecture performs the operation outlined in (16) for times, then the last hidden layer calculates the expression in (17), and finally the output layer brings the output signal to the required channel matrix size, i.e., .
To find the optimum number of parameters in the absence of co-channel interference, we simply add AWGN to the desired signal and adjust its variance so that the SNR lies between 0 and 20 dB. In order to optimize the number of channels per layer or the value of , we take a single channel realization disturbed by the least noise (i.e., the highest SNR in our range) and observe the convergence of its NMSE with the number of epochs by performing Adam optimization [29] with a learning rate of . We find the number of epochs at which the lowest NMSE is achieved for a given , and then denoise the received signal over the aforementioned range of SNRs using that number of epochs. This approach is often referred to as early stopping. The number of epochs is tabulated in Table I as a function of and the total number of weights in the architecture.
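The early-stopping selection described above can be sketched as follows. The NMSE definition used here (squared error normalized by the energy of the true channel, in dB) is the common one and is assumed rather than quoted from the paper; the function names and the toy NMSE trace are illustrative only.

```python
import numpy as np

def nmse_db(h_true, h_est):
    """Normalized mean squared error in dB between a true and an estimated channel."""
    err = np.sum(np.abs(h_true - h_est) ** 2)
    return 10 * np.log10(err / np.sum(np.abs(h_true) ** 2))

def best_epoch(nmse_per_epoch):
    """Return the 1-based epoch with the lowest NMSE, and that NMSE value."""
    idx = int(np.argmin(nmse_per_epoch))
    return idx + 1, float(nmse_per_epoch[idx])

# Toy trace (made-up numbers): the fit first removes noise, then starts to fit it.
toy_curve = [-5.0, -9.0, -12.0, -11.0, -8.0]
stop_epoch, stop_nmse = best_epoch(toy_curve)

# Sanity check of the metric: a 10% amplitude error corresponds to about -20 dB.
example_nmse = nmse_db(np.ones(4), 0.9 * np.ones(4))
```

In the procedure above the stopping epoch is found once, on the highest-SNR realization where the true channel is known in simulation, and then reused for all other SNRs.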
As depicted in Fig. 4(a), the NMSE is lowest for , gets progressively higher for and , and once again decreases for , almost equal to at an SNR of 0 dB. However, there is very little to separate the different architectures at an SNR of 20 dB. These results can be partly explained by the following insight: larger noise levels require smaller values of . If the noise is significantly larger, then we can either choose a smaller or use early stopping. In this plot, the MMSE curve is obtained using the channel correlation matrix computed via Monte Carlo simulation as outlined in [4], whereas the “Genie Aided MMSE” assumes that the channel correlation matrix is available for free, which is highly impractical. Promisingly, our channel estimator for clearly outperforms the LS and MMSE estimators and approaches the “Genie Aided MMSE” performance without having any (statistical) information other than the received signal. To better understand why the deep channel estimator works so well, we observe its performance for an unrealistic case in which each subcarrier in the OFDM grid has an i.i.d. Rayleigh fading channel. In this case, our estimator does not perform well, which indicates that its success stems from exploiting correlations.
To find the optimum number of parameters in the presence of co-channel interference, 10% of the resource elements of the OFDM grid expressed in (14) are corrupted by injecting an interference that is 6 dB weaker than the desired signal, i.e., SIR = 6 dB. As shown in Fig. 4(b), clearly outperforms for SNRs less than 10 dB, after which their performance is similar. Hence, it is reasonable to take in the architecture for the case where . We observed that, with the addition of co-channel interference, stopping earlier than was ascertained in the interference-free case could be beneficial; however, we did not change the number of epochs for which the fitting was performed. This is because in a practical scenario, where we do not have access to the noiseless received signal, we cannot ascertain when to stop, so the stopping point has to be determined beforehand. Even without dynamically adapting early stopping, the deep channel estimator with beats the MMSE estimator up to 10 dB, which also means that it has better interference mitigation capability.
VI-B2 Single-cell Massive MIMO
The deep channel estimator is mainly intended for multiple antennas in this paper. Thus, is set to and a matrix is obtained by concatenating the real and imaginary parts of the signal with the antennas along the spatial axis. Here, the spatial domain is used to stack the real and imaginary parts, because this axis is more appropriate for uncorrelated samples in the architecture. In this case, we first observe the impact of an increased number of pilots by varying under the assumption of a block type pilot arrangement. Accordingly, the pilots are transmitted periodically over all the subcarriers, assuming without loss of generality that the coherence bandwidth is equal to one subcarrier. The results are shown in Fig. 5. Clearly, an increase from to benefits the NMSE, but no benefit is obtained beyond that. This is an artifact of the LTE-EPA model, which has very high temporal correlation and consequently needs very few pilots in the time domain to represent the channel accurately. For the rest of the experiments, we adopt . (Footnote 5: A random pilot arrangement is used instead of a block type pilot arrangement in the case of multi-cell massive MIMO, meaning that pilot tones are allocated to subcarriers that belong to different OFDM symbols.) Notice that although there is a single OFDM pilot symbol, orthogonal pilot sequences can still be formed in the frequency domain thanks to OFDM.
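The stacking of real and imaginary parts along the antenna axis can be sketched as below. The ordering (all real parts first, then all imaginary parts) and the function names are assumptions made for illustration, and the example uses 64 antennas as quoted in the abstract.

```python
import numpy as np

def stack_real_imag(y):
    """Map a complex (M, n_subcarriers, n_symbols) array to a real (2M, ...) array."""
    return np.concatenate([y.real, y.imag], axis=0)

def unstack_real_imag(z):
    """Inverse of stack_real_imag: rebuild the complex (M, ...) array."""
    m = z.shape[0] // 2
    return z[:m] + 1j * z[m:]

# Random complex grid used only to check shapes; not a channel model.
rng = np.random.default_rng(0)
y = rng.standard_normal((64, 64, 64)) + 1j * rng.standard_normal((64, 64, 64))
z = stack_real_imag(y)           # real-valued input for the DNN
y_back = unstack_real_imag(z)    # complex signal handed to the LS stage
```

With this layout the network sees a real tensor whose first axis has twice the number of antennas, while the time-frequency axes keep their original size.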
Following the same procedure that was adopted for single antenna communication, the optimum number of parameters is determined first, and is tabulated in Table II in terms of , the number of epochs and the total number of weights. In what follows, the NMSE as a function of SNR is plotted for different values of . As depicted in Fig. 6, the results are consistent with the single antenna case. That is, at larger noise levels (or lower SNR), smaller values of perform much better. However, at higher SNR, due to early stopping, all the architectures tend to the same NMSE, with the higher ones performing slightly better. Accordingly, appears to have the best performance.
We repeat this experiment for the Kronecker channel model by taking the exponential spatial correlation matrix at the base station with correlation coefficient . As shown in Fig. 7, the performance of the deep channel estimator is almost unaffected by the change in spatial correlation of the channel matrix. This is because oversampling is not utilized in this domain. Hence, for brevity, the results are only presented for the LTE-EPA model in the rest of the paper.
VI-B3 Multi-cell Massive MIMO
To assess the robustness of the deep channel estimator against pilot contamination, we first search for the optimum value of , and then present the results. Since base stations allocate random pilots that are spread over the OFDM grid in one coherence time interval, the optimum value of is searched after contaminating 5% of the time-frequency grid randomly (but across all antennas) with interference at an SIR of 6 dB. In particular, we checked whether is the optimal architecture, as in the case of single-cell massive MIMO. We found to outperform all the other architectures; hence the architecture is optimized with for the rest of the multi-cell massive MIMO experiments. This experiment is extended by also contaminating 10% of the OFDM grid for . The results for both 5% and 10% contamination are presented in Fig. 8(a). In this case, the deep channel estimator outperforms the MMSE estimator up to an SNR of 7 dB even in the presence of up to 10% pilot contamination. The flattening of the NMSE curve with increased interference is due to the signal not being patched in the areas corrupted by interference beyond a certain limit. In image processing, this corresponds to an upper bound on the size of patches that can be recovered by region inpainting.
To further quantify the pilot contamination performance of our estimator, we verify its robustness for a different power allocation method. Accordingly, pilots are distributed over the resource elements not only randomly but also contiguously. To be more precise, 2 blocks of squares (corresponding to of the overall time-frequency grid) are chosen randomly, in which interference at SIRs of 10, 15 and 20 dB is injected. Although the deep channel estimator in this case can tolerate lower powers of interference than in the previous case, its performance, as illustrated in Fig. 8(b), is still better than that of the LS estimator for all SNRs, and better than that of the MMSE estimator up to an SNR of 6 dB for SIRs greater than 10 dB.
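A contiguous contamination pattern of this kind can be generated with a short sketch such as the one below. The block side length is not recoverable from the text above, so the value 8 is purely hypothetical, as is the function name; the sketch also lets the two blocks overlap, which the paper may or may not allow.

```python
import numpy as np

def contiguous_block_mask(n_subcarriers, n_symbols, block, n_blocks, seed=None):
    """Boolean mask marking `n_blocks` randomly placed block x block squares."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_subcarriers, n_symbols), dtype=bool)
    for _ in range(n_blocks):
        # Pick the top-left corner so that the square stays inside the grid.
        row = int(rng.integers(0, n_subcarriers - block + 1))
        col = int(rng.integers(0, n_symbols - block + 1))
        mask[row:row + block, col:col + block] = True
    return mask

# Hypothetical block size of 8 on a 64 x 64 grid, two blocks.
block_mask = contiguous_block_mask(64, 64, block=8, n_blocks=2, seed=0)
contaminated_fraction = block_mask.mean()
```

The resulting mask marks the resource elements on which an interferer would be injected at the chosen SIR.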
VI-C Complexity
Regarding the complexity of the deep channel estimator, it is important to note the trade-off between the number of parameters and the number of epochs required. As seen in Table I, a decoder of decreased complexity requires a larger number of epochs to attain the minimum NMSE. For instance, in the case of a single antenna, the number of parameters for is 496, for which the NMSE is the lowest, but it requires 2000 epochs to attain this NMSE. On the other hand, the architecture has 25,472 parameters and attains a slightly higher NMSE than , but requires a mere 250 epochs to attain this NMSE. This result has in fact been proven for the case of supervised learning of a single hidden layer neural network in [30], where it is shown that as the degree of overparameterization of the NN increases, it takes fewer epochs to converge to one of the many global minima in its objective function’s landscape. As a result, if the deep channel estimator were to be deployed in a latency-critical application and subject to online training, where a slightly higher NMSE could be tolerated, one should use the model with a higher value.
For the case of antennas, the optimal architecture is obtained for , which has only 3776 parameters but requires 1970 epochs to attain its lowest NMSE. On the other hand, has 116,224 parameters, but requires only 1000 epochs to attain its lowest achievable NMSE, which is much higher than . For training over roughly the same number of epochs, such as 2000, the single antenna architecture has 496 parameters, while the massive MIMO architecture has 3776 parameters. This comparison is quite important, and indicates a sub-linear increase in computational complexity with the number of antennas.
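The sub-linear scaling claimed above can be checked with simple arithmetic on the two weight counts quoted in the text. The antenna count of 64 is taken from the abstract and is an assumption as far as this paragraph is concerned.

```python
import math

single_antenna_weights = 496   # single antenna architecture, about 2000 epochs
massive_mimo_weights = 3776    # massive MIMO architecture, 1970 epochs
n_antennas = 64                # assumed from the abstract

ratio = massive_mimo_weights / single_antenna_weights
print(f"weight ratio: {ratio:.2f}, sqrt(antennas): {math.sqrt(n_antennas):.2f}")
# weight ratio: 7.61, sqrt(antennas): 8.00
```

Going from 1 to 64 antennas multiplies the weight count by about 7.6, which is far below 64 and just under the square root of the antenna count, for a comparable number of epochs.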
VII Conclusions
In this paper we proposed a novel deep channel estimator comprised of a DNN followed by a simple LS-type estimator. This deep channel estimator exhibits superior performance compared to LS and MMSE estimators, which have no inherent way of dealing with pilot contamination (or co-channel interference). Promisingly, our low-complexity estimator performs better than the more complex MMSE estimator, in which the channel correlation matrices are estimated from the samples, and even approaches the “Genie-Aided MMSE”, where the channel statistics are perfectly known for free. The deep channel estimator appears to exploit correlations in the time-frequency grid very efficiently. The strong performance is also explained by a supporting mathematical analysis. The salient features of the proposed estimator are as follows. The number of parameters scales at a rate less than the square root of the number of antennas, which yields hundreds or thousands of weights as opposed to the millions of parameters in conventional DNNs. Furthermore, the proposed estimator is appropriate for any environment or channel type, since it only needs the received signal and some pilots.
It would be interesting as future work to study the deep channel estimator for high mobility channels. Similarly, observing the performance of the deep channel estimator for mmWave channels seems intriguing. Furthermore, enhancing its interference mitigation capability can also be a good future research direction. In particular, some other dictionary learning algorithms can be adapted to our model. Additionally, it would be interesting to observe how our estimator performs when the eigenspace of the covariance matrices of interfering users does not fully overlap with the target user.
References
- [1] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590-3600, November 2010.
- [2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
- [3] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446-9454, June 2018.
- [4] J.-J. van de Beek, O. Edfors, M. Sandell, S. Wilson, and P. Borjesson, “On channel estimation in OFDM systems,” in Proc. IEEE VTC, vol. 2, pp. 815-819, March 1995.
- [5] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “The multicell multiuser MIMO uplink with very large antenna arrays and a finite-dimensional channel,” IEEE Trans. Commun., vol. 61, no. 6, pp. 2350-2361, June 2013.
- [6] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436-1449, April 2013.
- [7] A. Adhikary, A. Ashikhmin, and T. L. Marzetta, “Uplink interference reduction in large scale antenna systems,” IEEE Trans. Commun., vol. 65, no. 5, pp. 2194-2206, May 2017.
- [8] A. Adhikary, J. Nam, J.-Y. Ahn, and G. Caire, “Joint spatial division and multiplexing-the large-scale array regime,” IEEE Trans. on Info. Theory, vol. 59, no. 10, pp. 6441-6463, October 2013.
