Machine Learning Techniques for Blind Beam Alignment in mmWave Massive MIMO

Aymen Ktari; Hadi Ghauch; Ghaya Rekaya-Ben Othman

PMC · DOI: 10.3390/e26080626 · July 25, 2024


Open Access

TL;DR

This paper introduces machine learning techniques to efficiently align beams in mmWave MIMO systems with minimal pilot overhead.

Contribution

A novel ML-based blind beam alignment method is proposed, reducing pilot overhead using low-complexity models.

Findings

01

ML models accurately predict non-sounded beams using only 10% of the total beams.

02

The method works across various codebook sizes from 128×128 to 1024×1024.

03

Received Signal Energies are used to train models without requiring channel state information.

Abstract

This paper proposes methods for Machine Learning (ML)-based Beam Alignment (BA) that use low-complexity ML models and achieve a small pilot overhead. We assume a single-user massive mmWave MIMO uplink with a fully analog architecture. Assuming large-dimension codebooks of possible beam patterns at the UE and BS, this data-driven and model-based approach aims to partially and blindly sound a small subset of beams from these codebooks. The proposed BA is blind (no CSI), is based on Received Signal Energies (RSEs), and circumvents the need to exhaustively sound all possible beams. The sub-sampled subset of beams is then used to train several ML models, such as low-rank Matrix Factorization (MF), non-negative MF (NMF), and a shallow Multi-Layer Perceptron (MLP). We provide an extensive mathematical description of these models and the algorithms for each of them. Our extensive numerical results…


Funding

Télécom Paris, l’Institut Polytechnique de Paris, France

Keywords

mmWave MIMO; massive antennas; ML-based Beam Alignment; blind BA; Matrix Factorization; Multi-Layer Perceptron; non-linear regression


Taxonomy

Topics: Millimeter-Wave Propagation and Modeling · Microwave Engineering and Waveguides · Radio Frequency Integrated Circuit Design

Full text

1. Introduction

Driven by the explosive growth of large-scale connectivity and of higher-data-rate systems, wireless data traffic is expected to increase exponentially, reaching 5 zettabytes per month and data rates of 100 Gbps by 2030 [1]. Meanwhile, the latency in the 6th Generation (6G) is predicted to reach 0.1 ms, representing $[eqn]$ of $[eqn]$ latency, in order to support emerging technical needs such as holographic images, Internet of Things applications, and autonomous driving.

Beam Alignment (BA) is frequently referred to in the literature as beam sounding, i.e., beam training. It is a fundamental problem in millimeter-wave (mmWave) Multiple-Input Multiple-Output (MIMO) systems, defined as the exchange of information between the user equipment (UE) and the base station (BS) in order to accurately select the optimal beam-steering direction. Aligning the beams is related to several technical problems, such as beamforming, beam sweeping, beam tracking, and beam selection. The whole framework that unites these operations between UE and BS is often referred to as Beam Management. To perform BA, beam patterns stored in large codebooks are used at both UE and BS. Indeed, pencil beams with directional gain are increasingly used in several applications to alleviate the severe path-loss attenuation and to increase capacity and data throughput. Moreover, massive MIMO systems provide large gains in spectral and energy efficiency compared with conventional MIMO systems. Combined with mmWave technology, these systems offer better communication quality by increasing the system bandwidth and reducing the effects of noise and interference. Given the diversification of future $[eqn]$ and towards $[eqn]$ applications and intelligent systems, researchers predict the continuous generation of massive datasets requiring deep processing over large bandwidths, which makes mmWave bands prime spectrum candidates. However, the physical properties of the mmWave channel impose crucial limitations: scattering, attenuation, short coherence time due to the Doppler effect, penetration loss, environmental constraints, and complex channel modeling in realistic urban scenarios. The major problem we address in this paper is the inevitably high signaling/training overhead.
For this reason, the main trade-off is to find the most accurate yet least complex $[eqn]$ algorithm for identifying the optimal beam pair from sounded instantaneous Received Signal Energies, using as few training samples as possible.

Contributions: In this work, we propose ML-based BA methods for a single-user massive mmWave MIMO uplink with a wide-band channel. We assume a single radio frequency chain with large codebooks of possible analog beams at the BS (the BS codebook) and at the UE (the UE codebook). We define a beam pair as one beam from the UE codebook combined with one beam from the BS codebook. By approximating the SNR with the Received Signal Energy (RSE), we bypass the need for CSI, i.e., the approach is blind. We sub-sample the large codebooks into smaller sub-sampled BS and UE codebooks and sound the beam pairs from these sub-sampled codebooks to generate the training set, which is a novel aspect of the approach. Using the RSE of the sounded beam pairs (sub-sampled codebooks), we propose to train the following ML methods to predict the RSE of the beam pairs that were not sounded: Matrix Factorization (MF), non-negative Matrix Factorization (NMF), and feed-forward (shallow) Multi-Layer Perceptron (MLP).

We formulate the MF and NMF problems and propose Block Coordinate Descent (BCD) and Block Gradient Descent (BGD) methods to solve each of them, deriving all the update equations in depth. We show that the BCD method converges to a stationary point for both the MF and NMF problems. Our extensive numerical results show that, when sub-sampling $[eqn]$ of the BS/UE codebooks, the remaining RSE values can be predicted very accurately (with a training/test error $[eqn]$) for every antenna configuration.
We derive in detail the equations of a general MLP model, the resulting loss function, and the corresponding optimization problem, as well as the back-propagation equations for this MLP. Extensive numerical results show that sounding $[eqn]$ of the original codebooks is sufficient to predict the RSE of the beam pairs that were not sounded, with negligible training/test error.
We numerically compare the training/test losses of all the proposed models for varying codebook cardinalities and transmit powers. These results suggest that BCD for MF/NMF outperforms the MLP in terms of training and test error. However, BCD for MF/NMF has a high computational complexity, whereas the MLP has a moderate one.
Interestingly, by sounding $[eqn]$ of the BS/UE codebooks, the proposed ML models can predict the unknown RSE (of the beam pairs not sounded) with negligible test error. Thus, the proposed methods achieve a $[eqn]$ reduction in pilot signaling overhead compared with the state-of-the-art (SotA) benchmark, without any noticeable loss in performance.

Notations: Matrices and vectors are written in boldface upper-case and lower-case letters, respectively. We use $[eqn]$ for the trace, transpose, inverse, conjugate transpose, determinant, and Frobenius norm of a matrix $[eqn]$ and the $[eqn]$ identity matrix. $[eqn]$ denotes the (i, j)th entry of a matrix $[eqn]$. We denote the Hadamard product by ∘, while $[eqn]$ denotes the Euclidean projection of $[eqn]$ onto $[eqn]$, applied element by element on $[eqn]$. We denote by $[eqn]$ the absolute value of x and by $[eqn]$ the entry t of a vector $[eqn]$.

Methods/Experiment: The proposed approach is data driven and model based. The dataset is generated following the Saleh-Valenzuela wide-band mmWave system model. It consists of the Received Signal Energies of each and every beam pair in the massive MIMO uplink setup, stored in separate .csv files. The model-based solution to the empirical risk minimization includes deriving closed-form solutions to the formulated non-convex optimization problem, stating theoretical convergence guarantees, and empirically demonstrating the success of the proposed partial and blind Beam Alignment procedure with different algorithms. All simulations are executed on Infres GPU servers and on the Comelec laboratory PC at Télécom Paris, which has the following characteristics: Intel(R) Core(TM) i5-8365U CPU @ 1.60 GHz, 16 GB of RAM, x64 processor, and a 64-bit operating system (Windows 10 Enterprise LTSC 2018, version 1809). The manufacturer is Dell (Paris, France). All Python packages used in this work (numpy, scipy, keras, pytorch, matplotlib, etc.) target the Python 3.9 release. The experimental protocol relies on offline grid-search cross-validation, which requires GPU processing, to select the optimal hyperparameters, followed by online training/prediction for Matrix Factorization, non-negative Matrix Factorization, and Multi-Layer Perceptron. The comparison follows a Quality of Service-based approach: it simulates a variety of MIMO configurations and architectural setups, investigates the impact of varying the Received Signal Energy regime, and empirically identifies similarities and differences in how the transmit power affects model behavior, loss values, the optimal signaling overhead ratio, and the optimal hyperparameters.

Problem Statement: The main challenge addressed in this study is the high signaling overhead in Beam Alignment for mmWave MIMO systems, which hampers the efficient selection of optimal beam-steering directions.
Research Questions and Hypotheses: This study investigates whether machine learning methods can effectively reduce the signaling overhead required for accurate beam-pair prediction in mmWave MIMO systems.
Objectives and Aims: The primary objective is to develop and evaluate ML-based BA methods that minimize the training overhead while maintaining high accuracy in predicting the RSE for unsounded beam pairs.
Significance and Rationale: The study proposes a novel approach to BA using ML techniques, which can lead to a substantial reduction in pilot signaling overhead and enhance the efficiency of future wireless communication systems.

2. Literature Survey

In conventional standards, Exhaustive BA, also called Brute-Force BA, is the de facto approach for the alignment process. It sounds all available beams of both the UE and BS codebooks in order to exhaustively select the optimal beam pair. One obvious drawback is that the resulting signaling overhead scales as the product of the UE and BS codebook sizes. At 60 GHz, Exhaustive BA has been adopted in several mmWave $[eqn]$ or $[eqn]$ communication technologies, e.g., IEEE 802.15.3c [2] and IEEE 802.11ad [3]. It is conventionally applied in small MIMO configurations with small codebooks (e.g., codebooks of size $[eqn]$ for $[eqn]$), where it guarantees optimal performance. For cellular networks [4], V2X communications, Unmanned Aerial Vehicles, or High-Speed Train applications, the infeasibility of brute-force BA has pushed researchers to reduce the large signaling overhead incurred by massive antenna systems. State-of-the-art methods can be divided into two categories: classic BA and learning-based BA. Traditional techniques tend to use increasingly structured BA designs, such as hierarchical multi-level codebooks [5] (training beamforming vectors are constructed with different beam widths at different levels); overlapped beam patterns [6], whose main idea is to increase the amount of information carried by each channel measurement, thereby reducing the required channel estimation time; beam coding [7], which assigns a unique code signature to each beam angle; and subspace estimation/decomposition-based $[eqn]$ [8]. Compressed sensing-based algorithms [9] are also used in this context, taking advantage of channel sparsity. Classic methods therefore share two common features: they generally rely on $[eqn]$ exchange and on Exhaustive BA. In contrast, Machine Learning (ML)-based BA has recently emerged and continues to produce promising results.
For instance, statistical models such as the Kolmogorov model-based BA in [10] use sub-sampled codebooks to reduce the signaling overhead: $[eqn]$ of Exhaustive BA provides accurate predictions of the optimal beams at UE and BS in a partial BA procedure, similar to our approach. Deep learning, including shallow neural networks, is increasingly used by wireless communication researchers, and two major paradigms can be distinguished. The first is Supervised Learning (SL), e.g., a Support Vector Machine and Multi-Layer Perceptrons for joint analog beam selection in [11], convolutional neural networks for beam management in sub-6 GHz in [12] and for calibrated beam training in [13], recurrent neural networks such as Long Short-Term Memory networks for beam tracking in [14,15,16], auto-encoders for beam management in [17], and several other neural architectures. The second is Reinforcement Learning (RL) [18,19,20], generally used to solve Multi-Armed Bandit and Markov decision process problems. In addition, neural architectures can extract features from the hidden interactions between $[eqn]$ and $[eqn]$, providing fast and accurate estimates across different MIMO setups and channel realizations, especially when applied to massive datasets with many training samples. This work is an extension of [21]. In this paper, we extend the channel model to the wide-band case and add multiple RF chains at the BS in a fully analog low-complexity architecture, and we investigate more ML tools for partial and blind BA. This paper is one of the first attempts to apply $[eqn]$ models and shallow Multi-Layer Perceptrons to blind and partial Beam Alignment for massive mmWave SU-MIMO. Our work in [22] shares the same approach and objectives, but quantizes the output of each RF chain.

3. System Model

In this section, we describe the mmWave MIMO point-to-point system model. We consider an uplink transmission from a multiple-antenna user equipment (UE) with a single radio frequency (RF) chain to a multiple-antenna base station (BS) with multiple RF chains. The proposed ML methods run at the BS, which has more computational resources than the UE. Figure 1a,b show a diagram of the proposed architecture. The UE and BS are equipped with Uniform Linear Arrays of $[eqn]$ and $[eqn]$ antennas, respectively. We propose a low-cost, low-complexity fully analog architecture in which the UE has one RF chain and the BS has $[eqn]$ RF chains. The UE selects its analog beamformer $[eqn]$ from a codebook of feasible beam choices, $[eqn]$, where $[eqn]$ is the corresponding index set. Similarly, the BS selects its analog combiner $[eqn]$ from a codebook $[eqn]$, with $[eqn]$ as the index set of the codebook. We denote by $[eqn]$ the number of possible beamforming vectors at the UE, i.e., the cardinality of the UE codebook, $[eqn]$, and by $[eqn]$ the cardinality of the BS codebook, $[eqn]$. Both beamforming and combining are performed entirely in the analog domain using phase shifters at the UE and BS; thus, they satisfy the following constant modulus constraints, $[eqn]$:

[eqn]

[eqn]

In our proposed approach, the BS collects the received signal energies, denoted as $[eqn]$, and learns their patterns and features in order to accurately predict the optimal beam indexes from the corresponding codebooks, which it then sends to the UE. We adopt the wide-band channel model $[eqn]$ given by

[eqn]

where $[eqn]$ is the number of sub-carriers over the whole bandwidth in an $[eqn]$ scenario, k is the sub-carrier index, and $[eqn]$ is the narrow-band channel model representing the time-domain channel impulse response with L delay taps, given by $[eqn]$, where L is the number of paths (rank) of the channel; $[eqn]$ and $[eqn]$ are the angle of arrival (AoA) at the BS and the angle of departure (AoD) from the UE of the $[eqn]$ path (both assumed to be uniformly distributed over $[eqn]$); $[eqn]$ is the complex gain of the $[eqn]$ path such that $[eqn]$; and, finally, $[eqn]$ and $[eqn]$ are the array response vectors at the $[eqn]$ and $[eqn]$, respectively. We further assume that the channel is completely unknown to both the UE and the BS. Henceforth, we denote by the beam pair $[eqn]$ the combination of the UE beamformer indexed u in the UE codebook $[eqn]$ and the BS combiner indexed i in the BS codebook $[eqn]$. The signal at the BS resulting from applying the beam pair $[eqn]$, $[eqn]$, is expressed as

[eqn]

where $[eqn]$ is the transmitted pilot symbol associated with $[eqn]$ (with power $[eqn]$) and $[eqn]$ is the effective additive white Gaussian noise $[eqn]$ with unit variance ($[eqn]$). We define the received Signal-to-Noise Ratio (SNR) for the beam pair $[eqn]$ as $[eqn]$. We assume a fully blind approach; i.e., neither the UE nor the BS has any knowledge of $[eqn]$. Computing the above SNR expression is therefore infeasible, since the BS does not know $[eqn]$. Instead, in this work, we approximate the SNR of the beam pair $[eqn]$ by the corresponding instantaneous Received Signal Energy (RSE), expressed as $[eqn]$. In other words, we assume that $[eqn]$ for each beam pair $[eqn]$.
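To make the RSE proxy concrete, the following minimal sketch simulates an RSE matrix for a narrow-band (single sub-carrier) version of the geometric channel above. It assumes half-wavelength ULAs, a hypothetical grid-of-beams codebook on each side, and an arbitrary path-gain normalization; the names `ula_response`, `steering_codebook`, and `simulate_rse` are ours. It is not the Saleh-Valenzuela wide-band dataset used in the paper.

```python
import numpy as np

def ula_response(n_ant, angle):
    """Half-wavelength ULA response (assumed), normalized so every entry has modulus 1/sqrt(n_ant)."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(angle)) / np.sqrt(n_ant)

def steering_codebook(n_ant, size):
    """Hypothetical codebook: `size` steering beams on a uniform grid in sin(angle)."""
    angles = np.arcsin(np.linspace(-1.0, 1.0, size, endpoint=False))
    return np.stack([ula_response(n_ant, a) for a in angles], axis=1)  # shape (n_ant, size)

def simulate_rse(n_ue=8, n_bs=16, m_ue=16, m_bs=32, n_paths=3, power=10.0, seed=0):
    """RSE[u, i] = |w_i^H H f_u s + n|^2 for every beam pair (u, i)."""
    rng = np.random.default_rng(seed)
    aod = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)  # departure angles at the UE
    aoa = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)  # arrival angles at the BS
    gains = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    scale = np.sqrt(n_bs * n_ue / n_paths)
    # Narrow-band geometric channel (n_bs x n_ue): sum of L rank-one paths
    H = scale * sum(g * np.outer(ula_response(n_bs, a), ula_response(n_ue, d).conj())
                    for g, a, d in zip(gains, aoa, aod))
    F = steering_codebook(n_ue, m_ue)  # UE beamformers f_u as columns
    W = steering_codebook(n_bs, m_bs)  # BS combiners w_i as columns
    noise = (rng.standard_normal((m_ue, m_bs))
             + 1j * rng.standard_normal((m_ue, m_bs))) / np.sqrt(2)  # unit-variance AWGN
    y = (W.conj().T @ H @ F).T * np.sqrt(power) + noise  # rows: UE beam u, cols: BS beam i
    return np.abs(y) ** 2  # RSE, used as a proxy for the (unknown) SNR
```

`simulate_rse()` returns a 16 × 32 array whose rows index the UE beams u and whose columns index the BS combiners i, which is the RSE-matrix layout used in the rest of this section. Every codebook entry has modulus 1/√N, so the constant modulus constraint of the analog architecture holds.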

Benchmark: Exhaustive BA: The de facto method for Beam Alignment is Exhaustive BA. It consists of jointly and exhaustively sounding the beams of both the UE and BS codebooks, recording all entries of $[eqn]$, and exhaustively searching $[eqn]$ for the indexes of the beam pair that maximize $[eqn]$ at the BS, i.e., $[eqn]$. The $[eqn]$ matrix is thus recorded $[eqn]$ entries per pilot symbol, since $[eqn]$ samples are received simultaneously at the BS for every pilot transmission (see Figure 2). Consequently, the pilot signaling overhead of Exhaustive BA is $[eqn]$, which implies that the overhead of this benchmark scales poorly with the sizes of the UE and BS codebooks.
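The benchmark can be sketched in a few lines. The pilot count assumes that each pilot is observed through `n_rf` BS combiners at once, which is our reading of the simultaneous reception described above rather than the paper's exact expression; `exhaustive_ba` is a hypothetical helper.

```python
import numpy as np

def exhaustive_ba(rse, n_rf=1):
    """Exhaustive BA over a fully sounded RSE matrix (rows: UE beams, cols: BS beams)."""
    m_ue, m_bs = rse.shape
    u_star, i_star = np.unravel_index(np.argmax(rse), rse.shape)
    # Assumption: each pilot is received through n_rf BS combiners simultaneously,
    # so one pilot fills up to n_rf entries of a row of the RSE matrix.
    n_pilots = m_ue * int(np.ceil(m_bs / n_rf))
    return (int(u_star), int(i_star)), n_pilots
```

With 1024 × 1024 codebooks and a single RF chain, this count is 1,048,576 pilot transmissions, which illustrates how poorly the benchmark scales.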

Proposed partial Beam Alignment using sub-sampled codebooks: Recall that the beam pair $[eqn]$ designates the beamforming vector of index u in the UE codebook and the combining vector of index i in the BS codebook. First, we select (at random) the indexes of the sub-sampled codebooks at the UE and BS, $[eqn]$ and $[eqn]$, such that $[eqn]$ and $[eqn]$, and $[eqn]$ $[eqn]$. The idea behind this approach is to sound only the beam pairs from the sub-sampled codebooks, $[eqn]$ and $[eqn]$. We thus define the training set, $[eqn]$, as the sub-sampled codebook indexes at the UE and BS, i.e., $[eqn]$. Then, the RSE of the sounded beam pairs (training set) is fed to several ML methods, and the learned ML model is used to predict the RSE of the non-sounded beam pairs.

We formalize the proposed method below. We express both the received signal $[eqn]$ and $[eqn]$ for each sounded beam pair $[eqn]$ (i.e., in the training set) as follows:

[eqn]

[eqn]

The dataset is formulated using the following incomplete $[eqn]$ matrix, $[eqn]$ :

[eqn]

where $[eqn]$ denotes the element $[eqn]$ of $[eqn]$, $[eqn]$. The value of $[eqn]$ is undefined for the beam pairs that were not sounded, which we designate as unknown RSE matrix coefficients. These missing entries are predicted using one of the following proposed ML methods: (i) low-rank $[eqn]$ and (ii) shallow (feed-forward) $[eqn]$, where we use the sounded RSE entries as the training set, $[eqn]$. The training set, $[eqn]$, is fed into one of the above ML models, which then predicts the RSE of the non-sounded coefficients in $[eqn]$, denoted as ‘Unknown’ in (5) (see Figure 3). Finally, the pilot signaling overhead of the proposed sub-sampled codebook method is $[eqn]$. We split the RSE dataset into a training set $[eqn]$ and a test set $[eqn]$ such that $[eqn]$. In this paper, $[eqn]$ denotes the true value (label) of the RSE for the beam pair $[eqn]$ in the training set $[eqn]$, and $[eqn]$ denotes the true value (label) of the RSE for the beam pair $[eqn]$ in the test set $[eqn]$.
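The construction of the incomplete RSE matrix and the train/test split can be sketched in NumPy. The mask follows a product-set reading of the training set (every pair formed from the sub-sampled UE and BS indexes); NaN stands in for the ‘Unknown’ entries, and the helper names `subsampled_mask` and `incomplete_rse` are hypothetical.

```python
import numpy as np

def subsampled_mask(m_ue, m_bs, ue_idx, bs_idx):
    """Training-set mask: True for every pair (u, i) with u in ue_idx and i in bs_idx."""
    mask = np.zeros((m_ue, m_bs), dtype=bool)
    mask[np.ix_(ue_idx, bs_idx)] = True
    return mask

def incomplete_rse(rse, mask):
    """Keep sounded RSE values, mark the rest as 'Unknown' (NaN), and list both index sets."""
    incomplete = np.where(mask, rse, np.nan)
    train_idx = np.argwhere(mask)   # sounded beam pairs (training set)
    test_idx = np.argwhere(~mask)   # non-sounded beam pairs (test set)
    return incomplete, train_idx, test_idx
```

Drawing the index sets at random, e.g. with `rng.choice(m_ue, s_ue, replace=False)` on a `numpy.random.Generator`, mirrors the random selection of the sub-sampled codebooks described above.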

Signaling overhead ratio: It is defined as $[eqn]$, where $[eqn]$ and $[eqn]$ are, respectively, the sizes of the UE and BS sub-sampled codebooks used in our proposed partial beam sounding, while $[eqn]$ and $[eqn]$ are the original codebook sizes; $[eqn]$ thus measures the signaling overhead of all the proposed MF, NMF, and MLP methods relative to that of Exhaustive BA. A small value of $[eqn]$ is desirable to reduce the signaling overhead of our proposed method. However, a low $[eqn]$ also means that the training set is small. As a result, the proposed ML method will not be able to extract enough data patterns from the (too) small number of training samples, resulting in a larger prediction error. As one of the contributions of this work, we empirically find the smallest possible value of $[eqn]$ that still yields extremely small training and prediction errors.
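Assuming the ratio is the number of sounded pairs divided by the number of pairs sounded in Exhaustive BA (the exact expression is elided above), a two-function helper shows a practical consequence: because the ratio is a product, keeping a fraction q of each codebook costs q² of the exhaustive overhead. Function names are ours.

```python
import math

def overhead_ratio(s_ue, s_bs, m_ue, m_bs):
    """Sounded beam pairs relative to Exhaustive BA: (s_ue * s_bs) / (m_ue * m_bs)."""
    return (s_ue * s_bs) / (m_ue * m_bs)

def per_codebook_fraction(ratio):
    """Equal fraction q of each codebook that yields a target ratio, since ratio = q * q."""
    return math.sqrt(ratio)
```

For example, keeping half of each 128-beam codebook gives a ratio of 0.25, and a target ratio of 0.1 requires keeping about 31.6% of each codebook.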

Conjecture: Note that, from the equations of the narrow-band channel model $[eqn]$ and the wide-band channel model $[eqn]$ , it is simple to verify that $[eqn]$ and $[eqn]$ . Assume that $[eqn]$ . Thus, we can approximate the RSE matrix as

[eqn]

If $[eqn]$ , then it can be shown that the RSE matrix $[eqn]$ is such that $[eqn]$ . This implies that if $[eqn]$ , then $[eqn]$ is a low-rank matrix, i.e., $[eqn]$ .

Although a proof of this necessary condition eludes the authors, we empirically observed that if $[eqn]$ is large, then the number of non-zero singular values of $[eqn]$, $[eqn]$, satisfies the above upper bound, i.e., $[eqn]$.
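A quick numerical check of this low-rank behavior is sketched below. The bound it tests comes from our own reasoning for the noiseless narrow-band case (an assumption; the paper's bound is not recoverable here): with L paths, w_i^H H f_u is a sum of L terms, each the product of a factor that depends only on i and a factor that depends only on u, so |w_i^H H f_u|² is a sum of at most L² such separable products and the RSE matrix has rank at most L². The sketch uses hypothetical half-wavelength ULAs and grid codebooks and counts singular values above a relative tolerance.

```python
import numpy as np

def ula(n, angle):
    """Half-wavelength ULA response (assumed), unit norm."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(angle)) / np.sqrt(n)

def grid_codebook(n_ant, size):
    """Hypothetical grid-of-beams codebook, beams as columns."""
    angles = np.arcsin(np.linspace(-1, 1, size, endpoint=False))
    return np.stack([ula(n_ant, a) for a in angles], axis=1)

def noiseless_rse_rank(n_paths=3, n_ue=16, n_bs=32, m_ue=64, m_bs=128, seed=1):
    """Numerical rank of the noiseless narrow-band RSE matrix |w_i^H H f_u|^2."""
    rng = np.random.default_rng(seed)
    H = np.zeros((n_bs, n_ue), dtype=complex)
    for _ in range(n_paths):  # sum of L rank-one paths with random gains and angles
        g = rng.standard_normal() + 1j * rng.standard_normal()
        aoa, aod = rng.uniform(-np.pi / 2, np.pi / 2, 2)
        H += g * np.outer(ula(n_bs, aoa), ula(n_ue, aod).conj())
    W, F = grid_codebook(n_bs, m_bs), grid_codebook(n_ue, m_ue)
    rse = np.abs((W.conj().T @ H @ F).T) ** 2        # shape (m_ue, m_bs)
    s = np.linalg.svd(rse, compute_uv=False)
    return int(np.sum(s > 1e-9 * s[0]))              # singular values above a relative tolerance
```

With a single path the noiseless RSE matrix is rank one; with three paths the count stays at or below nine even though the matrix is 64 × 128. Receiver noise makes the measured matrix numerically full rank, so in practice the relevant quantity is how quickly the singular values decay.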

Remark 1. Recall the expression for the effective rate, r, $[eqn]$, where Ω is the pilot signaling overhead and $[eqn]$ is the number of symbols per block. The problem of maximizing r can thus be written as the following series of equivalent problems: $[eqn]$ $[eqn]$, where the last equivalence (⇔) holds because $[eqn]$ is a strictly monotonically increasing function of x. This result implies that finding the optimal beam pair $[eqn]$ that maximizes r is equivalent to finding the beam pair that maximizes the $[eqn]$.
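A short numerical illustration of Remark 1, assuming the common form r = (1 - Ω/T) log2(1 + SNR), where T is the number of symbols per block (the exact expression is elided above, and the function name is ours):

```python
import numpy as np

def effective_rate(snr, overhead, block_len):
    """Assumed form r = (1 - overhead / block_len) * log2(1 + snr), in bit/s/Hz."""
    return (1.0 - overhead / block_len) * np.log2(1.0 + snr)
```

For a fixed overhead, the argmax over beam pairs is the same for r and for the SNR. The overhead acts through the pre-log factor: with a hypothetical block of 20,000 symbols, sounding all 16,384 pairs of two 128-beam codebooks leaves a factor of 1 - 16384/20000 ≈ 0.18, whereas sounding 10% of them (1,638 pilots) leaves about 0.92.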

Remark 2. The information (number of entries) needed to represent the $[eqn]$ matrix $[eqn]$ is $[eqn]$. This result follows from performing the $[eqn]$ on $[eqn]$ and counting the resulting number of entries. Thus, if $[eqn]$ is severely rank deficient, i.e., highly compressible, then methods such as $[eqn]$ will exhibit extremely small training and test errors. Conversely, if $[eqn]$ is full rank, i.e., not compressible, then the training and test errors of $[eqn]$ will be quite large.
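The count in Remark 2 can be checked with a short helper. We assume the count is that of a thin SVD: for an M × N matrix of rank r, storing r left singular vectors, r right singular vectors, and r singular values takes r(M + N + 1) numbers. The helper name and tolerance are ours.

```python
import numpy as np

def svd_storage(matrix, rel_tol=1e-9):
    """Numerical rank r and the r * (M + N + 1) entries of a thin SVD (U, singular values, V)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    r = int(np.sum(s > rel_tol * s[0]))
    m, n = matrix.shape
    return r, r * (m + n + 1)
```

For a 1024 × 1024 RSE matrix of rank 10, this is 20,490 numbers instead of 1,048,576, which is why a severely rank-deficient matrix can be recovered from few sounded entries.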

4. Matrix Factorization and Non-Negative Matrix Factorization

4.1. MF and NMF Problem Formulation

The intuition behind low-rank $[eqn]$ is to model the RSE of the sounded beam pairs (i.e., the entries of $[eqn]$ that are known as $[eqn]$) as an inner product between two D-dimensional latent vectors (factors), $[eqn]$, as illustrated in Figure 4. Specifically, the RSE of the beam pair $[eqn]$, denoted as $[eqn]$, is modeled as $[eqn]$, $[eqn]$, where D is the dimension of the latent factors (i.e., the size or complexity of the Matrix Factorization model) and $[eqn]$ are the $[eqn]$ model parameters to be optimized. Since the $[eqn]$ model is low rank, D is assumed to be much smaller than the dimensions of $[eqn]$, i.e., $[eqn]$. The RSE of the beam pair $[eqn]$ is known from sounding the sub-sampled codebooks (i.e., it is a label). In its general form, our loss function $[eqn]$ measures the distance between the true value $[eqn]$ and the predicted value $[eqn]$, which corresponds to the $[eqn]$ output (prediction): $[eqn]$. The Empirical Risk (also known as training error) is defined as the average of the individual losses $[eqn]$ over the training set. We define the regularized Empirical Risk function as the above empirical risk plus the following regularization terms:

[eqn]

where $[eqn]$ is the set of regularization hyperparameters used to balance the $[eqn]$ model between overfitting and underfitting. The Empirical Risk Minimization corresponding to the $[eqn]$ model is given by

[eqn]

For the Matrix Factorization variant $[eqn]$ , the optimization problem is given by

[eqn]

where $[eqn]$ denotes the optimal latent vectors for MF and NMF. The test loss (also known as test error) is obtained by applying the general loss to the unknown data samples (non-sounded beams) using the optimal $[eqn]$ parameters $[eqn]$ and $[eqn]$: $[eqn]$, where $[eqn]$ is the test set of our learning model.
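Since the exact loss and regularizer are not recoverable from the extracted text, the following shows one common instance of this regularized empirical risk minimization, assuming a squared loss and ℓ2 (ridge) penalties. The symbols γ_ui (RSE label), x_u and y_i (latent vectors), 𝒯 and 𝒱 (training and test sets), and λ_x, λ_y are our labels and may differ from the paper's notation.

```latex
\begin{aligned}
&\min_{\{\mathbf{x}_u\},\,\{\mathbf{y}_i\}}\;
\frac{1}{|\mathcal{T}|}\sum_{(u,i)\in\mathcal{T}}
\bigl(\gamma_{ui}-\mathbf{x}_u^{T}\mathbf{y}_i\bigr)^{2}
+\lambda_x\sum_{u}\lVert\mathbf{x}_u\rVert_2^{2}
+\lambda_y\sum_{i}\lVert\mathbf{y}_i\rVert_2^{2},\\
&\text{s.t. }\mathbf{x}_u,\mathbf{y}_i\in\mathbb{R}^{D}\ \text{(MF)}
\quad\text{or}\quad
\mathbf{x}_u,\mathbf{y}_i\in\mathbb{R}_{+}^{D}\ \text{(NMF)},\\
&\mathcal{L}_{\text{test}}=\frac{1}{|\mathcal{V}|}\sum_{(u,i)\in\mathcal{V}}
\bigl(\gamma_{ui}-\mathbf{x}_u^{\star T}\mathbf{y}_i^{\star}\bigr)^{2}.
\end{aligned}
```

With this form, each block sub-problem (all x_u with the y_i fixed, or the reverse) decouples into independent ridge regressions, which is what makes the closed-form BCD updates possible.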

4.2. Solutions for MF

We solve the $[eqn]$ problem $[eqn]$ using the following methods: (i) Block Coordinate Descent (BCD), often referred to as Alternating Least Squares (ALS); (ii) BCD with Stochastic Gradient Descent (SGD); and (iii) Block Gradient Descent (BGD), which combines BCD and Gradient Descent (GD).

BCD for MF (BCD MF): BCD splits the optimization problem $[eqn]$ into sub-problems, each of which optimizes one block while all other blocks are kept fixed. We will show that each sub-problem is strongly convex in its block and that the BCD algorithm converges to a stationary point. Applying BCD to the $[eqn]$ problem results in two sub-problems, S1 and S2, which are solved iteratively. At iteration k, the sub-problem $[eqn]$ is defined by fixing the block $[eqn]$ and updating (solving for) only the block $[eqn]$, as follows:

[eqn]

Similarly, the sub-problem $[eqn]$ is defined by fixing the block $[eqn]$ in $[eqn]$ and updating (solving for) only the block $[eqn]$, as follows:

[eqn]

We rewrite $[eqn]$ as a series of equivalent problems, as follows:

[eqn]

where $[eqn]$ is the set of row indexes u of the RSE matrix such that the entry (u, i) of column i is among the known entries, $[eqn]$ and $[eqn]$. We derive the closed-form solution of sub-problem S1 by finding the global minimum of $[eqn]$, as follows:

[eqn]

Similarly, we rewrite sub-problem (S2) as a series of equivalent problems, stating only the last one:

[eqn]

where $[eqn]$ is the set of column indexes i of the RSE matrix such that the entry (u, i) of row u is among the known entries, $[eqn]$ and $[eqn]$. Next, we derive a closed-form solution of sub-problem S2 by finding the global minimum of $[eqn]$, as follows:

[eqn]

Thus, BCD updates to solve MF are given as follows:

[eqn]

[eqn]

where the superscript (k) denotes the BCD iteration index, (u,i) are the codebook indexes at the UE and BS, and $[eqn]$ denotes the RSE of the beam pair (u,i). The solution $[eqn]$ is reached when the gap between consecutive iterates falls below a predefined threshold $[eqn]$ or after a maximum number of iterations, $[eqn]$. We have the following result.
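The BCD (ALS) updates can be sketched in NumPy on a boolean mask of sounded entries. Assumptions: a squared loss with a single ridge weight `lam` on an unnormalized sum (equivalent to the averaged form up to a rescaling of the weight), random Gaussian initialization, and a fixed iteration count instead of the tolerance test; the function names are ours. A latent vector can only be learned for a beam that appears in at least one sounded pair; a beam with no sounded pair ends up with the zero vector here, since its ridge system reduces to the λI term.

```python
import numpy as np

def mf_objective(R, mask, X, Y, lam):
    """Squared error on sounded entries plus an l2 penalty on both latent blocks."""
    residual = np.where(mask, R - X @ Y.T, 0.0)
    return float(np.sum(residual ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2)))

def bcd_mf(R, mask, D=3, lam=1e-2, iters=30, seed=0):
    """BCD (ALS) for MF: each latent vector is the exact ridge-regression solution."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((R.shape[0], D))  # UE-side latent vectors x_u (rows)
    Y = rng.standard_normal((R.shape[1], D))  # BS-side latent vectors y_i (rows)
    reg = lam * np.eye(D)
    history = [mf_objective(R, mask, X, Y, lam)]
    for _ in range(iters):
        for u in range(R.shape[0]):           # block 1: all x_u with Y fixed
            Yu = Y[mask[u]]                   # y_i of BS beams sounded with UE beam u
            X[u] = np.linalg.solve(Yu.T @ Yu + reg, Yu.T @ R[u, mask[u]])
        for i in range(R.shape[1]):           # block 2: all y_i with X fixed
            Xi = X[mask[:, i]]                # x_u of UE beams sounded with BS beam i
            Y[i] = np.linalg.solve(Xi.T @ Xi + reg, Xi.T @ R[mask[:, i], i])
        history.append(mf_objective(R, mask, X, Y, lam))
    return X, Y, history
```

For instance, with a rank-3 stand-in `R` and `mask = rng.random(R.shape) < 0.5` (a random-pair simplification of the sub-sampled codebooks), `history` is non-increasing, as Corollary 1 states for the exact BCD updates.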

Corollary 1. The sequence of updates $[eqn]$ generated by BCD, in (8), is non-increasing (in k) and converges to a stationary point as $[eqn]$ .

Proof. See Appendix A. □

Block Stochastic Gradient Descent (BSGD) for MF (SGD MF): SGD MF takes T plain SGD steps (mini-batch size $[eqn]$) on each BCD block. We first choose at random a single training sample $[eqn]$. The BSGD update for sub-problem (S1) is obtained by performing SGD on $[eqn]$, i.e., choosing at random a single index $[eqn]$ and computing the plain stochastic gradient $[eqn]$, where u is a random index from $[eqn]$ and $[eqn]$ is the plain stochastic gradient of $[eqn]$. The corresponding update is given as

[eqn]

where u is a single index chosen at random from $[eqn]$, $[eqn]$, $[eqn]$, (k) is the SGD iteration index, and $[eqn]$ is the plain stochastic gradient computed on one random sample $[eqn]$. Similarly, the update for sub-problem (S2) takes T plain SGD steps on $[eqn]$, i.e., using the stochastic gradient $[eqn]$, where i is a single random index from $[eqn]$. Thus, the SGD MF update for sub-problem (S2) is expressed as

[eqn]

where i is a single index chosen at random from $[eqn]$, $[eqn]$, $[eqn]$, and $[eqn]$ is the plain stochastic gradient computed with one sample $[eqn]$ chosen at random. We write the SGD MF updates as

[eqn]

[eqn]

where u is a random index chosen from $[eqn]$ , and i a random index from $[eqn]$ . $[eqn]$ is the step size for SGD.
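As a simplified illustration, the sketch below runs joint plain SGD on the regularized MF objective (one sounded pair per step, both latent vectors updated) rather than the block version described above, which takes T steps on one block before switching. The step size, number of epochs, initialization, and the name `sgd_mf` are arbitrary choices of ours.

```python
import numpy as np

def sgd_mf(R, mask, D=3, lam=1e-3, lr=0.02, epochs=100, seed=0):
    """Plain SGD (mini-batch size 1) on the regularized MF objective."""
    rng = np.random.default_rng(seed)
    X = 0.5 * rng.random((R.shape[0], D))
    Y = 0.5 * rng.random((R.shape[1], D))
    pairs = np.argwhere(mask)                     # sounded beam pairs (u, i)
    for _ in range(epochs):
        for u, i in pairs[rng.permutation(len(pairs))]:
            err = R[u, i] - X[u] @ Y[i]           # residual of one random sample
            grad_x = -2.0 * err * Y[i] + 2.0 * lam * X[u]
            grad_y = -2.0 * err * X[u] + 2.0 * lam * Y[i]
            X[u] -= lr * grad_x                   # both gradients use the pre-step values
            Y[i] -= lr * grad_y
    return X, Y
```

Each step costs O(D), which matches the low per-iteration complexity of the SGD variants; the price is a noisier path towards the minimum than the closed-form BCD updates.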

BGD for MF (BGD MF): Rather than solving each optimization block in closed form, BGD takes T gradient steps on each block. We skip the details here due to space limitations. The BGD updates for the $[eqn]$ problem are expressed as

[eqn]

[eqn]

where (u,i) are the codebook indexes at the UE and BS, k is the GD iteration index, and $[eqn]$ is the BGD step size ($[eqn]$).
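A full-batch version with T = 1 gradient step per block (an assumption; the text leaves T as a parameter) can be sketched as follows. The constant step size, initialization, and the name `bgd_mf` are arbitrary choices of ours.

```python
import numpy as np

def bgd_mf(R, mask, D=3, lam=1e-3, lr=0.01, iters=500, seed=0):
    """Block gradient descent for MF with T = 1 full-batch step per block."""
    rng = np.random.default_rng(seed)
    X = 0.5 * rng.random((R.shape[0], D))
    Y = 0.5 * rng.random((R.shape[1], D))
    M = mask.astype(float)

    def loss():
        return np.sum((M * (R - X @ Y.T)) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))

    losses = [loss()]
    for _ in range(iters):
        E = M * (R - X @ Y.T)                            # residuals on sounded entries only
        X = X - lr * (-2.0 * E @ Y + 2.0 * lam * X)      # gradient step on the UE-side block
        E = M * (R - X @ Y.T)                            # refresh residuals with the new X
        Y = Y - lr * (-2.0 * E.T @ X + 2.0 * lam * Y)    # gradient step on the BS-side block
        losses.append(loss())
    return X, Y, losses
```

Each iteration computes two full-batch gradients over all sounded entries, which is where the per-iteration cost of BGD comes from; the step size must stay small enough for the loss to decrease.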

4.3. Solutions for NMF

Our proposed NMF follows exactly the same steps as MF, the main difference being that the latent vectors are constrained to be non-negative, $[eqn]$. Likewise, we solve the NMF problem, $[eqn]$, using BCD, SGD, and BGD.

BCD for NMF (BCD NMF): The derivations of BCD for NMF (11) are identical to those of BCD for MF (8), followed by the corresponding projection operation. The BCD updates for NMF are given by

[eqn]

[eqn]

where (k) is the BCD iteration index, and $[eqn]$ is applied element by element on $[eqn]$, i.e., it is the Euclidean projection of $[eqn]$ onto $[eqn]$. Since the projection is Euclidean (a non-expansive operator), the corollary stated in the previous subsection also applies to BCD for NMF.
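The projection adds one element-wise `np.maximum` call after each closed-form solve. This sketch mirrors the update as described (ridge solve, then projection onto the non-negative orthant); the hyperparameters and the name `bcd_nmf` are ours. If an exact non-negative least-squares solve per latent vector is preferred, `scipy.optimize.nnls` applied to the ridge-augmented system is an alternative.

```python
import numpy as np

def bcd_nmf(R, mask, D=3, lam=1e-2, iters=30, seed=0):
    """BCD for NMF as described: closed-form ridge update, then projection max(0, .)."""
    rng = np.random.default_rng(seed)
    X = rng.random((R.shape[0], D))   # non-negative initialization
    Y = rng.random((R.shape[1], D))
    reg = lam * np.eye(D)
    for _ in range(iters):
        for u in range(R.shape[0]):   # UE-side block, then element-wise projection
            Yu = Y[mask[u]]
            X[u] = np.maximum(0.0, np.linalg.solve(Yu.T @ Yu + reg, Yu.T @ R[u, mask[u]]))
        for i in range(R.shape[1]):   # BS-side block, then element-wise projection
            Xi = X[mask[:, i]]
            Y[i] = np.maximum(0.0, np.linalg.solve(Xi.T @ Xi + reg, Xi.T @ R[mask[:, i], i]))
    return X, Y
```

Because RSE values are energies and hence non-negative, the non-negativity constraint matches the sign of the data, which is the motivation for the NMF variant.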

Block Stochastic Gradient Descent (BSGD) for NMF (SGD NMF): The SGD NMF derivations are exactly the same as those of SGD MF, followed by a projection $[eqn]$. We thus express the SGD NMF updates as

[eqn]

[eqn]

where u is a random index chosen from $[eqn]$ , i is a random index from $[eqn]$ , $[eqn]$ , and $[eqn]$ is the SGD step size ( $[eqn]$ ).

BGD for NMF (BGD NMF): The solution and derivations for BGD NMF are the same as those for BGD MF, followed by a projection $[eqn]$, i.e.,

[eqn]

[eqn]

where $[eqn]$, (k) is the GD iteration index, and $[eqn]$ is the GD step size ($[eqn]$). We use a constant step size $[eqn]$ for all these methods.
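Projected gradient descent for NMF only adds the clipping after each block step. A minimal sketch with T = 1 step per block and an arbitrary constant step size (both assumptions); the name `projected_bgd_nmf` is ours. Applying the same `np.maximum` projection after each stochastic step is what turns SGD MF into SGD NMF.

```python
import numpy as np

def projected_bgd_nmf(R, mask, D=3, lam=1e-3, lr=5e-3, iters=500, seed=0):
    """BGD for NMF: one full-batch gradient step per block, then projection onto R_+."""
    rng = np.random.default_rng(seed)
    X = rng.random((R.shape[0], D))
    Y = rng.random((R.shape[1], D))
    M = mask.astype(float)
    for _ in range(iters):
        E = M * (R - X @ Y.T)                                          # sounded residuals
        X = np.maximum(0.0, X - lr * (-2.0 * E @ Y + 2.0 * lam * X))   # step, then project
        E = M * (R - X @ Y.T)
        Y = np.maximum(0.0, Y - lr * (-2.0 * E.T @ X + 2.0 * lam * Y))
    return X, Y
```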

4.4. Prediction for MF and NMF

For both MF and NMF, the predicted RSE of the beam pair $[eqn]$, for beam indexes that were not sounded, is expressed as

[eqn]

where $[eqn]$ is the test set and $[eqn]$ are the optimal solutions of MF (or NMF). We then search for the optimal beam pair at the UE and BS as the one with the highest RSE value over both the training and test sets, as follows:

[eqn]
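The prediction and search steps can be sketched as follows. This assumes the latent vectors are stored row-wise in `P` (one row per index u) and `Q` (one row per index i), and that measured RSEs are kept on the training set while predictions p_u^T q_i fill the test set; `predict_and_align` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def predict_and_align(R_meas, mask, P, Q):
    """Fill the non-sounded entries with p_u^T q_i and return the best beam pair."""
    R_hat = P @ Q.T                            # predicted RSE for every pair (u, i)
    R_full = np.where(mask, R_meas, R_hat)     # measured on training set, predicted on test set
    u_star, i_star = np.unravel_index(np.argmax(R_full), R_full.shape)
    return int(u_star), int(i_star), R_full
```

For example, with measured values 1 and 2 on the diagonal and a prediction of 3 at (1, 0), the search returns the non-sounded pair (1, 0).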

4.5. Proposed BA Algorithm Using MF/NMF

Since the updates are given in closed form, we can quantify the computational complexity of all of the above methods. As seen from the updates for BCD MF and BCD NMF, we have to invert two $[eqn]$ matrices (for the sub-problems S1 and S2). Thus, the per-iteration computational complexity of BCD MF and BCD NMF is approximately $[eqn]$ . For BGD MF and BGD NMF, one has to compute two full-batch gradients over all training samples in $[eqn]$ (for the sub-problems S1 and S2). Consequently, the per-iteration complexity of BGD MF and BGD NMF is approximately $[eqn]$ . Finally, for SGD MF and SGD NMF, since we use a mini-batch size $[eqn]$ (for the sub-problems S1 and S2), the resulting per-iteration computational complexity is approximately $[eqn]$ . To solve the $[eqn]$ and $[eqn]$ problems, we thus employ BCD, BGD, or SGD. All details are given in Algorithm 1.

Algorithm 1 Proposed MF/NMF-Based BA Method.

Input: $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$
-Generate randomly sub-sampled codebooks, $[eqn]$ , satisfying $[eqn]$
-Sound beam pairs from training set, $[eqn]$ .
-Record the corresponding $[eqn]$ and generate matrix $[eqn]$ , in (5)
-Select model: MF or NMF
-IF MF model selected
solve $[eqn]$ with BCD for MF, in (8) or solve $[eqn]$ with BGD for MF, in (10) or solve $[eqn]$ with SGD for MF, in (9). At the end of training, return optimal latent vectors, $[eqn]$
-IF NMF model selected
solve $[eqn]$ with BCD for NMF, in (11) or solve $[eqn]$ with BGD for NMF, in (13) or solve $[eqn]$ with SGD for NMF, in (12). At the end of training, return optimal latent vectors, $[eqn]$
-Use optimal latent vectors $[eqn]$ , to predict unknown $[eqn]$ of test set, $[eqn]$ , in (14)
-Search training and test sets for the beam pair with the largest $[eqn]$ , (15)
Output: $[eqn]$ , $[eqn]$
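To make the flow of Algorithm 1 concrete, the sketch below runs it end to end on a synthetic, exactly low-rank, non-negative matrix that stands in for the RSE matrix; it is not generated from a mmWave channel model. It assumes i.i.d. random selection of a fraction `rho` of beam pairs, BCD (alternating ridge regression) as the solver, and measured values on the sounded pairs during the final search. The name `mf_beam_alignment` is hypothetical.

```python
import numpy as np

def mf_beam_alignment(R_true, rho, D, lam=1e-3, n_iter=100, seed=0):
    """Hedged end-to-end sketch of MF-based blind BA (Algorithm 1 flow)."""
    rng = np.random.default_rng(seed)
    M, N = R_true.shape
    # Step 1: randomly sub-sample a fraction rho of the beam pairs (training set).
    mask = rng.random((M, N)) < rho
    # Step 2: "sound" only the selected pairs, i.e. read their RSEs.
    R_obs = np.where(mask, R_true, 0.0)
    # Step 3: train the MF model with BCD (alternating ridge regressions).
    P, Q, I = rng.random((M, D)), rng.random((N, D)), np.eye(D)
    for _ in range(n_iter):
        for u in range(M):
            Qu = Q[mask[u]]
            P[u] = np.linalg.solve(Qu.T @ Qu + lam * I, Qu.T @ R_obs[u, mask[u]])
        for i in range(N):
            Pi = P[mask[:, i]]
            Q[i] = np.linalg.solve(Pi.T @ Pi + lam * I, Pi.T @ R_obs[mask[:, i], i])
    # Step 4: predict the RSE of the non-sounded pairs (test set).
    R_full = np.where(mask, R_obs, P @ Q.T)
    # Step 5: select the beam pair with the largest measured or predicted RSE.
    u_star, i_star = np.unravel_index(np.argmax(R_full), R_full.shape)
    return int(u_star), int(i_star), mask
```

Only about `rho * M * N` RSEs are ever read from `R_true`; the rest of the matrix is inferred from the learned latent vectors.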

For BCD MF and BCD NMF, the only hyperparameter is the model size D, whereas BGD MF and BGD NMF additionally require the GD step size $[eqn]$ as a hyperparameter.

4.6. Numerical Simulations

This section describes our numerical setup. The number of antennas at $[eqn]$ and $[eqn]$ takes the values $[eqn]$ , 256, 512, $[eqn]$ . We set $[eqn]$ and $[eqn]$ . The overhead ratio takes the values $[eqn]$ 0.7 $[eqn]$ 0.5 $[eqn]$ 0.3 $[eqn]$ 0.1}. The number of $[eqn]$ sub-carriers is $[eqn]$ , and the number of channel paths is L = 2. We vary the transmitted power $[eqn]$ . We use $[eqn]$ codebooks at $[eqn]$ and $[eqn]$ . The optimal hyperparameters are chosen to minimize the test loss; the search covers the model dimension $[eqn]$ , the learning rate $[eqn]$ , and the regularization factors $[eqn]$ . For each MIMO configuration and each $[eqn]$ regime, we randomly generate and store the resulting $[eqn]$ matrices.

We investigate six models in total (BCD MF, BCD NMF, BGD MF, BGD NMF, SGD MF, SGD NMF) under three transmitted power regimes: high $[eqn]$ , medium $[eqn]$ W, and low $[eqn]$ W, with fixed $[eqn]$ . Table 1 summarizes all the proposed system parameters. We use the training Normalized MSE ( $[eqn]$ ) to evaluate the training error, expressed as $[eqn]$ . We also define $[eqn]$ . The training error range and the overall behavior of $[eqn]$ -based models differ markedly from those of $[eqn]$ models in both $[eqn]$ and $[eqn]$ ; for instance, the error of $[eqn]$ -based models is around $[eqn]$ , while that of $[eqn]$ -based models is around $[eqn]$ . Thus, $[eqn]$ is more accurate. However, $[eqn]$ converges faster, and its cost function drops to low values from the very first iterations. In addition, for $[eqn]$ and $[eqn]$ , the training $[eqn]$ decreases as the overhead ratio $[eqn]$ increases, as seen in Figure 5. Low and medium $[eqn]$ regimes are characterized by noisy links between $[eqn]$ and $[eqn]$ and represent a more challenging experimental environment. $[eqn]$ -based models tend to reach low error values faster, while $[eqn]$ -based models are more accurate (for instance, $[eqn]$ generally improves the prediction quality compared with $[eqn]$ ).
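The exact NMSE expression is not recoverable from the extracted text, so the sketch below assumes the usual definition: the squared error over a set of beam pairs, normalized by the energy of the true RSEs on that same set. Passing the training mask gives the training NMSE, and passing its complement gives the test NMSE. The function name `nmse` is hypothetical.

```python
import numpy as np

def nmse(R, R_hat, mask):
    """Assumed NMSE over the pairs selected by mask:
    sum((r_ui - r_hat_ui)^2) / sum(r_ui^2)."""
    diff = (R - R_hat)[mask]
    return float(np.sum(diff ** 2) / np.sum(R[mask] ** 2))
```

For instance, `nmse(R, R_hat, mask)` evaluates the training error and `nmse(R, R_hat, ~mask)` the test error, in the linear scale; converting to dB, if desired, is `10 * np.log10(...)`.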

Regarding the $[eqn]$ simulation figures, Figure 5a shows the decrease of the train/test $[eqn]$ as a function of the overhead ratio (more training samples result in lower errors); Figure 5b,c show the immediate drop in loss values from the very first iterations for $[eqn]$ -based models; and Figure 5d,e show the progressive convergence of the cost function over the iterations for $[eqn]$ -based models. In summary, Table 2 outlines, for all proposed system configurations, the optimal (minimum) signaling overhead ratio required, the optimal model (the one with the smallest total cost function), the corresponding optimal hyperparameters, and the resulting train/test error values. When the signal is heavily affected by noise, it is harder to keep the same error range as in a high $[eqn]$ regime. Nevertheless, $[eqn]$ models keep the same (minimum) signaling overhead ( $[eqn]$ ) regardless of the transmitted power regime, being able to predict accurately with just $[eqn]$ of sounded beams. Thus, the proposed $[eqn]$ methods reduce the pilot signaling overhead by $[eqn]$ compared with Exhaustive $[eqn]$ , with negligible training and test errors.
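As a worked example of the overhead saving, take the largest codebook pair considered, 1024 × 1024, and the 10% overhead ratio reported in the paper's summary; the symbols $\rho$, $N_{\text{exh}}$, and $N_{\text{sound}}$ below are illustrative notation, not the paper's.

```latex
\begin{aligned}
N_{\text{exh}}   &= 1024 \times 1024 = 1\,048\,576 \ \text{beam pairs (Exhaustive BA)},\\
N_{\text{sound}} &= \rho \, N_{\text{exh}} = 0.1 \times 1\,048\,576 \approx 104\,858 \ \text{beam pairs},\\
1 - \frac{N_{\text{sound}}}{N_{\text{exh}}} &= 1 - \rho = 0.9 .
\end{aligned}
```

Under these assumptions, the pilot overhead drops by 90% relative to exhaustive sounding, and the remaining roughly 943,718 RSEs are predicted rather than measured.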

5. Multi-Layer Perceptron

5.1. MLP Problem Formulation

We consider a feed-forward $[eqn]$ with J layers, modeled as a composition of J non-linear functions (layers). Let $[eqn]$ be the $[eqn]$ input and $[eqn]$ be the $[eqn]$ output; see Figure 6. We denote by $[eqn]$ all the hidden layers. For simplicity, we assume that all layers have the same width, denoted D, i.e., $[eqn]$ ; see Figure 6. Layer 1 is described by $[eqn]$ , where $[eqn]$ is the output of layer 1, $[eqn]$ is the corresponding weight vector, and $[eqn]$ is the non-linear activation function of layer 1. We use one-hot encoding for the MLP input $[eqn]$ , i.e., $[eqn]$ for all training samples $[eqn]$ . We express the output of the hidden layers $[eqn]$ as $[eqn]$ , where $[eqn]$ is the input of layer j and $[eqn]$ is its output $[eqn]$ ; $[eqn]$ is the weight matrix of layer j $[eqn]$ ; and $[eqn]$ is the element-wise non-linear activation function of layer j, $[eqn]$ . Finally, the last layer $[eqn]$ is expressed as $[eqn]$ , where $[eqn]$ is the output of layer J, $[eqn]$ is its weight vector, and $[eqn]$ is the non-linear activation function of layer J. We express the output of the MLP $[eqn]$ (as a function of all layers) as

[eqn]
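A minimal sketch of this forward pass is given below. Several details are assumptions because the extracted equations are missing: the input is taken as the concatenation of a one-hot vector for u and a one-hot vector for i, layers have no bias terms, and ReLU (the activation stated in the numerical section) is used in every layer, including the output. `one_hot_pair` and `mlp_forward` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def one_hot_pair(u, i, M, N):
    """Assumed input encoding: [one-hot(u) in R^M ; one-hot(i) in R^N]."""
    x = np.zeros(M + N)
    x[u] = 1.0
    x[M + i] = 1.0
    return x

def mlp_forward(x, weights):
    """weights = [W_1 (D x (M+N)), W_2, ..., W_{J-1} (D x D), w_J (D,)]."""
    h = x
    for W in weights[:-1]:          # layers 1 .. J-1: h_j = relu(W_j h_{j-1})
        h = relu(W @ h)
    return float(relu(weights[-1] @ h))  # layer J: scalar RSE prediction
```

With this encoding, the first layer reduces to adding two columns of `W_1`, i.e. an embedding lookup for u and one for i, which is a useful point of comparison with the latent vectors of MF.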

The output of $[eqn]$ is trained to fit the $[eqn]$ values of all training samples, $[eqn]$ , $[eqn]$ . We define the MSE loss $[eqn]$ for the sample $[eqn]$ in the training set $[eqn]$ as the distance between the MLP output $[eqn]$ and the known RSE label for the beam pair $[eqn]$ , $[eqn]$ , i.e.,

[eqn]

Then, the empirical risk is defined as the average of the individual loss $[eqn]$ across the training set $[eqn]$ , $[eqn]$ . The empirical risk minimization for the MLP is given in $[eqn]$ .

[eqn]
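The empirical risk can be sketched as the average per-sample squared error over the training pairs. The sketch assumes the per-sample loss is the squared difference between the MLP output and the RSE label, and it reuses the one-hot shortcut: the first layer simply sums column u and column M + i of `W_1`. `forward` and `empirical_risk` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(u, i, weights, M):
    # One-hot input: W_1 @ x equals the sum of columns u and M + i of W_1.
    h = relu(weights[0][:, u] + weights[0][:, M + i])
    for W in weights[1:-1]:         # remaining hidden layers
        h = relu(W @ h)
    return float(relu(weights[-1] @ h))

def empirical_risk(train_pairs, R, weights, M):
    """Average squared error of the MLP over the sounded pairs (u, i)."""
    losses = [(forward(u, i, weights, M) - R[u, i]) ** 2 for (u, i) in train_pairs]
    return sum(losses) / len(losses)
```

With all-zero weights the MLP outputs 0 for every pair, so the risk reduces to the mean squared RSE label over the training set.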

5.2. MLP Learning

We propose to learn the optimal $[eqn]$ weights via back-propagation (BP). We choose an arbitrary mini-batch of samples of size $[eqn]$ and define the mini-batch loss as

[eqn]

We express the partial derivative of the loss corresponding to the mini-batch $[eqn]$ with respect to each layer $[eqn]$ as

[eqn]

where

[eqn]

$[eqn]$ and ∘ denotes the Hadamard product. We express the BP weight update of the mini-batch loss $[eqn]$ , for all layers $[eqn]$ , as

[eqn]

where k is the BP iteration index, $[eqn]$ is the value of $[eqn]$ at iteration k, $[eqn]$ is the BP step size (learning rate) of layer j at iteration k, and $[eqn]$ is the partial derivative given in (18), evaluated at $[eqn]$ .

Back-propagation algorithm with mini-batch

Choose the mini-batch $[eqn]$ as a random subset of the training set $[eqn]$ .

Compute the loss function $[eqn]$ for all samples in the mini-batch $[eqn]$ in (17).
Compute the partial derivative $[eqn]$ of the mini-batch loss $[eqn]$ with respect to $[eqn]$ in (18).
Update the weights of each layer as in (19).
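The mini-batch BP steps above can be sketched in NumPy as follows. The sketch assumes ReLU in every layer, no biases, a mini-batch MSE averaged over the batch, and a single learning rate shared by all layers (as stated in the text). The error signal is propagated backward through each weight matrix and masked by the ReLU derivative, which is the Hadamard product in the recursion. All function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, weights):
    """Return pre-activations Z_j and activations A_j (A_0 = X) for all layers."""
    A, Zs, As = X, [], [X]
    for W in weights:               # W_j has shape (out_j, in_j); last one is (1, D)
        Z = A @ W.T
        A = relu(Z)
        Zs.append(Z)
        As.append(A)
    return Zs, As

def minibatch_loss(X, y, weights):
    _, As = forward(X, weights)
    return float(np.mean((As[-1][:, 0] - y) ** 2))

def backprop(X, y, weights):
    """Gradients of the mini-batch MSE with respect to every weight matrix."""
    Zs, As = forward(X, weights)
    B = X.shape[0]
    # Output layer: dL/dZ_J = (2/B) (a_J - y) o relu'(Z_J).
    delta = (2.0 / B) * (As[-1] - y[:, None]) * (Zs[-1] > 0)
    grads = [None] * len(weights)
    for j in range(len(weights) - 1, -1, -1):
        grads[j] = delta.T @ As[j]                            # dL/dW_j
        if j > 0:                                             # back through W_j, then ReLU
            delta = (delta @ weights[j]) * (Zs[j - 1] > 0)    # Hadamard product
    return grads

def train(X_all, y_all, weights, lr=0.01, batch=8, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        idx = rng.choice(len(y_all), size=batch, replace=False)  # random mini-batch
        grads = backprop(X_all[idx], y_all[idx], weights)
        weights = [W - lr * g for W, g in zip(weights, grads)]   # same lr for all layers
    return weights
```

A finite-difference check on any single weight is a cheap way to validate such a hand-written gradient before training.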

We assume that the BP learning rate is the same for all layers, $[eqn]$ .

5.3. Prediction Using MLP

The $[eqn]$ prediction for the sample (u,i) in the test set $[eqn]$ , using optimal weights $[eqn]$ , $[eqn]$ is as follows:

[eqn]

The test $[eqn]$ is then defined as

[eqn]

We then select the optimal indexes $[eqn]$ and $[eqn]$ related to the highest $[eqn]$ value, as follows:

[eqn]

5.4. Proposed BA Algorithm Using MLP

The Multi-Layer Perceptron-based Beam Alignment is specified in Algorithm 2.

Algorithm 2 Proposed MLP-Based BA Method.

Input: $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$
-Generate randomly sub-sampled codebooks, $[eqn]$ , satisfying $[eqn]$
-Sound beam pairs from training set, $[eqn]$ .
-Record the corresponding $[eqn]$ and generate the $[eqn]$ matrix $[eqn]$ , in (5)
-Train the $[eqn]$ weights (using the back-propagation algorithm)
return optimal weights, $[eqn]$
-Use optimal parameters $[eqn]$ , to predict unknown $[eqn]$ of test set, $[eqn]$ , in (21)
-Search training and test sets for the optimal beam pair $[eqn]$ with the largest $[eqn]$ , (22)
Output: $[eqn]$ , $[eqn]$

We treat the number of neurons per layer D, the number of layers J, the mini-batch size $[eqn]$ , and the BP learning rate $[eqn]$ as hyperparameters, which are tuned using grid-search cross-validation.
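Grid-search cross-validation can be sketched as an exhaustive loop over the Cartesian product of candidate values, scoring each combination by its average validation loss over K folds of the training pairs. The K-fold split, the callable `fit_and_score(train, val, **params)`, and the names `kfold_cv_loss` and `grid_search` are assumptions for illustration; the paper only states that a grid search with cross-validation is used.

```python
import itertools

def kfold_cv_loss(pairs, k, fit_and_score, **params):
    """Average validation loss over k folds of the training pairs."""
    folds = [pairs[f::k] for f in range(k)]
    losses = []
    for f in range(k):
        val = folds[f]
        train = [p for g, fold in enumerate(folds) if g != f for p in fold]
        losses.append(fit_and_score(train, val, **params))
    return sum(losses) / k

def grid_search(evaluate, grid):
    """Return the hyperparameter combination with the lowest evaluate(**params)."""
    best_params, best_loss = None, float("inf")
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        loss = evaluate(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

The cost grows multiplicatively with the number of values per hyperparameter, which is why models with fewer hyperparameters (such as BCD, with only D) are cheaper to tune.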

5.5. Numerical Simulations

We define the training and test cost functions as follows:

[eqn]

[eqn]

We use the same system configurations as for $[eqn]$ , summarized in Table 1. The hyperparameter grid is as follows: learning rate $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$ ; batch size $[eqn]$ , 4, 8, 16, 32, 64, $[eqn]$ ; number of hidden layers $[eqn]$ ; and number of neurons per layer $[eqn]$ , 16, 32, 64, $[eqn]$ . We use the Rectified Linear Unit (ReLU) as the activation function for all layers.

As for $[eqn]$ , we assess training performance by tracking the evolution of the cost function $[eqn]$ , applied to the training samples of the set $[eqn]$ , as a function of iterations. The considerably low error values and the overall learning behavior of the $[eqn]$ architecture show that our shallow neural network successfully solves the non-linear regression problem underlying our BA process. For massive setups, $[eqn]$ reaches an error of around $[eqn]$ in a high $[eqn]$ regime. However, this cost increases as noise and interference grow. Note that the training $[eqn]$ also decreases when we increase the size of the dataset matrix $[eqn]$ , which provides more samples for $[eqn]$ to improve feature extraction and prediction quality. For the unknown beams, the test error values in the numerical result tables are close to the training cost (with no overfitting or underfitting in the corresponding learning curves). Moreover, the transmitted power regime affects the test loss in the same way as the training loss. As with $[eqn]$ -based $[eqn]$ , the $[eqn]$ learning curves in Figure 7 show a continuous, monotonic decrease in the train and test cost over the iterations: convergence is progressive, and at the last epoch, the training and test $[eqn]$ values reach considerably low error levels, showing that $[eqn]$ fits our problem accurately and provides a concrete solution for $[eqn]$ -based $[eqn]$ . From a $[eqn]$ perspective, Table 3 summarizes the smallest (optimal) signaling overhead required for successful beam sounding with reliable prediction quality. Similar to $[eqn]$ , for all the proposed transmitted powers, $[eqn]$ requires $[eqn]$ of the total beam pairs to fill the $[eqn]$ matrix.

6. Results and Discussion

6.1. Train/Test Prediction Performance Comparison

For the six $[eqn]$ -based models, we select the best one (minimum test error) to represent the $[eqn]$ family of methods in this section and compare it with $[eqn]$ . When we analyze $[eqn]$ (Table 1 and Table 2), we notice that the transmitted power regime affects the prediction quality: a higher transmitted power reduces the overall loss. For $[eqn]$ , the loss degradation is large: it goes from around $[eqn]$ for massive configurations (256, 512, and 1024) to $[eqn]$ for smaller setups. For $[eqn]$ , the overall loss increases when we decrease $[eqn]$ . Thus, $[eqn]$ seems to be the most robust architecture with respect to changes in the transmitted power. Additionally, we empirically observe that changing the $[eqn]$ values does not affect the optimal hyperparameters selected by cross-validation. Furthermore, when we track the evolution of the training/test cost as a function of iterations, we observe balanced models with no signs of overfitting or underfitting. When the transmitted power decreases, $[eqn]$ tend to be the most affected models in terms of train/test error, while the $[eqn]$ error remains robust.

From a $[eqn]$ perspective, regarding the minimum required signaling overhead and the impact of the $[eqn]$ regime on it, Table 1 and Table 2 show that all the proposed models require just $[eqn]$ of the total number of beam pairs at $[eqn]$ and $[eqn]$ for all antenna configurations from $[eqn]$ to $[eqn]$ and for all the proposed $[eqn]$ values. This shows that the transmitted power affects the prediction quality but not the number of beam pairs required for training. Indeed, a low $[eqn]$ degrades the signal quality and, consequently, reduces the amount of useful information that can be extracted from the datasets. Finally, the only cases where the $[eqn]$ regime affects the optimal overhead ratio are the smallest configurations, for instance, the $[eqn]$ setup, where it is expected that all learning models need more data to learn from (more hidden interactions between $[eqn]$ and $[eqn]$ to extract as features). These are precisely the experimental situations where Exhaustive $[eqn]$ is technically feasible.

6.2. Similarities and Differences between Models

All models required just $[eqn]$ of the beams for training for all the proposed massive setups. Moreover, all the proposed models are shallow neural architectures with few hidden layers, in line with low-complexity constraints. Even for the largest configurations, the optimal model dimensions selected by cross-validation correspond to small networks; dense architectures are not needed. Furthermore, all models succeeded in the matrix completion task, and they all exhibit a monotonic decrease in loss values as the MIMO setup grows. Additionally, $[eqn]$ -based models are the most accurate, reaching loss values in the range $[eqn]$ for massive setups in a high $[eqn]$ regime, and their cross-validation requires a smaller grid search, since there are fewer hyperparameters to tune. However, they are the slowest models when applied to high-dimensional MIMO setups. On the other hand, $[eqn]$ offers a good balance between run time (complexity) and loss values (prediction quality), reaching a loss of around $[eqn]$ and $[eqn]$ for massive configurations. In addition, $[eqn]$ is the most robust model to changes in the $[eqn]$ values. Figure 8 shows, for $[eqn]$ , the train/test $[eqn]$ of each model under each transmitted power. In Figure 8a, for $[eqn]$ , $[eqn]$ achieves the best performance, slightly better than $[eqn]$ , with a difference in achieved cost of around $[eqn]$ . In Figure 8b, when $[eqn]$ , $[eqn]$ still achieves the best performance, marginally better than $[eqn]$ , with an $[eqn]$ difference of around $[eqn]$ . In Figure 8c, when $[eqn]$ , $[eqn]$ is noticeably affected (overall loss around $[eqn]$ ), while $[eqn]$ provides the best prediction performance. This suggests that when $[eqn]$ is small, $[eqn]$ is more robust than $[eqn]$ , which performs best in the high $[eqn]$ regime.
Similar remarks hold for Figure 9, where we simulate the $[eqn]$ configuration: in Figure 9a, $[eqn]$ achieves considerably better performance than $[eqn]$ , with $[eqn]$ . In Figure 9b, $[eqn]$ keeps the same error range, which again confirms the robustness of the model, while $[eqn]$ is severely affected ( $[eqn]$ ) but still achieves the best performance. In Figure 9c, when $[eqn]$ is weak, $[eqn]$ shows its worst performance across all simulations. On the other hand, $[eqn]$ is only slightly affected, with an overall loss of $[eqn]$ , and achieves the best prediction quality. In Figure 10, we investigate the largest configuration, $[eqn]$ . The conclusions drawn from Figure 8 and Figure 9 also hold here in terms of the best model ( $[eqn]$ for $[eqn]$ , $[eqn]$ and $[eqn]$ for $[eqn]$ ). We also investigate the overall impact of varying the transmitted power by tracking the $[eqn]$ values while switching from one $[eqn]$ regime to another. In Figure 10a, for $[eqn]$ , the gap between the low and medium curves is $[eqn]$ , whereas the gap between the medium and high regimes is almost negligible ( $[eqn]$ ). Finally, in Figure 10b, the $[eqn]$ gaps are around $[eqn]$ and $[eqn]$ : at each change of $[eqn]$ , $[eqn]$ is considerably affected. To sum up, the choice of the optimal model strongly depends on the available complexity and the given transmitted power $[eqn]$ . Specifically, $[eqn]$ , whether optimized through $[eqn]$ or $[eqn]$ , is the best model when the transmitted power is high ( $[eqn]$ ). In this case, $[eqn]$ converges faster but has a higher complexity than $[eqn]$ . However, $[eqn]$ for $[eqn]$ are the slowest models to converge but have negligible complexity. On the other hand, if run time is the priority, $[eqn]$ provides the fastest predictions with a good prediction error.
Finally, $[eqn]$ is a sensible choice if the system must operate under various transmitted power regimes and the available complexity is medium, since $[eqn]$ offers good prediction quality for every $[eqn]$ value.

7. Conclusions

In this paper, we proposed a blind Machine Learning-based Beam Alignment method using Matrix Factorization, non-negative Matrix Factorization, and a Multi-Layer Perceptron. We assumed an Uplink massive mmWave MIMO system with a single RF chain at $[eqn]$ and multiple RF chains at $[eqn]$ , using a fully analog architecture. The proposed approach consists of sounding the $[eqn]$ of sub-sampled codebooks at $[eqn]$ and $[eqn]$ . The $[eqn]$ of the non-sounded beams is predicted using $[eqn]$ , $[eqn]$ , and $[eqn]$ models. Our results show that, by sounding just $[eqn]$ of the total beam pair samples, we can predict the unknown $[eqn]$ values with high accuracy, which massively reduces the large signaling overhead of Exhaustive $[eqn]$ . Future work will investigate the scalability of our approach to a multi-user scenario. Robustness and $[eqn]$ -interpretability are other research directions toward industrial deployment.
