Source Coding Based Millimeter-Wave Channel Estimation with Deep Learning Based Decoding

Yahia Shabara; Eylem Ekici; C. Emre Koksal

arXiv:1905.00124·cs.IT·April 12, 2021


TL;DR

This paper introduces a deep learning-based decoding method for mmWave channel estimation that reduces measurement overhead and outperforms traditional compressed sensing techniques.

Contribution

It proposes framing mmWave channel estimation as a source compression problem and employs deep learning for decoding, achieving lower measurement requirements.

Findings

01 Outperforms state-of-the-art compressed sensing methods

02 Determines the lower bound on measurements for reliable estimation

03 Reduces measurement overhead significantly

Abstract

The speed at which millimeter-Wave (mmWave) channel estimation can be carried out is critical for the adoption of mmWave technologies. This is particularly crucial because mmWave transceivers are equipped with large antenna arrays to combat severe path losses, which consequently creates large channel matrices, whose estimation may incur significant overhead. This paper focuses on the mmWave channel estimation problem. Our objective is to reduce the number of measurements required to reliably estimate the channel. Specifically, channel estimation is posed as a "source compression" problem in which measurements mimic an encoded (compressed) version of the channel. Decoding the observed measurements, a task which is traditionally computationally intensive, is performed using a deep-learning-based approach, facilitating a high-performance channel discovery. Our solution not only outperforms…

Tables

TABLE I: Channel measurements $\boldsymbol{y^{s}}$ corresponding to all $\boldsymbol{q^{a}}{\in}\mathcal{Q}^{a}$

Case	Angular channel $(\boldsymbol{q^{a}})^{T}$	Channel measurement $(\boldsymbol{y^{s}})^{T}$
$d_{0}$	$[0 0 0 0 0 0 0 0]$	$[0 0 0 0]$
$d_{1}$	$[1 0 0 0 0 0 0 0]$	$[1 0 0 0]$
$d_{2}$	$[0 1 0 0 0 0 0 0]$	$[0 1 0 0]$
$d_{3}$	$[0 0 1 0 0 0 0 0]$	$[0 0 1 0]$
⋮	⋮	⋮
$d_{8}$	$[0 0 0 0 0 0 0 1]$	$[1 1 0 1]$

Equations

$\boldsymbol{G}=\begin{bmatrix}1&0&0&0&1&0&0&1\\0&1&0&0&1&1&0&1\\0&0&1&0&0&1&1&0\\0&0&0&1&0&0&1&1\end{bmatrix}$

$y^{s}_{i}=\sum_{j=1}^{8}q^{a}_{j}\times g_{i,j}=\sum_{j:\,g_{i,j}=1}q^{a}_{j}.$

$H_{2}\left(\boldsymbol{q^{a}}\right)=\sum_{\boldsymbol{q^{a}}\in\mathcal{Q}^{a}}\mathbb{P}\left(\boldsymbol{q^{a}}\right)\log_{2}\left(\frac{1}{\mathbb{P}\left(\boldsymbol{q^{a}}\right)}\right)$

$\left\lceil\log_{2}\left(|\mathcal{Q}^{a}|\right)\right\rceil\geq H_{2}\left(\boldsymbol{q^{a}}\right)$

$\alpha_{l}^{b}=\alpha_{l}\sqrt{n_{r}n_{t}}\,e^{-j\frac{2\pi\rho_{l}}{\lambda_{c}}},$

$\boldsymbol{e_{t}}\left(\Omega\right)=\frac{1}{\sqrt{n_{t}}}\left(1,e^{-j2\pi\Delta_{t}\Omega},e^{-j2\pi 2\Delta_{t}\Omega},\dots,e^{-j2\pi(n_{t}-1)\Delta_{t}\Omega}\right)^{T}$

$\boldsymbol{Q}=\sum_{l=1}^{L}\alpha_{l}^{b}\,\boldsymbol{e_{r}}\left(\Omega_{rl}\right)\boldsymbol{e_{t}}^{H}\left(\Omega_{tl}\right).$

$\boldsymbol{Q^{a}}=\boldsymbol{U_{r}}^{H}\boldsymbol{Q}\boldsymbol{U_{t}}.$

$\boldsymbol{U_{t}}\triangleq\left(\boldsymbol{e_{t}}\left(0\right)\ \boldsymbol{e_{t}}\left(\frac{1}{L_{t}}\right)\ \dots\ \boldsymbol{e_{t}}\left(\frac{n_{t}-1}{L_{t}}\right)\right),$

$\boldsymbol{u_{b}}=\boldsymbol{Q}\boldsymbol{x}+\boldsymbol{n}$

$\text{SNR}\triangleq\frac{P}{N_{0}}\times\mu,$

$u_{i,j}=\boldsymbol{w}_{i}^{H}\boldsymbol{Q}\boldsymbol{f}_{j}s+\boldsymbol{w}_{i}^{H}\boldsymbol{n}$

$u^{s}_{i,j}=\left[\boldsymbol{w}_{i}^{H}\boldsymbol{Q}\boldsymbol{f}_{j}s+\boldsymbol{w}_{i}^{H}\boldsymbol{n}\right]_{+},$

$\boldsymbol{c_{s}}=\boldsymbol{G}\boldsymbol{s},$

$\boldsymbol{y^{s}_{1}}-\boldsymbol{y^{s}_{2}}$

$=\sum_{i=1}^{n_{r}}v_{i}\times\boldsymbol{g}_{i}=\sum_{i\in\mathcal{X}_{v}}v_{i}\times\boldsymbol{g}_{i}$

$\mathcal{X}^{c}_{q^{a}_{1}}\cap\mathcal{X}^{c}_{q^{a}_{2}}=\left\{\mathcal{X}_{q^{a}_{1}}\cup\mathcal{X}_{q^{a}_{2}}\right\}^{c}\subseteq\mathcal{X}^{c}_{v}$

$\boldsymbol{g}_{i_{0}}=\sum_{j:\,j\neq i_{0},\ \boldsymbol{g}_{j}\in\mathcal{G}_{D}}\boldsymbol{g}_{j}\bmod 2.$

$\sum_{j:\,\boldsymbol{g}_{j}\in\mathcal{G}_{D}}\boldsymbol{g}_{j}\bmod 2=\boldsymbol{0}$

$\boldsymbol{G}\boldsymbol{v_{s}}\bmod 2$

$=\sum_{j:\,\boldsymbol{g}_{j}\in\mathcal{G}_{D}}\boldsymbol{g}_{j}\bmod 2\neq\boldsymbol{0}$

$\xLongleftrightarrow{\text{iff}}$

$\boldsymbol{w}_{i}^{H}\boldsymbol{q}=y^{s}_{i}=\boldsymbol{g}_{i}\boldsymbol{q^{a}}=\sum_{j=1}^{n_{r}}g_{i,j}\,q^{a}_{j}$

$\boldsymbol{w}_{i}^{H}\left(\boldsymbol{e_{r}}\left(0\right)\ \boldsymbol{e_{r}}\left(\frac{1}{L_{r}}\right)\ \dots\ \boldsymbol{e_{r}}\left(\frac{n_{r}-1}{L_{r}}\right)\right)=\boldsymbol{g}_{i}.$

$\boldsymbol{w}_{i}=\sum_{j:\,g_{i,j}=1}\boldsymbol{e_{r}}\left(\frac{j-1}{L_{r}}\right)$

$\underaccent{\bar}{m}\geq\left\lceil\log_{2}\left(\sum_{i=0}^{L}{n_{r}\choose i}\right)\right\rceil\geq H_{2}\left(\boldsymbol{q}^{a}_{s}\right)$

$\sup_{\mathbb{P}\left(\boldsymbol{q}^{a}_{s}\right)}\sum_{\boldsymbol{q}^{a}\in\mathcal{Q}^{a}}\mathbb{P}\left(\boldsymbol{q}^{a}_{s}\right)\log_{2}\left(\frac{1}{\mathbb{P}\left(\boldsymbol{q}^{a}_{s}\right)}\right)$

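$\boldsymbol{u^{s}}=\boldsymbol{y^{s}}+\boldsymbol{z},$

$\left\lVert\hat{\boldsymbol{q}}^{a}-\boldsymbol{q^{a}}\right\rVert_{2}\leq\frac{1}{\sigma_{\min}\left(\boldsymbol{G}\right)}\left\lVert\boldsymbol{z}\right\rVert_{2}$

$\implies\left\lVert\boldsymbol{z}\right\rVert_{2}$

$=\left\lVert\hat{\boldsymbol{q}}^{a}-\boldsymbol{q^{a}}\right\rVert_{2}\,\frac{\left\lVert\boldsymbol{G}\left(\hat{\boldsymbol{q}}^{a}-\boldsymbol{q^{a}}\right)\right\rVert_{2}}{\left\lVert\hat{\boldsymbol{q}}^{a}-\boldsymbol{q^{a}}\right\rVert_{2}}$

$\geq\left\lVert\hat{\boldsymbol{q}}^{a}-\boldsymbol{q^{a}}\right\rVert_{2}\,\sigma_{\min}\left(\boldsymbol{G}\right)$

$\left\lVert\hat{\boldsymbol{q}}^{a}-\boldsymbol{q^{a}}\right\rVert_{2}\leq\frac{1}{\sigma_{\min}\left(\boldsymbol{G}\right)}\left\lVert\boldsymbol{z}\right\rVert_{2}\qed$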


Code & Models

Repositories

yahiaShabara/beamDiscoveryPublic
tf


Full text

Source Coding Based Millimeter-Wave Channel Estimation with Deep Learning Based Decoding

Yahia Shabara, Student Member, IEEE, Eylem Ekici, Fellow, IEEE, and C. Emre Koksal, Senior Member, IEEE

Abstract

The speed at which millimeter-Wave (mmWave) channel estimation can be carried out is critical for the adoption of mmWave technologies. This is particularly crucial because mmWave transceivers are equipped with large antenna arrays to combat severe path losses, which consequently creates large channel matrices, whose estimation may incur significant overhead. This paper focuses on the mmWave channel estimation problem. Our objective is to reduce the number of measurements required to reliably estimate the channel. Specifically, channel estimation is posed as a “source compression” problem in which measurements mimic an encoded (compressed) version of the channel. Decoding the observed measurements, a task which is traditionally computationally intensive, is performed using a deep-learning-based approach, facilitating a high-performance channel discovery. Our solution not only outperforms state-of-the-art compressed sensing methods, but it also determines the lower bound on the number of measurements required for reliable channel discovery.

Index Terms:

Millimeter-Wave, Channel Estimation, Path Discovery, Sparse Recovery, Source Coding, Machine Learning.

This work was supported in part by the NSF CNS under Grant 1514260, Grant 1618566, Grant 1731698 and Grant 1814923.

I Introduction

The rapid increase in mobile data traffic has motivated the exploration of mmWave spectrum bands [1, 2, 3, 4]. While mmWave communication promises orders of magnitude increase in data rates, it both i) suffers from severe path losses [5] and ii) necessitates the use of power-hungry circuits to operate. To overcome these problems, large-gain, highly-directional antenna arrays are proposed as a countermeasure to path losses, along with less flexible, yet energy-efficient transceivers that no longer use fully-digital beamforming. Large antenna arrays, however, create channel matrices with large dimensions, which are complex to estimate. When combined with limited transceiver capabilities, large-scale channel estimation may take prohibitively long. Reducing the number of measurements is thus a critical step towards facilitating mmWave networks. Fortunately, this does not necessarily degrade the quality of channel estimation due to the sparse nature of mmWave channels, a feature that has been revealed by empirical measurement studies and further adopted by statistical channel models [3, 6, 7].

This work focuses on the problem of mmWave channel estimation with the objective of decreasing the number of required measurements. We treat this problem as that of path/beam discovery which is crucial for initial link establishment between a transmitter (TX) and a receiver (RX) (also known as Initial Access). We solve this problem using a technique inspired by binary source coding (data compression). Although binary codes are natively designed to compress binary data, we provide a foundation for the same codes to be used for compressing complex-valued data, as well. We devise a method to obtain channel measurements such that they resemble a compressed version of the channel matrix. To estimate the channel from the acquired measurements, we train a Deep Neural Network (DNN) that enables very high speed processing. However, training DNNs that jointly process all measurements poses an overwhelming complexity. Thus, we propose a novel computationally-tractable solution that sequentially processes the acquired measurements. This method is unique in the sense that it reduces the problem of estimating the channel matrix as a whole into several smaller sub-problems of estimating the individual rows and columns of that matrix. The key contributions of this work are as follows:

• We show that lossless, fixed-rate, linear source codes can be used to design efficient channel measurements that can be uniquely mapped to the underlying channels.

• We accurately evaluate the number of measurements needed for reliable channel discovery (as opposed to a mere scaling law). This number is dependent on the compression ratio of the chosen code.

• We present a tight lower bound on the number of measurements needed to reliably discover the channel and provide a solution that achieves this bound.

• We propose a high-performance DNN based measurement-to-channel mapping.

• We show that our solution outperforms the state-of-the-art compressed sensing based solutions and the IEEE 802.11ad beam alignment method.

The mmWave channel estimation problem can generally be divided into two intertwined parts. The first is: how to obtain “good” measurements that can be used to reliably discover the channel? and the second is: how to map these measurements to corresponding channel estimates? Motivated by our proposed solution, we name these two parts “Channel Encoding” and “Measurement Decoding”, respectively. Encoding and decoding are intertwined because a selection of a specific decoding method often dictates (i) how the measurements are obtained, and (ii) the number of measurements for which this specific decoding method would yield “good” performance. The dissociation of encoding and decoding as two sub-problems can be seen across almost all mmWave channel estimation research, albeit not always explicitly mentioned. This distinction, however, facilitates the identification of key aspects upon which we could improve the quality of channel estimation.

A well-known classification of encoding paradigms is encoding with vs. without feedback. Non-feedback encoding is better suited for simultaneous multi-user channel estimation, hence is scalable, while feedback-based encoding operates better at low SNR [8]. Different decoding algorithms are also needed for these two types. This paper focuses on encoding without feedback.

The state-of-the-art mmWave channel estimation algorithms rely on Compressed Sensing (CS) to reduce the number of channel measurements [9, 10, 11, 12]. Other approaches include: i) measurements with hierarchical beam patterns that sequentially narrow down the angular direction(s) which contain strong propagation paths, ii) measurements with overlapped beam patterns where each measurement combines signals received from a randomly selected set of angular directions [13], and iii) machine learning based algorithms for sparse recovery of mmWave channels [14, 15, 16, 17]. Further, in [18] we first introduced the idea of exploiting binary codes for tackling mmWave channel estimation. In particular, we exploit the capability of error discovery of channel codes and construct an analogy to path discovery in mmWave channels.

Notations: $x$ is a scalar quantity, while $\boldsymbol{x}$ is a vector and $\boldsymbol{X}$ is a matrix. The transpose of a matrix is denoted by $\boldsymbol{X}^{T}$ , while $\boldsymbol{X}^{*}$ denotes its conjugate and $\boldsymbol{X}^{H}$ denotes the conjugate transpose.

II Related Work

Initial Access: The “Initial Access” problem is concerned with finding the angular bearings of one or more propagation paths between a pair of TX/RX nodes, without prior knowledge about previous channel values. In mobile environments, these angular directions are expected to change after Initial Access. “Beam Tracking” methods are commonly used to correct for smaller angular changes and maintain the viability of active link(s) [19, 20]. Nonetheless, due to the narrow beams at both TX and RX, established communication links are prone to blockage (by objects in the communication environment, and even the users themselves). Hence, the initial link establishment stage might need to be repeated multiple times during every communication session. This results in high overhead to establish coherent beams during the course of the session, if the initial access process is inefficient. This paper focuses on the Initial Access problem.

Compressed Sensing (CS): In CS theory, the main objective is to recover an unknown sparse vector $\boldsymbol{q^{a}}$ using a small number (compared to the sparse vector dimensions) of linear measurements. Measurements in CS, denoted by $\boldsymbol{y}$, are modeled as $\boldsymbol{y}=\boldsymbol{B}\boldsymbol{q^{a}}$, where $\boldsymbol{B}$ is the sensing matrix. Hence, $\boldsymbol{B}$ is a linear transformation that amounts to encoding $\boldsymbol{q^{a}}$. Sparse recovery algorithms, on the other hand, amount to decoding $\boldsymbol{y}$. To obtain “good” measurements (which best preserve the information contained in the channel matrix), the sensing matrix needs to be stochastically optimized based on criteria like the $\operatorname{spark}{(\boldsymbol{B})}$ (i.e., the minimum number of linearly dependent columns), the mutual coherence and the Restricted Isometry Property.
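To make the spark criterion concrete, the following Python sketch computes $\operatorname{spark}(\boldsymbol{B})$ by brute force for a small illustrative binary matrix. The matrix and the function names (`rank`, `spark`) are assumptions chosen for illustration and are not part of any CS library; the exhaustive search is only practical for very small matrices. The relevance of the spark comes from a standard CS result: every vector with at most $L$ non-zero entries is uniquely determined by $\boldsymbol{y}=\boldsymbol{B}\boldsymbol{q^{a}}$ exactly when $\operatorname{spark}(\boldsymbol{B})>2L$.

```python
from fractions import Fraction
from itertools import combinations


def rank(columns):
    """Rank of the matrix whose columns are given, via exact Gaussian elimination."""
    rows = [[Fraction(c[r]) for c in columns] for r in range(len(columns[0]))]
    found = 0  # number of pivots found so far
    for col in range(len(columns)):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def spark(B):
    """Smallest number of linearly dependent columns of B (B is a list of rows)."""
    columns = list(zip(*B))
    for k in range(1, len(columns) + 1):
        if any(rank(subset) < k for subset in combinations(columns, k)):
            return k
    return len(columns) + 1  # convention when all columns are independent


# Illustrative 4 x 8 binary sensing matrix (an assumption for this example).
B = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
]
s = spark(B)
```

For this matrix no two columns are parallel, while the first, second and fifth columns are linearly dependent, so `s` equals 3. Since $3>2L$ for $L=1$, any vector with a single non-zero entry, whatever its value, is uniquely identified by four such measurements.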

Since mmWave channel matrices are sparse, and since channel measurements are linear operations, CS became a dominant approach for tackling mmWave channel estimation problems. The main caveat here is that the standard CS problem is that of a sparse vector recovery, while mmWave channel estimation is a sparse matrix recovery. This distinction poses some challenges in tackling mmWave channel estimation under the umbrella of CS. To formulate MIMO channel estimation as a CS problem, a vectorization step is carried out (i.e., columns of matrices are stacked on top of each other to form one long vector). Nonetheless, unlike standard CS problems in which elements of the sensing matrices are directly chosen and optimized, the mmWave sensing matrix is a function of the transmit precoding and receive combining vectors. This adds an extra layer of complexity which is often ignored under the premise that since CS often requires random sensing matrices, then random beamforming is an obvious necessity. However, it is not immediately clear how a specific choice of precoders and combiners would affect the structure of $\boldsymbol{B}$ , and therefore, the performance of sparse recovery. Extending the design principles of sensing matrices from core CS theory to mmWave channel estimation is thus not straightforward and remains an open area of research.

Existing research on CS-based mmWave channel estimation relies on random arbitrary choices of precoding and combining vectors, e.g., uniformly distributed phase shifts [21, 22]. When this solution is incorporated in mmWave channel estimation, it translates into designing antenna beam patterns of highly irregular shapes (see Fig. 6). Such beam patterns are sensitive to variations of received signal power, thermal noise and resolution of ADCs and phase-shifters. Our proposed source-coding-based solution overcomes these limitations by imposing better, well-structured antenna patterns, where, in each measurement, a specific angular direction is either included (with constant beamforming gain) or excluded. This provides better resilience to i) the presence of sidelobes, ii) variations in received signal power along any available path(s), iii) channel noise and iv) quantization error of ADCs and phase shifters. Furthermore, the deterministic nature of our source-coding-based measurements allows us to provide theoretical guarantees for channel recovery at a precise number of measurements. The source coding analogy also allows us to derive tight theoretical lower bounds on the number of measurements.

In contrast, the number of required measurements in CS is commonly characterized only up to an order of growth. For instance, several state-of-the-art sparse recovery algorithms require $O(L\log(\frac{n}{L}))$ measurements, where $n$ is the number of dimensions of the sparse vector and $L{\ll}n$ is its sparsity level [9, 11]. This, however, is just a scaling law, which, by definition, holds in the asymptotic regime and omits the constant scaling coefficient. Compare this to our solution, which accurately specifies the required number of measurements (based on $n$ and $L$).

Developing efficient sparse recovery algorithms for CS-based mmWave channel estimation is a rich area of research. Various algorithms have different computational complexities, recovery performance, favorable range of signal to noise ratio (SNR), etc. A comparison between several classes of sparse recovery algorithms is provided in [22]. These include convex relaxation (e.g. $\ell_{1}$ norm minimization), greedy iteration (e.g. Orthogonal Matching Pursuit (OMP)) and Bayesian Inference. Other algorithms also include Approximate Message Passing (AMP) [23] and its variants [24], as well as machine learning based sparse recovery [25].

Machine Learning: Deep learning is very powerful in extracting patterns from large amounts of data. It has been widely used in problems of computer vision, speech recognition and natural language processing. Recently, it has also been applied to problems in communications [26], including, but not limited to channel estimation [15, 16, 17]. For instance, in [17] the beamforming vectors at the TX and RX are “learned” based on uplink pilot signals simultaneously received at multiple base stations. The base stations share their received information on a cloud, on which data processing is performed. This idea, however, is critically dependent on a dense deployment of base stations. In [15, 16], deep learning is leveraged to ease the burden of heavy computations that would otherwise be required for measurement processing.

Coding: Exploiting source codes for mmWave channel estimation has not been studied before. Our earlier work in [18] drew an analogy between path discovery in mmWave channels and error discovery in Linear Block Channel Codes (LBC). There exists a duality between the error discovery problem of channel codes and linear source compression. That is, we can use LBCs as linear source compression codes, as well. Nonetheless, the channel coding analogy did not naturally lend itself to characterizing the lower bound on the required number of measurements. This paper differs from [18] in the following:

• Channel measurements are envisaged as compressed versions of the channel, which are obtained based on lossless, fixed-rate, linear source codes.

• The lower bound on the achievable number of measurements is accurately characterized using the entropy of the direction of the strong reflectors (in a stochastic spatial model). This not only provides a precise metric to quantify the efficiency of a used code, but it also provides a benchmark for evaluating other measurement schemes.

• A DNN-based measurement decoding is proposed and evaluated against the more computationally complex “search” method of [18].

• A comparison to compressed sensing based mmWave channel estimation is provided, demonstrating the superiority of our proposed approach.

Hashed Beams: An idea of direction inclusion/exclusion used to generate antenna patterns was adopted in [13]. Specifically, every measurement combines signals coming from a randomly chosen set of angular directions. This describes the encoding part and it resembles a random binary code. For decoding, a threshold-based decision determines whether a strong propagation path exists (if a path exists, it lies at one of the directions included in this measurement). The direction which was most frequently included in the measurements that revealed a strong path is declared as the angular direction of the strongest channel path. This method discovers one path, and requires $O(L\log(n))$ measurements. In our proposed approach, however, the angular directions whose respective beams are overlapped are precisely determined using a carefully chosen code. We also use an elaborate decoding method that is capable of discovering multiple paths. It also guarantees a lower number of measurements since a randomly chosen code is not expected to outperform a carefully designed one.

III Motivating Example

Consider an RX equipped with an antenna array which can form $8$ distinct beams. These 8 beams divide the angular space into resolvable directions, i.e., $d_{1},d_{2},\dots,d_{8}$ , as shown in Fig. 1. The RX needs to establish a Line of Sight (LoS) communication link with TX. This requires some sort of “searching” over the angular space at both TX and RX. For ease of illustration, let us reduce the link establishment problem to that of “Angle of Arrival (AoA) discovery” at RX by assuming that TX transmits its signals omnidirectionally. Let the path gain of LoS be denoted by $\alpha$ , which can take arbitrary values. For simplicity of notation, let us assume $\alpha=1$ .

Our objective is: Find the specific direction $d_{i^{*}}$ which contains the LoS path to TX using the least possible number of measurements. To do so, we envisage the measurement process as lossless, fixed-rate channel compression. This enables harnessing the power of source compression codes to minimize the number of measurements. It also enables deriving lower bounds on the number of measurements, using which we can accurately find the LoS (or conclude it is blocked). We propose a measurement approach which has a predetermined measurement sequence that (1) does not require feedback and (2) is capable of finding the LoS path, no matter in which $d_{i}$ it exists. Therefore, a constant number of measurements, $m$ , is needed for all $d_{i}$ .

The key idea of LoS discovery using non-feedback linear source coding is to:

i) Construct a binary codebook that represents the angular channel,
ii) Find a proper fixed-rate linear source code that losslessly compresses all codewords in that codebook, and
iii) Use this code to design the measurements.

These steps can be elucidated as follows: i) Constructing the codebook is as easy as finding all possible binary vectors that represent the LoS position. Since $\alpha{=}1$, this codebook is exactly the set of all possible channel vectors. Let the channel between TX and RX be denoted by $\boldsymbol{q^{a}}$, and let $\mathcal{Q}^{a}$ be the set of all possible channels. The channel $\boldsymbol{q^{a}}{\in}\mathcal{Q}^{a}$ has $8$ components; each one represents the path gain corresponding to a unique angular sector as shown in Fig. 1. Table I shows all possible $\boldsymbol{q^{a}}$ in our setup (for arbitrary gain values, simply replace the ‘1’s in Table I with $\alpha$). ii) Choose the linear source code, denoted by its generator matrix $\boldsymbol{G}$, as:

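$\boldsymbol{G}=\begin{bmatrix}1&0&0&0&1&0&0&1\\0&1&0&0&1&1&0&1\\0&0&1&0&0&1&1&0\\0&0&0&1&0&0&1&1\end{bmatrix}$ (1)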

To compress $\boldsymbol{q^{a}}$, we simply need to compute the matrix product $\boldsymbol{y^{s}}{=}\boldsymbol{G}\boldsymbol{q^{a}}$ (see Table I). iii) Design the measurements such that $\boldsymbol{y^{s}}$ is imitated by the measurement results. This is done by beamforming at RX. Notice that the $i^{\text{th}}$ measurement, i.e., $y^{s}_{i}$ (the $i^{\text{th}}$ component of $\boldsymbol{y^{s}}$), is the product of the $i^{\text{th}}$ row of $\boldsymbol{G}$ and $\boldsymbol{q^{a}}$. Mathematically, this amounts to adding all elements $q^{a}_{j}$ of $\boldsymbol{q^{a}}$ that correspond to $g_{i,j}{=}1$ ($g_{i,j}$ is the element at row $i$ and column $j$ of $\boldsymbol{G}$). That is,

$y^{s}_{i}=\sum_{j=1}^{8}q^{a}_{j}\times g_{i,j}=\sum_{j:\,g_{i,j}=1}q^{a}_{j}.$ (2)

Hence, measurement $i$ should only contain the directions $d_{j}$ whose corresponding $g_{i,j}$ equals $1$, and exclude the rest. (Notice that we can map the $i^{\text{th}}$ row of $\boldsymbol{G}$ to the $i^{\text{th}}$ measurement and the $j^{\text{th}}$ column to the $j^{\text{th}}$ sector, i.e., direction $d_{j}$.) Essentially, this means that in each of the measurements $y^{s}_{i}$, we combine the signals received at a specific set of AoA directions. This can be realized by carefully shaping the antenna pattern using beamforming. Fig. 2 highlights this process for the $1^{\text{st}}$ measurement, in which only the directions $d_{1}$, $d_{5}$ and $d_{8}$ are included. The measurement results $\forall d_{i}$ are shown in Table I. Note that the number of required measurements is $4$ for all $d_{i}$.
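As a minimal sketch of steps i) to iii), the following Python snippet applies the generator matrix $\boldsymbol{G}$ of Eq. (1) to every channel of Table I and builds a lookup-table decoder from the results. The function names (`encode`, `decode`, `all_channels`) are illustrative assumptions, and the lookup table merely stands in for the decoding methods discussed later in the paper; it assumes noise-free measurements and $\alpha=1$.

```python
# Generator matrix G of Eq. (1): 4 measurements (rows) x 8 directions (columns).
G = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
]
N_DIRECTIONS = 8


def encode(q):
    """Return y = G q: measurement i sums the directions j with g_ij = 1."""
    return tuple(sum(g * x for g, x in zip(row, q)) for row in G)


def all_channels():
    """Yield (label, q) for the blocked case d_0 and a LoS path in d_1..d_8."""
    yield 0, (0,) * N_DIRECTIONS
    for i in range(N_DIRECTIONS):
        yield i + 1, tuple(1 if j == i else 0 for j in range(N_DIRECTIONS))


# Lookup-table decoder: measurement vector -> index of the LoS direction.
codebook = {encode(q): label for label, q in all_channels()}


def decode(y):
    """Map a noise-free measurement vector back to its direction index."""
    return codebook[tuple(y)]


# Directions combined in the first measurement (1-based indices).
first_measurement = [j + 1 for j, g in enumerate(G[0]) if g == 1]
```

Because the eight columns of $\boldsymbol{G}$ are distinct and non-zero, the nine channels map to nine different measurement vectors, so `codebook` has nine entries and `decode` inverts `encode` on Table I. `first_measurement` evaluates to `[1, 5, 8]`, matching the directions highlighted for the first measurement.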

Lower Bound: A fundamental question that arises here is: Can we find a better fixed-rate, lossless source code (other than the one given in Eq. (1)) that would produce fewer measurements, and hence increase the efficiency of the measurement process? To answer this question, we need to find the minimum expected number of measurements required to discover the channel using our proposed source coding solution. This number is identical to the minimum average code length (over all fixed-rate, lossless codes). The minimum average code length is well-known to be lower bounded by the Shannon Entropy; denoted by $H_{2}$ and defined as

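$H_{2}\left(\boldsymbol{q^{a}}\right)=\sum_{\boldsymbol{q^{a}}\in\mathcal{Q}^{a}}\mathbb{P}\left(\boldsymbol{q^{a}}\right)\log_{2}\left(\frac{1}{\mathbb{P}\left(\boldsymbol{q^{a}}\right)}\right)$ (3)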

Calculating $H_{2}\left(\boldsymbol{q^{a}}\right)$ requires knowledge of the probability distribution $\mathbb{P}\left(\boldsymbol{q^{a}}\right)$ . Fixed-rate codes, however, do not account for the frequency of $\boldsymbol{q^{a}}$ (hence, the mapping to equal-length codes). By limiting the space of codes to be over fixed-rate codes, we can improve the bound to be

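$\left\lceil\log_{2}\left(|\mathcal{Q}^{a}|\right)\right\rceil\geq H_{2}\left(\boldsymbol{q^{a}}\right)$ (4)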

where $|\mathcal{Q}^{a}|=9$ (recall that there exist $9$ possible scenarios for $\boldsymbol{q^{a}}$, as shown in Table I). This tighter bound is obtained by assuming a uniform distribution, which is the entropy-maximizing distribution, over the channel space $\mathcal{Q}^{a}$. Eq. (4) reveals that our chosen code achieves the lower bound of $4$ measurements. We provide a formal discussion on the lower bound in Section V-D.
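The arithmetic behind Eq. (4) can be checked with a few lines of Python. The sketch below counts the binary support patterns with at most `max_paths` ones among `n` angular bins and takes the ceiling of the base-2 logarithm of that count. The function names are illustrative assumptions; the multi-path count (a sum of binomial coefficients) mirrors the general bound listed among the paper's equations and is included only to show how the single-path case of this example fits into it.

```python
from math import ceil, comb, log2


def num_channels(n, max_paths):
    """Count binary support patterns with at most `max_paths` ones among n bins."""
    return sum(comb(n, i) for i in range(max_paths + 1))


def fixed_rate_bound(n, max_paths):
    """Smallest length of a fixed-rate binary code that labels every pattern."""
    return ceil(log2(num_channels(n, max_paths)))


size = num_channels(8, 1)        # 9 channels: d_0 plus d_1..d_8
uniform_entropy = log2(size)     # entropy of the uniform distribution, in bits
bound = fixed_rate_bound(8, 1)   # number of measurements required by Eq. (4)
```

Here `size` is 9, `uniform_entropy` is about 3.17 bits, and `bound` is 4, so the four-measurement code of Eq. (1) meets the bound with equality.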

Remark.

This Motivating Example only dealt with a simplified channel model, with only one channel path and a fixed path gain of $\alpha=1$ . However, in the rest of the paper, we will consider generalized channel models with possibly several paths of arbitrary path gain values, i.e., $\alpha{\in}\mathbb{C}$ .

IV System Model

We consider point-to-point mmWave channels with $n_{t}$ and $n_{r}$ antennas at TX and RX, respectively. Antennas at TX and RX form Uniform Linear Arrays (ULA). Generalization to Uniform Planar Arrays is straightforward but not considered in this paper for simplicity. Every antenna element is connected to a phase-shifter and a low-power variable-gain amplifier (VGA)¹. On the TX side, a single RF chain feeds its ULA through an $n_{t}$-way power splitter, while on the RX side, the outputs of the ULA, after being processed by amplifiers and phase-shifters, are linearly combined using an adder and fed to a single RF chain with in-phase (I) and quadrature (Q) channels. Two mid-tread ADCs with $2^{b}{+}1$ levels are used to quantize the I and Q components of the received signal. The term $b$ loosely denotes the number of bits that describe the ADC resolution. Fig. 3 depicts the transceiver architecture.

¹ The use of VGAs in analog transceivers is common in practice. For instance, in IEEE 802.11ad [27] both phase and amplitude components are used to specify antenna weights, and commercial devices like Wilocity Wil6200 offer this capability. VGAs are also used along with phase-shifters in practice to help compensate for their phase-dependent insertion loss [28].

We assume single-tap channels where all channel paths have just one significant tap. We also adopt a channel clustering model where paths between TX and RX form clusters in the angular domain [6, 1]. Let $L$ denote the number of available channel clusters. Due to the sparse nature of mmWave channels, only a limited number of clusters exist², where $L\ll n_{r},n_{t}$. Since distinct paths within each cluster cannot typically be resolved, we assume that each cluster contains only one path. Each channel path (e.g., the $l^{th}$ path) is attributed with an AoD $\theta_{l}$, an AoA $\phi_{l}$ and a path gain $\alpha_{l}$. Let $\alpha_{l}^{b}\in\mathbb{C}$ denote the baseband path gain such that

² Prior knowledge about the number of clusters can be obtained from statistical channel information, which in turn is obtained from channel measurement campaigns. For instance, measurements carried out in New York City revealed that an average number of $2$ or $3$ clusters exists in mmWave channels at $28$ and $73$ GHz [3].

$\alpha_{l}^{b}=\alpha_{l}\sqrt{n_{r}n_{t}}\,e^{-j\frac{2\pi\rho_{l}}{\lambda_{c}}},$

where $\rho_{l}$ is the path length and $\lambda_{c}$ is the carrier wavelength. We define the directional cosines of the AoD and AoA of the $l^{th}$ path as $\Omega_{tl}{\triangleq}\cos\left(\theta_{l}\right)$ and $\Omega_{rl}{\triangleq}\cos\left(\phi_{l}\right)$, respectively. The transmit and receive spatial signatures at an arbitrary directional cosine $\Omega$ are denoted by $\boldsymbol{e_{t}}\left(\Omega\right)$ and $\boldsymbol{e_{r}}\left(\Omega\right)$, respectively. We define $\boldsymbol{e_{t}}\left(\Omega\right)$ (and similarly $\boldsymbol{e_{r}}\left(\Omega\right)$) as:

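$\boldsymbol{e_{t}}\left(\Omega\right)=\frac{1}{\sqrt{n_{t}}}\left(1,e^{-j2\pi\Delta_{t}\Omega},e^{-j2\pi 2\Delta_{t}\Omega},\dots,e^{-j2\pi(n_{t}-1)\Delta_{t}\Omega}\right)^{T}$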

where $\Delta_{t}$ and $\Delta_{r}$ are the antenna separations at TX and RX ULAs, normalized by $\lambda_{c}$ .

Let $\boldsymbol{Q}\in\mathbb{C}^{n_{r}{\times}n_{t}}$ denote the channel matrix such that

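$\boldsymbol{Q}=\sum_{l=1}^{L}\alpha_{l}^{b}\,\boldsymbol{e_{r}}\left(\Omega_{rl}\right)\boldsymbol{e_{t}}^{H}\left(\Omega_{tl}\right).$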

The corresponding angular channel of $\boldsymbol{Q}$ , whose rows and columns divide the channel into resolvable RX and TX angular bins, respectively, is denoted by $\boldsymbol{Q^{a}}$ and can be obtained as:

$\boldsymbol{Q^{a}}=\boldsymbol{U_{r}}^{H}\boldsymbol{Q}\boldsymbol{U_{t}}.$

If $n_{t}$ or $n_{r}$ equals $1$ , $\boldsymbol{Q}$ and $\boldsymbol{Q^{a}}$ are reduced to vectors which we denote by $\boldsymbol{q}$ and $\boldsymbol{q^{a}}$ , respectively. The matrices $\boldsymbol{U_{t}}$ and $\boldsymbol{U_{r}}$ are the transmit and receive unitary Discrete Fourier Transform (DFT) matrices whose columns form an orthonormal basis for the transmit and receive signal spaces $\mathbb{C}^{n_{t}}$ and $\mathbb{C}^{n_{r}}$ , respectively. The definition of $\boldsymbol{U_{t}}$ (likewise for $\boldsymbol{U_{r}}$ ) is given as [29, Chapter 7.3.4]

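$$\boldsymbol{U_{t}}\triangleq\begin{pmatrix}\boldsymbol{e_{t}}\left(0\right)&\boldsymbol{e_{t}}\left(\frac{1}{L_{t}}\right)&\dots&\boldsymbol{e_{t}}\left(\frac{n_{t}-1}{L_{t}}\right)\end{pmatrix}$$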

where $L_{t}{=}n_{t}\Delta_{t}$ ( $L_{r}{=}n_{r}\Delta_{r}$ ) denotes the length of the TX (RX) antenna array, normalized by $\lambda_{c}$ .
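As an illustration of the angular-domain representation, the following minimal Python sketch builds the DFT basis from unit-norm ULA spatial signatures and checks two things: the basis is orthonormal, and a path that lies exactly on a basis direction occupies a single angular bin. It assumes the commonly used signature $\boldsymbol{e}(\Omega)=\frac{1}{\sqrt{n}}\left(e^{-j2\pi k\Delta\Omega}\right)_{k=0}^{n-1}$ and half-wavelength spacing ( $\Delta=0.5$ ). The function names and the values of `n_r` and `alpha` are illustrative choices, not taken from the paper.

```python
import cmath
import math


def spatial_signature(omega, n, delta=0.5):
    """Unit-norm ULA spatial signature at directional cosine omega (assumed form)."""
    return [cmath.exp(-2j * math.pi * k * delta * omega) / math.sqrt(n)
            for k in range(n)]


def dft_basis(n, delta=0.5):
    """Columns e(0), e(1/L), ..., e((n-1)/L) with normalized array length L = n * delta."""
    length = n * delta
    return [spatial_signature(k / length, n, delta) for k in range(n)]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def angular_vector(q, basis):
    """q^a = U^H q, computed one basis column at a time."""
    return [inner(u, q) for u in basis]


n_r = 8                      # illustrative array size
U_r = dft_basis(n_r)

# Largest deviation of U_r^H U_r from the identity matrix.
gram_error = max(abs(inner(U_r[i], U_r[j]) - (1.0 if i == j else 0.0))
                 for i in range(n_r) for j in range(n_r))

# A single on-grid path with (hypothetical) gain alpha along the third basis direction.
alpha = 0.7 - 0.2j
q = [alpha * x for x in U_r[2]]
q_a = angular_vector(q, U_r)
support = [i for i, x in enumerate(q_a) if abs(x) > 1e-9]
```

Under these assumptions, `support` contains only the index of the chosen direction, which is the perfect-sparsity situation assumed in this section.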

Similar to [11, 30, 31], we assume perfect sparsity, where channel paths lie along the AoD and AoA directions defined in $\boldsymbol{U_{t}}$ and $\boldsymbol{U_{r}}$ . Hence, each path contributes to only a single component of $\boldsymbol{Q^{a}}$ , and at most $L$ components of $\boldsymbol{Q^{a}}$ are non-zero. The baseband model is

$$\boldsymbol{u_{b}}=\boldsymbol{Q}\boldsymbol{x}+\boldsymbol{n}$$

where $\boldsymbol{u_{b}}$ is the received vector at RX front-end while $\boldsymbol{n}{\sim}\mathcal{CN}\left(\boldsymbol{0},N_{0}\boldsymbol{I}_{n_{r}}\right)$ is an i.i.d. complex Gaussian noise vector. TX sends pilot symbols $s$ , with power $P$ , which are processed using precoders $\boldsymbol{f}_{j}{\in}\mathbb{C}^{n_{t}}$ to obtain the transmit vectors $\boldsymbol{x}{=}\boldsymbol{f}_{j}s$ . Hence, the transmit SNR is

[TABLE]

where $\mu$ is the average path loss (which depends on the carrier frequency, atmospheric conditions, average distance between TX and RX). Note that SNR and $\mu$ are not path dependent. The rx-combining vectors $\boldsymbol{w}_{i}{\in}\mathbb{C}^{n_{r}}$ are used to obtain the received symbols $u_{i,j}$ such that

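$$u_{i,j}=\boldsymbol{w}_{i}^{H}\boldsymbol{u_{b}}=\boldsymbol{w}_{i}^{H}\boldsymbol{Q}\boldsymbol{f}_{j}s+\boldsymbol{w}_{i}^{H}\boldsymbol{n}$$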

where $i\in\{1,\dots,m_{r}\},j\in\{1,\dots,m_{t}\}$ . Finally, a quantized version $u^{s}_{i,j}$ of $u_{i,j}$ is obtained such that

[TABLE]

where $[\cdot]_{+}$ represents the quantization function. The noise component, normalized by $\left\lVert\boldsymbol{w}_{i}\right\rVert$ , has a complex Gaussian distribution, i.e., $\frac{\boldsymbol{w}_{i}^{H}\boldsymbol{n}}{\left\lVert\boldsymbol{w}_{i}\right\rVert}\sim\mathcal{CN}\left(0,N_{0}\right)$ .

Let $y^{s}_{i,j}=\boldsymbol{w}_{i}^{H}\boldsymbol{Q}\boldsymbol{f}_{j}s$ denote the error-free measured symbols and let $z_{i,j}=u^{s}_{i,j}-y^{s}_{i,j}$ denote the measurement error which includes both channel noise and quantization error.

IV-A Problem Formulation

The problem we need to solve is to minimize the number of measurements $m=m_{t}\times m_{r}$ such that $\boldsymbol{Q}$ can be reliably reconstructed. This problem can be mathematically stated as:

$$\textbf{P1:}\quad\min_{\boldsymbol{w}_{i},\,\boldsymbol{f}_{j},\,\mathcal{D}}\;m_{t}\times m_{r}\quad\text{subject to}\quad y^{s}_{i,j}=\boldsymbol{w}_{i}^{H}\boldsymbol{Q}\boldsymbol{f}_{j},\quad\mathcal{D}\left(\{y^{s}_{i,j}\}\right)=\boldsymbol{Q^{a}}$$

Note that $y^{s}_{i,j}$ exists $\forall (i,j)\in\{1,\dots,m_{r}\}\times\{1,\dots,m_{t}\}$ . That is, measurements are taken using all combinations of $\boldsymbol{f}_{j}$ and $\boldsymbol{w}_{i}$ . We also use $s=1$ . The design variables are the tx-precoders $\boldsymbol{f_{j}}$ , the rx-combiners $\boldsymbol{w_{i}}$ and the decoding function $\mathcal{D}$ . We do not explicitly consider the impact of measurement errors in this formulation, but their effect is studied in Section V-E. Note also that, due to the use of VGAs at each antenna element, the constant modulus constraint on $\boldsymbol{f_{j}}$ and $\boldsymbol{w_{i}}$ that is often incorporated in analog beamforming designs is not needed.

V Source-Coding-Based Measurements

In this section, we formally introduce mm-wave beam discovery as a source coding problem. We initially focus on channels with single-transmit, multiple-receive antennas. Specifically, we provide the conditions under which a chosen fixed-rate source code can be used to uniquely “encode” channel vectors in $\mathbb{C}^{n_{r}}$ into measurement vectors of fewer components. This setting is identical to that of multiple-transmit, single-receive antennas. In Section VI, we show how to use DNNs to “decode” the measurements and obtain an estimate for the observed channel. Then, in Section VII, we consider general channels with multiple TX and RX antennas. Now, let us start with the following discussion on source codes.

V-A Source Codes

Let $C$ be a binary linear source code with encoding and decoding functions denoted by $\mathcal{E}_{\mathbb{F}_{2}}$ and $\mathcal{D}_{\mathbb{F}_{2}}$ , respectively. We refer to $C$ as the encoding-decoding function pair $\left(\mathcal{E}_{\mathbb{F}_{2}},\mathcal{D}_{\mathbb{F}_{2}}\right)$ . The subscript $\mathbb{F}_{2}$ denotes the finite field of two elements $\mathsf{0}_{\mathbb{F}_{2}}$ and $\mathsf{1}_{\mathbb{F}_{2}}$ (also referred to as $GF(2)$ ) over which the code $C$ is defined. Later on, we will drop the subscripts to simplify notation as long as they can be inferred from the context.

Definition V.1 (Linear Source Code).

A source code $C$ whose encoding function $\mathcal{E}_{\mathbb{F}_{2}}$ is a linear function of the source sequences is called a linear source code.

Let $\boldsymbol{s}$ be a source sequence of length $n$ where $\boldsymbol{s}\in\mathcal{S}\subseteq\left\{\mathsf{0}_{\mathbb{F}_{2}},\mathsf{1}_{\mathbb{F}_{2}}\right\}^{n}$ , and let $\boldsymbol{c_{s}}\in\mathcal{I}_{\mathcal{S}}\subseteq\left\{\mathsf{0}_{\mathbb{F}_{2}},\mathsf{1}_{\mathbb{F}_{2}}\right\}^{m}$ be its associated binary representation under $C$ where $\mathcal{I}_{\mathcal{S}}$ is the image of $\mathcal{S}$ under $\mathcal{E}_{\mathbb{F}_{2}}$ . Thus, using a linear source code $C$ , we can find the representation of $\boldsymbol{s}$ under $C$ using

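$$\boldsymbol{c_{s}}=\mathcal{E}_{\mathbb{F}_{2}}\left(\boldsymbol{s}\right)=\boldsymbol{G}\boldsymbol{s}$$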

where $\boldsymbol{G}\in\left\{\mathsf{0}_{\mathbb{F}_{2}},\mathsf{1}_{\mathbb{F}_{2}}\right\}^{m\times n}$ is called the generator matrix. Note that linearity guarantees fixed-rate since the code length is a constant value (equals the number of rows of $\boldsymbol{G}$ ).

The decoding function $\mathcal{D}_{\mathbb{F}_{2}}$ , maps sequences $\boldsymbol{c_{s}}$ to a corresponding source sequence $\hat{\boldsymbol{s}}\in\hat{\mathcal{S}}\subseteq\left\{\mathsf{0}_{\mathbb{F}_{2}},\mathsf{1}_{\mathbb{F}_{2}}\right\}^{n}$ . Suppose that $\mathcal{S}$ is the set of all sequences such that if $\boldsymbol{s}_{1},\boldsymbol{s}_{2}\in\mathcal{S}$ , we have that $\boldsymbol{s}_{1}\neq\boldsymbol{s}_{2}\xLongleftrightarrow{\text{iff}}\boldsymbol{c_{s}}_{1}\neq\boldsymbol{c_{s}}_{2}$ . In other words, $\mathcal{E}_{\mathbb{F}_{2}}:\mathcal{S}\rightarrow\mathcal{I}_{\mathcal{S}}$ is injective (one-to-one). Consequently, if we define the function $\mathcal{D}_{\mathbb{F}_{2}}$ over $\mathcal{I}_{\mathcal{S}}$ as the inverse function of $\mathcal{E}_{\mathbb{F}_{2}}$ , i.e., $\mathcal{E}_{\mathbb{F}_{2}}^{-1}\triangleq\mathcal{D}_{\mathbb{F}_{2}}:\mathcal{I}_{\mathcal{S}}{\rightarrow}\mathcal{S}$ , then we have that $\hat{\boldsymbol{s}}=\mathcal{D}_{\mathbb{F}_{2}}\left(\mathcal{E}_{\mathbb{F}_{2}}\left(\boldsymbol{s}\right)\right)=\boldsymbol{s},\;\;\forall\boldsymbol{s}\in\mathcal{S}$ .
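To make the injectivity requirement concrete, here is a small Python sketch of a linear source code over $\mathbb{F}_{2}$ . It uses, purely as a toy example, the $3\times 7$ parity-check matrix of the (7,4) Hamming code as the generator matrix $\boldsymbol{G}$ (a syndrome-style source code) and checks whether encoding is one-to-one on the set of sequences with at most a given number of ones. The matrix, the helper names and the set sizes are illustrative assumptions and are not the code used in the paper.

```python
from itertools import combinations

# Toy generator matrix: column j (1-based) is the 3-bit binary expansion of j.
G = [[(j >> b) & 1 for j in range(1, 8)] for b in (2, 1, 0)]


def encode(G, s):
    """c_s = G s, with all arithmetic carried out modulo 2."""
    return tuple(sum(g * x for g, x in zip(row, s)) % 2 for row in G)


def sparse_sequences(n, max_weight):
    """All binary sequences of length n with at most max_weight ones."""
    for w in range(max_weight + 1):
        for ones in combinations(range(n), w):
            yield tuple(1 if i in ones else 0 for i in range(n))


def is_injective(G, sequences):
    """True if no two sequences share the same encoded representation."""
    codes = [encode(G, s) for s in sequences]
    return len(set(codes)) == len(codes)


S1 = list(sparse_sequences(7, 1))   # sequences with at most one non-zero entry
S2 = list(sparse_sequences(7, 2))   # sequences with at most two non-zero entries

# When encoding is injective, the decoder is simply the inverse look-up table.
decoder = {encode(G, s): s for s in S1}
```

With this toy matrix, encoding is injective on `S1`, so `decoder[encode(G, s)] == s` for every `s` in `S1`. On `S2` it cannot be injective, because 29 sequences are mapped into only $2^{3}=8$ possible representations.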

V-B MmWave Beam Discovery

Let $\boldsymbol{q^{a}}\in\mathbb{C}^{n_{r}}$ denote the angular channel vector between TX and RX. Define $\boldsymbol{q}^{a}_{s}\in\left\{0,1\right\}^{n_{r}}$ to be the support vector associated with $\boldsymbol{q}^{a}$ such that $\boldsymbol{q}^{a}_{s}=\begin{pmatrix}q^{a}_{s_{1}}&q^{a}_{s_{2}}&\dots&q^{a}_{s_{n_{r}}}\end{pmatrix}^{T}$ where $q^{a}_{s_{i}}=1$ if $q^{a}_{i}\neq 0$ and $q^{a}_{s_{i}}=0$ otherwise. More generally, a support vector can be defined as:

Definition V.2 (Support vector).

The support vector $\boldsymbol{v_{s}}$ associated with an arbitrary $n-$ dimensional vector $\boldsymbol{v}\in\mathbb{C}^{n}$ is a binary vector of the same size that identifies the non-zero components of $\boldsymbol{v}$ and whose components, $v_{si}$ , are defined as $v_{si}=1$ if $v_{i}\neq 0$ and $v_{si}=0$ if $v_{i}=0$ .

We further define the set of non-zero indexes $\mathcal{X}_{\boldsymbol{v}}$ of an arbitrary vector $\boldsymbol{v}$ as follows:

Definition V.3 (Set of Non-Zero Indexes $\mathcal{X}_{\boldsymbol{v}}$ ).

For any arbitrary $n-$ dimensional vector $\boldsymbol{v}$ , we define $\mathcal{X}_{\boldsymbol{v}}$ as the set of indexes of its non-zero components, i.e., $\mathcal{X}_{\boldsymbol{v}}=\left\{i|v_{i}\neq 0\;,0{\leq}i{\leq}n{-}1\right\}$ .

Hence, if $\boldsymbol{v}_{s}$ is the support vector corresponding to $\boldsymbol{v}$ , then we have that $\mathcal{X}_{\boldsymbol{v}}=\mathcal{X}_{\boldsymbol{v}_{s}}$ , since $v_{i}=0{\iff}{v}_{s_{i}}=0$ . Now, let $\mathcal{Q}^{a}$ be the set containing all possible channel vectors $\boldsymbol{q}^{a}$ . Also let $\mathcal{Q}^{a}_{s}$ be the set of all support vectors $\boldsymbol{q}^{a}_{s}$ such that their corresponding channels $\boldsymbol{q}^{a}{\in}\mathcal{Q}^{a}$ . These sets have the following useful property: if we have a channel $\boldsymbol{q}^{\boldsymbol{a}}_{1}$ whose support vector $\boldsymbol{q}^{a}_{s_{1}}\in\mathcal{Q}^{a}_{s}$ , then removing any non-zero component(s) from $\boldsymbol{q}^{\boldsymbol{a}}_{1}$ (due to blockage, for example) still yields a valid channel $\boldsymbol{q}^{\boldsymbol{a}}_{2}\in\mathcal{Q}^{a}$ , whose support vector $\boldsymbol{q}^{\boldsymbol{a}}_{s_{2}}$ also belongs to $\mathcal{Q}^{a}_{s}$ . We call this the inclusion property.

Definition V.4.

[Inclusion Properties of $\mathcal{Q}^{a}_{s}$ ]

(i) Let $\boldsymbol{q}^{a}_{s_{1}},\boldsymbol{q}^{a}_{s_{2}}\in\left\{0,1\right\}^{n_{r}}$ such that $\mathcal{X}_{\boldsymbol{q}^{a}_{s_{2}}}\subseteq\mathcal{X}_{\boldsymbol{q}^{a}_{s_{1}}}$ . If $\boldsymbol{q}^{a}_{s_{1}}\in\mathcal{Q}^{a}_{s}$ , then $\boldsymbol{q}^{a}_{s_{2}}\in\mathcal{Q}^{a}_{s}$ .

(ii) $\boldsymbol{0}\in\mathcal{Q}^{a}_{s}$ . In fact, this is a consequence of property (i) above since for any $\boldsymbol{q}^{a}_{s}\in\mathcal{Q}^{a}_{s}$ , we have that $\mathcal{X}_{\boldsymbol{0}}{=}\varnothing{\subseteq}\mathcal{X}_{\boldsymbol{q}^{a}_{s}}$ .

Now, we are ready to present the theorem that establishes the conditions that need to be satisfied by a linear source code so that each possible channel $\boldsymbol{q^{a}}$ would result in a unique measurement vector $\boldsymbol{y^{s}}$ . Impairments under noise are not addressed in this theorem.

Theorem 1 .

Consider a binary linear source code $C$ whose encoding function $\mathcal{E}$ (defined by the binary generator matrix $\boldsymbol{G}$ ) is an injective function defined over $\mathcal{Q}^{a}_{s}\subseteq\left\{0,1\right\}^{n_{r}}$ . If we consider $\boldsymbol{G}$ to be defined over the complex field, then for all channel vectors $\boldsymbol{q}^{a}_{1},\boldsymbol{q}^{a}_{2}\in\mathcal{Q}^{a}\subseteq\mathbb{C}^{n_{r}}$ we have $\boldsymbol{q}^{a}_{1}\neq\boldsymbol{q}^{a}_{2}\textit{ if and only if }\boldsymbol{G}\boldsymbol{q}^{a}_{1}=\boldsymbol{y}_{1}^{\boldsymbol{s}}\neq\boldsymbol{y}_{2}^{\boldsymbol{s}}=\boldsymbol{G}\boldsymbol{q}^{a}_{2}$ .

Proof.

Let $\boldsymbol{q}^{a}_{1},\boldsymbol{q}^{a}_{2}\in\mathcal{Q}^{a}$ , and let $\boldsymbol{y}^{s}_{i}=\boldsymbol{G}\boldsymbol{q}^{a}_{i}$ . Now, assume that $\boldsymbol{q}^{a}_{1}\neq\boldsymbol{q}^{a}_{2}$ . Then, we have that

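$$\boldsymbol{y}^{s}_{1}-\boldsymbol{y}^{s}_{2}=\boldsymbol{G}\left(\boldsymbol{q}^{a}_{1}-\boldsymbol{q}^{a}_{2}\right)=\boldsymbol{G}\boldsymbol{v}=\sum_{i\in\mathcal{X}_{\boldsymbol{v}}}v_{i}\boldsymbol{g}_{i},\qquad\boldsymbol{v}\triangleq\boldsymbol{q}^{a}_{1}-\boldsymbol{q}^{a}_{2},$$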

where $\boldsymbol{g}_{i}$ is the $i^{\text{th}}$ column of $\boldsymbol{G}$ . To show that $\boldsymbol{y}^{s}_{1}-\boldsymbol{y}^{s}_{2}\neq\boldsymbol{0}$ , we need to show that the vectors $\boldsymbol{g}_{i}$ , $i\in\mathcal{X}_{\boldsymbol{v}}$ , are linearly independent. Otherwise, if these vectors are linearly dependent, then there exist coefficients $v_{i}\in\mathbb{C}$ , $i\in\mathcal{X}_{\boldsymbol{v}}$ , not all zero, such that $\boldsymbol{y}^{s}_{1}-\boldsymbol{y}^{s}_{2}=\boldsymbol{G}\boldsymbol{v}=\boldsymbol{0}$ .

In fact, we can show a stronger statement: “all vectors $\boldsymbol{g}_{i}\;\forall i\in\mathcal{X}_{\boldsymbol{q}^{a}_{1}}\cup\mathcal{X}_{\boldsymbol{q}^{a}_{2}}\supseteq\mathcal{X}_{\boldsymbol{v}}$ , are linearly independent”. Note that $\mathcal{X}_{\boldsymbol{q}^{a}_{1}}$ and $\mathcal{X}_{\boldsymbol{q}^{a}_{2}}$ are the sets of indexes of the non-zero components of $\boldsymbol{q^{a}_{1}}$ and $\boldsymbol{q^{a}_{2}}$ , respectively (recall Definition V.3) and that $\mathcal{X}_{\boldsymbol{q}^{a}_{1}}=\mathcal{X}_{{\boldsymbol{q}^{a}_{s1}}}$ and $\mathcal{X}_{\boldsymbol{q}^{a}_{2}}=\mathcal{X}_{{\boldsymbol{q}^{a}_{s2}}}$ .

•

First, let us show that $\mathcal{X}_{\boldsymbol{v}}$ is a subset of $\mathcal{X}_{\boldsymbol{q}^{a}_{1}}\cup\mathcal{X}_{\boldsymbol{q}^{a}_{2}}$ .

Since $v_{i}=q^{a}_{1,i}-q^{a}_{2,i}\;\;\forall\;1\leq i\leq n_{r}$ , it follows that $q^{a}_{1,i}=q^{a}_{2,i}=0\Longrightarrow v_{i}=0$ . Therefore, we have

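$$\left\{\mathcal{X}_{\boldsymbol{q}^{a}_{1}}\cup\mathcal{X}_{\boldsymbol{q}^{a}_{2}}\right\}^{c}\subseteq\left\{\mathcal{X}_{\boldsymbol{v}}\right\}^{c}$$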

Then, by taking the complements of both sides we obtain the required result (note that $\{\cdot\}^{c}$ denotes a set complement).

•

Second, we show that vectors in the set $\mathcal{G}\triangleq\left\{\boldsymbol{g}_{i}|i\in\mathcal{X}_{\boldsymbol{q}^{a}_{1}}\cup\mathcal{X}_{\boldsymbol{q}^{a}_{2}}\right\}$ are linearly independent over $\mathbb{F}_{2}$ : Assume towards contradiction that $\mathcal{G}$ is linearly dependent over $\mathbb{F}_{2}$ . Hence, there exists a set $\mathcal{G}_{D}\subseteq\mathcal{G}$ such that any $\boldsymbol{g}_{i_{0}}\in\mathcal{G}_{D}$ can be written as a linear combination of all other vectors in $\mathcal{G}_{D}$ , i.e.,

[TABLE]

Note that over $\mathbb{F}_{2}$ , we can assume, without loss of generality (W.L.O.G.), that the coefficients of the linear combination above are $1$ ’s. Hence, we have that

[TABLE]

Next, assume that $\exists\boldsymbol{q}^{a}_{s_{3}},\boldsymbol{q}^{a}_{s_{4}}{\in}\mathcal{Q}^{a}_{s}$ such that $\mathcal{X}_{\boldsymbol{v}_{s}}{=}\left\{j|\boldsymbol{g}_{j}\in\mathcal{G}_{D}\right\}$ where $\boldsymbol{v}_{s}{=}\boldsymbol{q}^{a}_{s_{3}}{-}\boldsymbol{q}^{a}_{s_{4}}\mod 2$ .

Then, since $\boldsymbol{G}$ is injective over $\mathcal{Q}^{a}_{s}$ , then we have that

[TABLE]

But, if $\mathcal{G}_{D}$ is non-empty, then $\boldsymbol{v}_{s}\neq 0$ . Hence, we arrive at a contradiction to Eq. (18). Therefore, the set $\mathcal{G}$ is linearly independent over $GF(2)$ . It remains to show that such $\boldsymbol{q}^{a}_{s_{3}}$ and $\boldsymbol{q}^{a}_{s_{4}}$ indeed exist. Let us construct $\boldsymbol{q}^{a}_{s_{3}}$ as follows:

First, let $\boldsymbol{q}^{a}_{s_{3}}=\boldsymbol{q}^{a}_{s_{1}}$ , then reset its $i^{\text{th}}$ component to zero ( $q^{a}_{s_{3},i}=0$ ) if $\boldsymbol{g}_{i}\not\in\mathcal{G}_{D}$ . Similarly, set $\boldsymbol{q}^{a}_{s_{4}}=\boldsymbol{q}^{a}_{s_{2}}$ , then reset its $i^{\text{th}}$ component to zero ( $q^{a}_{s_{4},i}=0$ ) if $\boldsymbol{g}_{i}\not\in\mathcal{G}_{D}$ or if $q^{a}_{s_{1},i}=1$ . Then, by construction, we have $\mathcal{X}_{\boldsymbol{q}^{a}_{s_{3}}}\subseteq\mathcal{X}_{\boldsymbol{q}^{a}_{s_{1}}}$ and $\mathcal{X}_{\boldsymbol{q}^{a}_{s_{4}}}\subseteq\mathcal{X}_{\boldsymbol{q}^{a}_{s_{2}}}$ . Hence, by the inclusion property (recall Definition V.4) we have $\boldsymbol{q}^{a}_{s_{3}},\boldsymbol{q}^{a}_{s_{4}}\in\mathcal{Q}^{a}_{s}$ since both $\boldsymbol{q}^{a}_{s_{1}},\boldsymbol{q}^{a}_{s_{2}}\in\mathcal{Q}^{a}_{s}$ . Also, it is easy to see that $q^{a}_{s_{3},j}-q^{a}_{s_{4},j}\mod 2=1\;\forall j:\boldsymbol{g}_{j}\in\mathcal{G}_{D}$ .

•

Third, by Lemma 2 below, the set $\mathcal{G}$ , with its components now interpreted as real scalars, is also linearly independent over $\mathbb{C}$ .

Therefore, in Eq. (16), it follows that $\boldsymbol{y}_{1}^{\boldsymbol{s}}-\boldsymbol{y}_{2}^{\boldsymbol{s}}\neq\boldsymbol{0}$ if and only if $\boldsymbol{q}^{a}_{1}-\boldsymbol{q}^{a}_{2}\neq 0$ , which concludes the proof. ∎

Lemma 2 .

Any set of $n-$ dimensional vectors that is linearly independent over $\mathbb{F}_{2}$ is also linearly independent over $\mathbb{C}$ if we interpret their $\mathsf{0}_{\mathbb{F}_{2}}$ and $\mathsf{1}_{\mathbb{F}_{2}}$ components as real scalars.

The proof is provided in Appendix A.
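Lemma 2 can be sanity-checked numerically on small examples. The sketch below computes the rank of a set of binary vectors twice: once with arithmetic over $\mathbb{F}_{2}$ and once with exact rational arithmetic, after interpreting the entries as real numbers. Independence over the rationals coincides with independence over $\mathbb{R}$ and $\mathbb{C}$ for rational-valued vectors. The helper names and the two example sets are illustrative and not part of the paper. The second example shows that the converse of the lemma does not hold.

```python
from fractions import Fraction


def rank_f2(vectors):
    """Rank over GF(2) of binary vectors, using bitwise Gaussian elimination."""
    rows = [int("".join(map(str, v)), 2) for v in vectors]
    rank = 0
    for bit in reversed(range(len(vectors[0]))):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        # Eliminate this bit from every remaining row (XOR is addition in GF(2)).
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank


def rank_real(vectors):
    """Rank of the same vectors with entries read as real (here: exact rational) numbers."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# Independent over GF(2), hence (by Lemma 2) independent over the reals as well.
independent_f2 = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]

# Dependent over GF(2) (the three vectors sum to zero modulo 2),
# yet independent over the reals: the converse of Lemma 2 fails.
dependent_f2 = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```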

V-C Beamforming Design

Now, we focus our attention on the design of the beamforming vectors $\boldsymbol{w_{i}}$ so that the measurement vector satisfies $\boldsymbol{y^{s}}=\boldsymbol{G}\boldsymbol{q^{a}}$ . Clearly, $\boldsymbol{w_{i}}$ depends on the chosen source code. Specifically, we want $\boldsymbol{w_{i}}$ to satisfy

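$$\boldsymbol{w_{i}}^{H}\boldsymbol{q}=\boldsymbol{g_{i}}\boldsymbol{q^{a}}=\sum_{j=1}^{n_{r}}g_{i,j}\,q^{a}_{j}$$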

where $\boldsymbol{g_{i}}$ is the $i^{\text{th}}$ row of $\boldsymbol{G}$ , and $g_{i,j}$ is the $j^{\text{th}}$ element of $\boldsymbol{g_{i}}$ . Recall that if $n_{r}$ antennas exist at RX, then there exist $n_{r}$ resolvable angular directions. Let us call these directions $d_{1},d_{2},\dots d_{n_{r}}$ . We want $\boldsymbol{w_{i}}$ to combine the received signal components at specific angular directions. Those angular directions are determined by $\boldsymbol{g_{i}}=\begin{pmatrix}g_{i,1},g_{i,2},\dots,g_{i,n_{r}}\end{pmatrix}$ . Specifically, we want $\boldsymbol{w_{i}}$ to include the signal at directions $d_{j}$ for all $j$ such that $g_{i,j}=1$ .

Recall that $\boldsymbol{q}=\boldsymbol{U_{r}}\boldsymbol{q^{a}}$ . Hence, we can rewrite Eq. (23) as $\boldsymbol{w_{i}}^{H}\boldsymbol{q}=\boldsymbol{w_{i}}^{H}\boldsymbol{U_{r}}\boldsymbol{q^{a}}=\boldsymbol{g_{i}}\boldsymbol{q^{a}}$ . Thus, we need to design $\boldsymbol{w_{i}}$ such that $\boldsymbol{w_{i}}^{H}\boldsymbol{U_{r}}=\boldsymbol{g_{i}}$ , which can be rewritten as:

$$\boldsymbol{U_{r}}^{H}\boldsymbol{w_{i}}=\boldsymbol{g_{i}}^{T}$$

Since the columns of $\boldsymbol{U_{r}}$ constitute an orthonormal basis, a very simple design of $\boldsymbol{w_{i}}$ is as a summation of the columns $\boldsymbol{e_{r}}\left(\frac{j-1}{L_{r}}\right)$ such that $g_{i,j}=1$ . In other words, we can design $\boldsymbol{w_{i}}$ as:

$$\boldsymbol{w_{i}}=\sum_{j:\,g_{i,j}=1}\boldsymbol{e_{r}}\left(\frac{j-1}{L_{r}}\right)$$
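The combiner design described above can be checked with a short Python sketch. It forms $\boldsymbol{w_{i}}$ as the sum of the DFT columns selected by one row of $\boldsymbol{G}$ , verifies that $\boldsymbol{w_{i}}^{H}\boldsymbol{U_{r}}$ reproduces that row, and confirms that the resulting measurement equals $\boldsymbol{g_{i}}\boldsymbol{q^{a}}$ for an on-grid channel. The spatial signature form, the half-wavelength spacing, the example row `g_row` and the channel `q_a` are assumptions made for illustration only.

```python
import cmath
import math


def e_r(omega, n, delta=0.5):
    """Unit-norm receive spatial signature at directional cosine omega (assumed form)."""
    return [cmath.exp(-2j * math.pi * k * delta * omega) / math.sqrt(n)
            for k in range(n)]


def combiner(g_row, n, delta=0.5):
    """w_i = sum of the DFT columns picked out by the ones in g_row."""
    length = n * delta                      # normalized array length L_r
    w = [0j] * n
    for j, bit in enumerate(g_row):         # j is zero-based, so it plays the role of (j - 1)
        if bit:
            column = e_r(j / length, n, delta)
            w = [a + b for a, b in zip(w, column)]
    return w


def measure(w, q):
    """Noise-free measurement w^H q."""
    return sum(a.conjugate() * b for a, b in zip(w, q))


n_r = 7
length = n_r * 0.5
g_row = [0, 1, 1, 0, 0, 1, 1]               # hypothetical row of a generator matrix
w = combiner(g_row, n_r)

# w^H U_r should reproduce g_row because the DFT columns are orthonormal.
recovered_row = [measure(w, e_r(j / length, n_r)) for j in range(n_r)]

# On-grid angular channel with two paths (bins 1 and 4), mapped to q = U_r q^a.
q_a = [0, 0.5 + 0.5j, 0, 0, -0.3j, 0, 0]
q = [sum(q_a[j] * e_r(j / length, n_r)[k] for j in range(n_r)) for k in range(n_r)]

y = measure(w, q)
expected = sum(g * x for g, x in zip(g_row, q_a))   # g_i q^a
```

Only the path in bin 1 is selected by `g_row`, so `y` matches `expected` (the gain of that path) up to floating-point error.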

*Remark.*

The adopted beamforming design is ideal under the perfect sparsity assumption (which we adopt), that is, when channel paths lie at the angular directions defined in $\boldsymbol{U_{r}}$ . In practice, however, channel paths arrive at arbitrary angles in $[0,2\pi]$ . Each path then contributes to multiple components of $\boldsymbol{q^{a}}$ , so $\boldsymbol{q^{a}}$ is not perfectly sparse. This happens due to (i) antenna side-lobes, and (ii) beam-overlap. To mitigate this problem, we can use side-lobe suppression techniques, e.g., a Taylor window, as well as large antenna arrays, which can form pencil-beam antenna patterns that avoid the beam-overlap problem. These come at the expense of a slight reduction in beam resolution. We leave this investigation for a future study and focus only on the main idea of using source-coding-based measurements.

V-D On the lower bound on the number of measurements

In Theorem 1, we showed that a linear source code $C$ which can uniquely encode all $\boldsymbol{q}^{a}_{s}{\in}\mathcal{Q}^{a}_{s}$ can be used to design a framework that uniquely measures all $\boldsymbol{q}^{a}{\in}\mathcal{Q}^{a}$ . Let the compression ratio of the code $C$ be denoted by $r_{c}$ such that $r_{c}=\frac{m}{n_{r}}$ , where $m$ and $n_{r}$ are the number of rows and columns of $C$ ’s generator matrix $\boldsymbol{G}$ , respectively.

Reducing the number of measurements is a fundamental objective for the mm-wave beam discovery problem. In light of Theorem 1, finding a source code with a high compression rate (small $r_{c}$ ) is crucial for attaining this objective. In the following discussion, we characterize the lower bound on the number of measurements in the context of our proposed solution.

Corollary 2.1 .

Let $\underaccent{\bar}{m}$ denote the lowest possible number of measurements for mm-wave beam discovery using lossless, fixed-rate source coding. Then, we have

[TABLE]

where $H_{2}\left(\cdot\right)$ is the binary entropy function.

Proof.

Suppose that $C$ is a linear lossless fixed-rate source code which can uniquely compress all $\boldsymbol{q}^{a}_{s}\in\mathcal{Q}^{a}_{s}$ . By Theorem 1, we have that the number of measurements needed for estimating the mm-wave channel is equal to $m$ (the length of encoded channel support vectors). Since the length of compressed sequences for any such code is lower bounded by $H_{2}\left(\boldsymbol{q}^{a}_{s}\right)$ , then we have $\underaccent{\bar}{m}\geq H_{2}\left(\boldsymbol{q}^{a}_{s}\right)$ . Moreover, since fixed-rate source codes do not take the probability distribution (i.e., frequency) of $\boldsymbol{q^{a}_{s}}$ into account, then we have $\underaccent{\bar}{m}\geq\sup_{\mathbb{P}\left(\boldsymbol{q}^{a}_{s}\right)}H_{2}\left(\boldsymbol{q}^{a}_{s}\right)\geq H_{2}\left(\boldsymbol{q}^{a}_{s}\right),\text{ where}$

[TABLE]

The $\sup$ problem in Eq. (27) is solved by $\mathbb{P}\left(\boldsymbol{q}^{a}_{s}\right)=\frac{1}{|\mathcal{Q}^{a}_{s}|}\;\forall\boldsymbol{q}^{a}_{s}$ , since the uniform distribution maximizes the entropy. Since the number of measurements has to be an integer, we take the ceiling of the right-hand side of Eq. (27). Finally, by the inclusion property in Definition V.4, we have $|\mathcal{Q}^{a}_{s}|=\sum_{i=0}^{L}{n_{r}\choose i}$ , which concludes the proof. ∎
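The counting argument in the proof can be evaluated directly. The following sketch computes $|\mathcal{Q}^{a}_{s}|=\sum_{i=0}^{L}{n_{r}\choose i}$ and the ceiling of its base-2 logarithm, as described in the proof. The function names are illustrative, and the exact integer computation of the ceiling is an implementation choice.

```python
from math import comb


def num_supports(n_r, L):
    """Number of support vectors of length n_r with at most L non-zero entries."""
    return sum(comb(n_r, i) for i in range(L + 1))


def min_measurements(n_r, L):
    """ceil(log2(|Q_s|)), computed exactly with integer arithmetic."""
    return (num_supports(n_r, L) - 1).bit_length()
```

For the example considered later in Section VI-B ( $n_{r}=23$ , $L\leq 3$ ), `num_supports(23, 3)` evaluates to $1+23+253+1771=2048=2^{11}$ , so `min_measurements(23, 3)` gives $11$ , in agreement with the $m=11$ measurements used there.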

V-E Channel Estimation Error

In Theorem 1, we have shown how to obtain unique measurements $\boldsymbol{y^{s}}\;\forall\;\boldsymbol{q^{a}}{\in}\mathcal{Q}^{a}$ . Recall that $\boldsymbol{y^{s}}{=}\boldsymbol{G}\boldsymbol{q^{a}}$ is an error-free measurement vector. In practice, however, measurements are never error-free. Measurement errors are bound to happen due to the effects of thermal and quantization noise, among other factors. The quality of channel estimates obtained using error-corrupted measurements is inevitably degraded, which calls for a deeper understanding of the effects of such errors. A crucial question we try to answer here is: do small perturbations or imperfections in channel measurements make channel estimates deviate considerably from their true values? In this section, we shed some light on this problem by deriving an upper bound on the channel estimation error as a function of the measurement error. We also show that for a special class of generator matrices, the channel estimation error, measured using the $\ell_{2}-$ norm, is smaller than or equal to the $\ell_{2}-$ norm of the measurement error.

We denote error-corrupted measurements using $\boldsymbol{u^{s}}$ such that

$$\boldsymbol{u^{s}}=\boldsymbol{y^{s}}+\boldsymbol{z}=\boldsymbol{G}\boldsymbol{q^{a}}+\boldsymbol{z}$$

where $\boldsymbol{z}$ is the measurement error (recall Eq. (13) and the discussion that follows). Assume that we can perfectly decode any measurement vector into its corresponding channel. That is, for any measurement vector $\boldsymbol{y^{s}}$ , we can find a corresponding $\boldsymbol{q^{a}}$ such that $\boldsymbol{y^{s}}{=}\boldsymbol{G}\boldsymbol{q^{a}}$ (measurement decoding will be further discussed in Section VI). Let us also denote the channel estimate obtained using error-corrupted measurements $\boldsymbol{u^{s}}$ by $\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ , i.e., $\boldsymbol{u^{s}}{=}\boldsymbol{G}\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ . The following proposition provides an upper bound on the channel estimation error in terms of the measurement error.

Proposition 3 .

Assume perfect measurement decoding, and let $\sigma_{\text{min}}\left(\cdot\right)$ denote the minimum singular value of a given matrix. Then, the channel estimation error is upper bounded as:

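$$\left\lVert\hat{\boldsymbol{q}}^{\boldsymbol{a}}-\boldsymbol{q^{a}}\right\rVert_{2}\leq\frac{\left\lVert\boldsymbol{z}\right\rVert_{2}}{\sigma_{\text{min}}\left(\boldsymbol{G}\right)}$$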

Proof.

Let us start by writing $\boldsymbol{z}$ as: $\boldsymbol{z}=\boldsymbol{u^{s}}-\boldsymbol{y^{s}}=\boldsymbol{G}\left(\hat{\boldsymbol{q}}^{\boldsymbol{a}}-\boldsymbol{q^{a}}\right)$ . Therefore, we have

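$$\left\lVert\boldsymbol{z}\right\rVert_{2}=\left\lVert\boldsymbol{G}\left(\hat{\boldsymbol{q}}^{\boldsymbol{a}}-\boldsymbol{q^{a}}\right)\right\rVert_{2}\geq\sigma_{\text{min}}\left(\boldsymbol{G}\right)\left\lVert\hat{\boldsymbol{q}}^{\boldsymbol{a}}-\boldsymbol{q^{a}}\right\rVert_{2}$$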

Finally, by rearranging (32), we obtain the required statement

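$$\left\lVert\hat{\boldsymbol{q}}^{\boldsymbol{a}}-\boldsymbol{q^{a}}\right\rVert_{2}\leq\frac{\left\lVert\boldsymbol{z}\right\rVert_{2}}{\sigma_{\text{min}}\left(\boldsymbol{G}\right)}$$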

Now, we see that if $\sigma_{\text{min}}\left(\boldsymbol{G}\right){\geq}1$ , then the channel estimation error (measured using the $\ell_{2}-$ norm) is smaller than or equal to the $\ell_{2}-$ norm of the measurement error, i.e., $\left\lVert\hat{\boldsymbol{q}}^{\boldsymbol{a}}-\boldsymbol{q^{a}}\right\rVert_{2}\leq\left\lVert\boldsymbol{z}\right\rVert_{2}$ . This, in fact, is the case for the class of generator matrices introduced in the following proposition.

Proposition 4 .

Let $\boldsymbol{I}_{m}$ be the $m{\times}m$ identity matrix. Then, $\sigma_{\text{min}}\left(\boldsymbol{G}\right){\geq}1$ for $\boldsymbol{G}$ of the form:

[TABLE]

See Appendix C for proof.

*Remark.*

It is not difficult to obtain generator matrices of the form in Eq. (33). For instance, syndrome source codes can be manipulated using row and column operations over the binary field to produce equivalent codes with $\boldsymbol{G}$ as in Eq. (33).

VI Measurement Decoding

Designing channel measurements that have a one-to-one correspondence with $\boldsymbol{q}^{a}$ is only part of the solution. Equally important is the ability to “decode” $\boldsymbol{y^{s}}$ back to $\boldsymbol{q}^{a}$ , i.e., determining the decoding function $\mathcal{D}(\cdot)$ of problem P1 (Section IV-A). The one-to-one correspondence between $\boldsymbol{q}^{a}$ and $\boldsymbol{y^{s}}$ guarantees that there exists an inverse function that maps $\boldsymbol{y^{s}}$ back to $\boldsymbol{q^{a}}$ . Nevertheless, since we can only obtain $\boldsymbol{u^{s}}$ , an error-corrupted version of $\boldsymbol{y^{s}}$ , we cannot exactly regenerate $\boldsymbol{q^{a}}$ , but rather an estimate $\hat{\boldsymbol{q}}^{a}$ . Given that measurement errors occur, our objective is to obtain $\hat{\boldsymbol{q}}^{a}$ such that its distance to $\boldsymbol{q^{a}}$ is as small as possible (i.e., to minimize the estimation error). We use the $\ell_{2}$ norm as a distance measure between $\boldsymbol{q}^{a}$ and $\hat{\boldsymbol{q}}^{a}$ , defined as $\delta\left(\boldsymbol{q^{a}},\hat{\boldsymbol{q}}^{\boldsymbol{a}}\right)\triangleq\left\lVert\boldsymbol{q^{a}}-\hat{\boldsymbol{q}}^{\boldsymbol{a}}\right\rVert_{2}$ .

Optimal measurement decoding requires solving an $\ell_{0}$ -norm minimization problem [9]. This problem is non-convex, and solving it is computationally intractable for channels with large dimensions and/or a relatively high sparsity level. An example of optimal measurement decoding is the “search” decoding proposed in [18], which requires a combinatorial search over the column subspaces of $\boldsymbol{G}$ , and whose complexity is of order $O\left(n_{r}^{L}\right)$ . Again, this is prohibitive for large antenna arrays (large $n_{r}$ ) and large $L$ . Another solution is the “look-up” table method in [18], where quantized measurement-channel pairs are stored in memory. Here, the channel vector whose corresponding stored measurement is closest to the collected measurement is selected. The main disadvantage of this method is that the table size increases dramatically with the number of measurements and the ADC quantization resolution. Motivated by the drawbacks of the look-up table and search methods, we propose an alternative Machine Learning (ML) based approach that uses Deep Neural Networks (DNNs). DNNs are commonly used as function approximators; hence, they provide an appealing light-weight, data-driven alternative for the measurement decoding problem. Our main goal here is to reduce the computational complexity while still maintaining reliable measurement decoding.

VI-A DNN-based mapping

ML is widely used to solve very complex problems through learning. We focus on supervised learning to solve the decoding problem, which is a multi-dimensional non-linear regression problem for which neural networks are a powerful tool. Specifically, we use a fully connected classical DNN with an input layer of $m$ nodes (equal to the measurement dimensions) and an output layer of $n_{r}$ nodes (equal to the channel dimensions). The DNN model is designed to handle real-valued input-output data. However, both the channel $\boldsymbol{q^{a}}$ and the measurements $\boldsymbol{y^{s}}$ are complex-valued. To overcome this problem, observe that $\boldsymbol{y^{s}}$ can be written as $\boldsymbol{y}_{R}^{\boldsymbol{s}}{+}j\boldsymbol{y}_{I}^{\boldsymbol{s}}$ and $\boldsymbol{q^{a}}$ as $\boldsymbol{q}_{R}^{\boldsymbol{a}}{+}j\boldsymbol{q}_{I}^{\boldsymbol{a}}$ (i.e., in terms of their real and imaginary components). Notice also that, since $\boldsymbol{G}$ is real-valued, $\boldsymbol{y}_{R}^{\boldsymbol{s}}{=}\boldsymbol{G}\boldsymbol{q}_{R}^{\boldsymbol{a}}$ and $\boldsymbol{y}_{I}^{\boldsymbol{s}}{=}\boldsymbol{G}\boldsymbol{q}_{I}^{\boldsymbol{a}}$ . Therefore, we can construct an estimate $\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ using its real and imaginary components, i.e., $\hat{\boldsymbol{q}}^{\boldsymbol{a}}_{R}{+}j\hat{\boldsymbol{q}}^{\boldsymbol{a}}_{I}$ , where $\hat{\boldsymbol{q}}^{\boldsymbol{a}}_{R}$ and $\hat{\boldsymbol{q}}^{\boldsymbol{a}}_{I}$ are estimated using $\boldsymbol{y}_{R}^{\boldsymbol{s}}$ and $\boldsymbol{y}_{I}^{\boldsymbol{s}}$ as inputs to the DNN model, respectively. Therefore, our DNN takes the measurement vectors $\boldsymbol{y^{s}}$ as inputs and produces the corresponding channel estimates $\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ at its output, but it does so in two separate passes, handling the real and imaginary parts separately.
For ease of notation, we will not use real and imaginary components to refer to the inputs and outputs of the DNN models, but it should be understood that this is how we handle them. The number of hidden layers and their corresponding number of nodes are design parameters that depend on the sizes of the input and output, and the relationship governing them. For all hidden layers, we use the rectified linear (ReLU) activation function, while for the output layer we use the linear activation function. We also use the ADAM optimizer [32] for training and the Mean Squared Error (MSE) loss function to quantify the model error (footnote 3: we use the Keras API [33] to build, train, test and use the DNN model we propose; our pre-trained models and DNN-related code are available in [34]).

Model training: Although we do not have a closed form expression for mapping $\boldsymbol{y^{s}}$ to $\boldsymbol{q^{a}}$ (hence the need for an algorithmic solution), generating training data is actually straightforward. This is because the reverse direction (i.e., mapping $\boldsymbol{q^{a}}$ to $\boldsymbol{y^{s}}$ ) is just a simple linear transformation. Training data is generated as follows: For every $\boldsymbol{q}^{a}_{s}\in\mathcal{Q}^{a}_{s}$ (recall that $\mathcal{Q}^{a}_{s}$ is the set of all possible channel support vectors defined in Section V-B), we generate $n_{s}=300$ random channels, $\boldsymbol{q}^{a}$ , by choosing the non-zero components of $\boldsymbol{q^{a}}$ to be uniformly distributed in $[-\alpha^{b}_{\text{max}},\alpha^{b}_{\text{max}}]$ where $\alpha^{b}_{\text{max}}$ (recall Eq. (5)) is the maximum magnitude of baseband path gains, which can be obtained using channel statistics. Note that we can set $\alpha^{b}_{\text{max}}$ to be the maximum ADC quantized value of $|\alpha_{l}^{b}|$ . Thus, the total number of input-output samples we have is $n_{s}\times\left\lvert\mathcal{Q}^{a}_{s}\right\rvert$ . We use $70\%$ of these samples for training and the remaining $30\%$ for validation. Training is done using $200$ epochs with batches of size $32$ . We monitor the validation error to make sure that the model does not over-fit the training data. If over-fitting is observed (which is indicated by a persistent increase in validation error at the end of every epoch), we stop the training process and only keep the model which produced the least validation error. DNN training is done offline, and a trained DNN model is stored in memory to be used when needed.
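The data-generation procedure described above can be sketched in a few lines of Python. The example enumerates every support vector with at most $L$ non-zero entries, draws `n_s` real-valued channels per support with non-zero entries uniform in $[-\alpha^{b}_{\text{max}},\alpha^{b}_{\text{max}}]$ , computes the noise-free measurements $\boldsymbol{G}\boldsymbol{q^{a}}$ , and splits the samples 70/30. The toy $3\times 7$ matrix `G` (with $L=1$ ), the value `alpha_max=1.0`, the shuffling step and all function names are illustrative assumptions. Real-valued samples are used because real and imaginary parts are decoded separately.

```python
import random
from itertools import combinations


def supports(n_r, L):
    """Index sets of all support vectors with at most L non-zero entries."""
    for weight in range(L + 1):
        yield from combinations(range(n_r), weight)


def generate_dataset(G, L, n_s, alpha_max, seed=0):
    """Return a shuffled list of (measurement, channel) pairs with y = G q."""
    rng = random.Random(seed)
    n_r = len(G[0])
    samples = []
    for support in supports(n_r, L):
        for _ in range(n_s):
            q = [rng.uniform(-alpha_max, alpha_max) if i in support else 0.0
                 for i in range(n_r)]
            y = [sum(g * x for g, x in zip(row, q)) for row in G]
            samples.append((y, q))
    rng.shuffle(samples)
    return samples


def split(samples, train_fraction=0.7):
    """Split samples into training and validation parts."""
    cut = int(round(train_fraction * len(samples)))
    return samples[:cut], samples[cut:]


# Toy generator matrix (3 x 7), injective on supports with at most one path.
G = [[(j >> b) & 1 for j in range(1, 8)] for b in (2, 1, 0)]
samples = generate_dataset(G, L=1, n_s=300, alpha_max=1.0)
train, validation = split(samples)
```

In this toy setting there are $8$ supports, so the data set holds $300\times 8=2400$ pairs, of which $1680$ are used for training and $720$ for validation.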

VI-B DNN Model Assessment

To assess the reliability of DNN-based mapping, we test it using a channel with $n_{t}{=}1$ , $n_{r}{=}23$ and a maximum of $3$ paths, i.e., $L{\leq}3$ . We compare its performance against the “search” method of [18]. Based on the described channel parameters, only $m{=}11$ measurements are needed to discover its paths (more details about this particular example are discussed in Section VIII). We design a DNN model with an input layer of $m{=}11$ nodes and an output layer of $n_{r}{=}23$ nodes. The model also has $5$ hidden layers with $1024,512,512,128$ and $128$ nodes, respectively. We train the DNN model using data generated as described in Section VI-A. Fig. 4(a) shows the average MSE loss of both training and validation data sets for $100$ epochs. Training achieves a validation error of ${\approx}0.0143$ (averaged across all samples of the validation data). The figure also shows close MSE values for training and validation. This indicates that the model generalizes well to measurements it has not seen before, which suggests that it remains reliable for arbitrary error-free measurements.
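A model with the layer sizes stated above could be written in Keras roughly as follows. This is a minimal sketch and not the authors' released code in [34]. The `patience` value and the use of the `EarlyStopping` callback to keep the weights with the lowest validation loss are assumptions chosen to mirror the training description in Section VI-A, and the function names are illustrative. TensorFlow is imported inside the functions so that the parameter-counting helper works on its own.

```python
HIDDEN_WIDTHS = (1024, 512, 512, 128, 128)   # hidden-layer sizes reported in the text


def count_parameters(m, n_r, hidden=HIDDEN_WIDTHS):
    """Trainable weights and biases of a fully connected network m -> hidden -> n_r."""
    widths = (m, *hidden, n_r)
    return sum(a * b + b for a, b in zip(widths, widths[1:]))


def build_decoder(m=11, n_r=23, hidden=HIDDEN_WIDTHS):
    """Fully connected decoder: ReLU hidden layers, linear output, Adam optimizer, MSE loss."""
    from tensorflow import keras   # imported lazily; requires TensorFlow to be installed

    layers = [keras.Input(shape=(m,))]
    layers += [keras.layers.Dense(h, activation="relu") for h in hidden]
    layers += [keras.layers.Dense(n_r, activation="linear")]
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model


def train(model, y_train, q_train, y_val, q_val):
    """Train for up to 200 epochs with batch size 32, keeping the best validation weights."""
    from tensorflow import keras

    stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10,          # patience is an assumed value
        restore_best_weights=True)
    return model.fit(y_train, q_train,
                     validation_data=(y_val, q_val),
                     epochs=200, batch_size=32, callbacks=[stop])
```

For $m=11$ and $n_{r}=23$ , `count_parameters(11, 23)` gives $884{,}887$ trainable parameters, which gives a rough idea of the memory needed to store the trained decoder.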

These initial results are promising. However, they are obtained using error-free measurements. This prompts us to test the resilience of DNN-based mapping against noisy measurements and to compare it against the search method proposed in [18]. To do so, we generate a testing data set in the same way we created the training data. We also generate sets of uniformly distributed noise vectors, where each noise set is drawn at a different value of transmit SNR from $-20$ to $20$ dB. The noise vectors are added to the inputs (channel measurements) of the testing data set, which are then passed through the trained model. The decoded $\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ is recorded at the output. Similarly, we use the “search” method to decode the same noise-corrupted measurements.

For evaluation, we use i) the average MSE and ii) the probability of path misdetection (i.e., no path discovery). We say a path is correctly discovered if its corresponding component in $\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ is among the $L{=}3$ strongest components in $\hat{\boldsymbol{q}}^{\boldsymbol{a}}$ . Fig. 4(b) shows the average MSE obtained using the search and DNN-based mapping methods on a log scale. We see that at low SNR, DNN-based decoding outperforms the search method. This indicates that the DNN model is more resilient against measurement errors. At high SNR, however, the DNN’s MSE saturates at ${\approx}0.014$ , which is the same value we obtain for validation during model training using noise-free inputs (note that the MSE value at which DNN-based mapping saturates can be made lower by further improvement of the DNN model). The search method’s MSE, on the other hand, keeps improving as SNR increases; nevertheless, for values below $10^{-2}$ the improvement is marginal. The probability of path misdetection, shown in Fig. 4(c), confirms the performance trend of the MSE. Specifically, we see that at low SNR, the DNN-based model outperforms the search method (i.e., has a lower probability of misdetection), while at high SNR the search method is better.

Computational complexity: As we have previously discussed, the search method requires high computational power. Precisely, ${n_{r}\choose L}$ iterations are performed, each with one matrix inversion and two matrix multiplication operations, and each producing a vector of length $m$ . Finally, an additional step of finding the minimum $\ell_{2}$-norm among all ${n_{r}\choose L}$ vectors is performed. On the other hand, the DNN-based mapping requires only $N_{k}$ linear computations at each hidden and output layer, where $N_{k}$ is the number of nodes at the $k^{\text{th}}$ layer. These computations are of the form $\sum_{i=1}^{N_{k-1}}w_{i}a_{i}$ , where $a_{i}$ is the value passed from the $i^{\text{th}}$ node of the previous layer and $w_{i}$ is the weight on its link. For this particular example, the search method and the DNN-based method were implemented on the same machine, and on average the search method’s execution time was $11.2$ ms compared to $47\;\mu$ s for the DNN model.
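To make the two operation counts concrete, the short calculation below evaluates them for the $n_{r}{=}23$ , $L{=}3$ example and the layer sizes given in Section VI-B. It is only a rough sketch: it counts search iterations and DNN multiply-accumulate operations, ignores bias terms and activation functions, and does not attempt to model the cost of the per-iteration matrix inversion, so it should not be read as an explanation of the measured run times.

```python
from math import comb

n_r, L = 23, 3
# Number of candidate supports visited by the search method.
search_iterations = comb(n_r, L)

# Input layer, five hidden layers, output layer (sizes from Section VI-B).
layers = [11, 1024, 512, 512, 128, 128, 23]
# Each node in layer k computes one sum over the N_{k-1} nodes of the previous layer.
dnn_macs = sum(n_prev * n_next for n_prev, n_next in zip(layers[:-1], layers[1:]))

print(search_iterations, dnn_macs)  # 1771 882560
```

The DNN count is a fixed sequence of dense matrix-vector products, which maps well onto vectorized hardware, whereas each of the search iterations involves a separate small inversion.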

VII Multiple Transmit and Receive Antennas

So far, we only dealt with channels of single-transmit, multiple-receive antennas. Recall that this setting is almost identical to multiple-transmit, single-receive antenna channels, except that in the former setting we seek to design $\boldsymbol{w_{i}}$ ’s to estimate the angular channel at RX, while in the latter, we design $\boldsymbol{f_{j}}$ ’s to estimate the angular channel at TX. In this section, we extend the channel setting to be of multiple-transmit, multiple-receive antennas. We build on the design principles and decoding methods of single transmit antenna channels and show how measurements are obtained and decoded to estimate the entire $n_{r}{\times}n_{t}$ channel.

VII-A Measurements

Unlike the single transmit antenna scenario where TX sends signals omnidirectionally, it can now focus its transmission on narrow angular directions. However, from RX’s point of view, no matter which set of directions the TX is transmitting into, it can only see a number of $n_{r}$ resolvable bins, only $L$ of which may have paths to TX. The same is true from TX’s perspective, where the TX can only see $n_{t}$ resolvable bins, only $L$ of which may have paths to the receiver. (Recall that the directions at which the TX is transmitting and the RX is receiving are determined by their antenna beam patterns, which are in turn determined by $\boldsymbol{f_{j}}$ and $\boldsymbol{w_{i}}$ , respectively; see Fig. 2.) Thus, for an arbitrary tx-precoder, the receiver would need to measure the channel using the same set of $\boldsymbol{w_{i}}$ ’s it needs for the $n_{t}=1$ setting. Upon decoding, the result would be $n_{r}$ angular rx bins (corresponding to the particular $\boldsymbol{f_{j}}$ used at TX). Similarly, for an arbitrary rx-combiner, the transmitter would need the same set of $\boldsymbol{f_{j}}$ ’s it needs for the $n_{r}=1$ setting to find its respective tx bins. To find such $\boldsymbol{f_{j}}$ ’s and $\boldsymbol{w_{i}}$ ’s, we invoke Theorem 1.

Let $\boldsymbol{f}_{j}\;\forall j{\in}\{1,\dots,m_{t}\}$ be the tx-precoding vectors and $\boldsymbol{w}_{i}\;\forall i{\in}\{1,\dots,m_{r}\}$ be the rx-combining vectors. Then, the channel measurements are obtained as follows: The transmitter sends a number of $m_{r}$ pilot symbols using each of its $m_{t}$ precoders. On the receiver side, for every tx-precoder, $m_{r}$ channel measurements are obtained using the distinct $m_{r}$ rx-combiners. Recall that $u_{i,j}=y^{s}_{i,j}+\boldsymbol{w}_{i}^{H}\boldsymbol{n}$ where $y^{s}_{i,j}=\boldsymbol{w}_{i}^{H}\boldsymbol{Q}\boldsymbol{f}_{j}$ (see Eq. (12)). Let us arrange the $m_{r}$ measurements corresponding to the $j^{th}$ tx-precoder in $\boldsymbol{y}^{s}_{j}$ and define $\boldsymbol{Y}^{s}$ as

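$$\boldsymbol{Y}^{s}\triangleq\begin{pmatrix}\boldsymbol{y}^{s}_{1}&\boldsymbol{y}^{s}_{2}&\dots&\boldsymbol{y}^{s}_{m_{t}}\end{pmatrix}.$$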

Thus, $\boldsymbol{Y^{s}}$ contains all $m_{t}{\times}m_{r}$ channel measurements necessary to discover all available paths.

VII-B Decoding $\boldsymbol{Y}^{s}$

To obtain $\hat{\boldsymbol{Q}}^{a}$ from $\boldsymbol{Y}^{s}$ , we perform multiple SIMO decoding operations, as described in Section VI. (Alternatively, we could have trained a large DNN model which accepts all measurements $\boldsymbol{Y^{s}}$ and outputs an estimate $\hat{\boldsymbol{Q}}^{\boldsymbol{a}}$ . Adopting this strategy, however, results in overwhelming training complexity since this model would need to be trained with a massive training data set of size $n_{s}\times|\mathcal{Q}^{a}_{s}|=n_{s}\sum_{i=0}^{L}{n_{r}\times n_{t}\choose i}$ , where $\mathcal{Q}^{a}_{s}$ now is the set of all support vectors of size $n_{r}n_{t}{\times}1$ that represent the $n_{r}{\times}n_{t}$ vectorized channel matrices. Compare this to our solution of using 2 DNN models trained with data sets of sizes $n_{s}\sum_{i=0}^{L}{n_{r}\choose i}$ and $n_{s}\sum_{i=0}^{L}{n_{t}\choose i}$ , respectively.) This procedure is highlighted in the diagram in Fig. 5 and is detailed as follows:

(i) Decode every $\boldsymbol{y}^{s}_{j}\;\forall j{\in}\{1,2,\dots,m_{t}\}$ to obtain $\boldsymbol{q}^{a}_{rx,j}$ . Recall that $\boldsymbol{y}^{s}_{j}$ is the measurement vector corresponding to the $j^{th}$ tx-precoder. Thus, $\boldsymbol{q}^{a}_{rx,j}$ is the $n_{r}{\times}1$ mmWave channel observed at RX due to the TX signal transmission through the angular directions featured by $\boldsymbol{f}_{j}$ .

(ii) After Step (i), we obtain a sequence of $m_{t}$ “measurement” components corresponding to each rx-bin. Each of these components is produced using a distinct tx-precoder. Let us denote these sequences by $\boldsymbol{y}^{s}_{tx,k}$ ( $1{\times}m_{t}$ row vectors), where $k\in\{1,2,\dots,n_{r}\}$ .

(iii) Decode each $\boldsymbol{y}^{s}_{tx,k}$ to obtain $\boldsymbol{q}^{a}_{tx,k}$ ( $1{\times}n_{t}$ row vectors) whose components constitute all the tx-bins corresponding to the $k^{th}$ rx-bin.

(iv) Stack all $\boldsymbol{q}^{a}_{tx,k}$ to obtain $\hat{\boldsymbol{Q}}^{a}$ (each $\boldsymbol{q}^{a}_{tx,k}$ forming the $k^{\text{th}}$ row of $\hat{\boldsymbol{Q}}^{a}$ ).
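Steps (i) to (iv) can be sketched as a short function that takes the two single-antenna decoders as arguments. The function name `decode_mimo` and the decoder callables are hypothetical. In the toy check, invertible linear maps stand in for the actual SIMO decoders (search or DNN) purely so that the data flow and array shapes can be verified; this is not the paper's measurement model.

```python
import numpy as np


def decode_mimo(Y, decode_rx, decode_tx):
    """Two-stage decoding of an (m_r x m_t) measurement matrix Y."""
    m_t = Y.shape[1]
    # Step (i): one SIMO decode per tx-precoder (column of Y) -> n_r x m_t array.
    Q_rx = np.column_stack([decode_rx(Y[:, j]) for j in range(m_t)])
    # Steps (ii)-(iii): row k of Q_rx is the length-m_t sequence of rx-bin k.
    rows = [decode_tx(Q_rx[k, :]) for k in range(Q_rx.shape[0])]
    # Step (iv): stack the decoded rows into the n_r x n_t estimate.
    return np.vstack(rows)


# Toy check: invertible linear "encoders" stand in for the real measurement process.
rng = np.random.default_rng(1)
n_r, n_t = 4, 3
A_r = rng.standard_normal((n_r, n_r)) + 3 * np.eye(n_r)
A_t = rng.standard_normal((n_t, n_t)) + 3 * np.eye(n_t)
Q_true = rng.standard_normal((n_r, n_t))
Y = A_r @ Q_true @ A_t.T
Q_hat = decode_mimo(
    Y,
    decode_rx=lambda y: np.linalg.solve(A_r, y),
    decode_tx=lambda y: np.linalg.solve(A_t, y),
)
```

Because the two stages only ever call a receive-side decoder and a transmit-side decoder, two single-antenna models suffice for the full $n_{r}{\times}n_{t}$ channel.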

VIII Performance Evaluation

We evaluate the performance of our proposed coding-based solution under various simulation scenarios. Specifically, we consider $23{\times}23$ and $15{\times}32$ multi-path channels. For both channel settings, we assume the existence of a maximum of $3$ paths (information on $L$ can be obtained from statistical channel models [3, 35]). We also consider $15{\times}31$ single-path channels. The single-path assumption is appropriate for LoS scenarios where the path gain of the LoS is significantly higher than the gains of Non-LoS (NLoS) paths ( $\approx 20$ dB higher [3]).

To understand how our solution compares to the state-of-the-art, we implement a compressed-sensing-based (CS) channel estimation solution, as well as the IEEE 802.11ad (WiGig) beam discovery method. Note that, while compressed sensing is a generic solution that can be applied to multi-path channels (similar to our solution), the WiGig method is designed to discover one channel path; hence, we only use it for the $15\times 31$ single-path channel. Our results demonstrate that our proposed solution is more resilient to errors than both CS and 802.11ad, and produces higher quality estimates. Furthermore, we study the effect of ADC resolution on channel estimation performance. This is important because the power consumption of ADCs grows with their resolution. Hence, it is necessary to understand the resolution limit beyond which only minimal gains, in terms of channel estimation performance, can be achieved.

VIII-A Performance Metrics

We adopt performance metrics that highlight: (i) the measurement overhead, (ii) the accuracy of path discovery, (iii) the quality of the estimated path gains, and (iv) the impact of the channel estimates on the achievable data rate. These metrics are evaluated numerically, using Monte Carlo simulations (averaged over $10^{5}$ simulation runs), against different values of SNR. We define the performance metrics as follows:

i) Number of measurements: Given by $m_{t}{\times}m_{r}$ .

ii) Probability of path discovery: Paths are said to exist at the strongest $L$ components in the estimated channel $\hat{\boldsymbol{Q}}^{\boldsymbol{a}}$ .

iii) MSE (Normalized): Defined as the squared Frobenius norm of the estimation error, $\boldsymbol{Q^{a}}-\hat{\boldsymbol{Q}}^{\boldsymbol{a}}$ , normalized by the squared Frobenius norm of $\boldsymbol{Q^{a}}$ , i.e., $\frac{\left\lVert\boldsymbol{Q^{a}}-\hat{\boldsymbol{Q}}^{\boldsymbol{a}}\right\rVert_{F}^{2}}{\left\lVert\boldsymbol{Q^{a}}\right\rVert_{F}^{2}}$ .

iv) Outage Rate: Denoted by $R_{\text{out}}$ and defined as $R_{\text{out}}\triangleq\mathbb{E}\left[\left(1-\mathbbm{1}_{\{\text{out}\}}\right)\times C_{\boldsymbol{Q}}\right]$ , where $C_{\boldsymbol{Q}}$ is the MIMO channel capacity of the channel $\boldsymbol{Q}$ [29], and $\mathbbm{1}_{\{\text{out}\}}$ is the indicator function that takes a value of $1$ in case of outage and $0$ otherwise. We assume that an outage occurs if any of the strong channel paths were misdetected.
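Metrics ii) and iii) are simple to state in code. The sketch below is one possible reading of the definitions above, with hypothetical function names: the normalized MSE is a ratio of squared Frobenius norms, and a realization counts as a successful discovery when every true non-zero entry lies among the $L$ largest-magnitude entries of the estimate. The outage rate is omitted because it depends on the capacity expression of [29], which is not reproduced here.

```python
import numpy as np


def normalized_mse(Q, Q_hat):
    """||Q - Q_hat||_F^2 / ||Q||_F^2."""
    return np.linalg.norm(Q - Q_hat, "fro") ** 2 / np.linalg.norm(Q, "fro") ** 2


def all_paths_discovered(Q, Q_hat, L):
    """True if every non-zero entry of Q is among the L strongest entries of Q_hat."""
    true_support = set(np.flatnonzero(Q).tolist())
    strongest = set(np.argsort(np.abs(Q_hat), axis=None)[-L:].tolist())
    return true_support <= strongest


# Toy angular channel with two paths and a slightly perturbed estimate.
Q = np.zeros((3, 4))
Q[0, 1], Q[2, 3] = 1.0, -0.5
Q_hat = Q.copy()
Q_hat[0, 1] = 0.9      # gain error on a true path
Q_hat[1, 0] = 0.05     # small spurious component

nmse = normalized_mse(Q, Q_hat)             # (0.1^2 + 0.05^2) / 1.25 = 0.01
found = all_paths_discovered(Q, Q_hat, L=2)
```

Averaging `nmse` and `found` over many independent channel and noise realizations would give Monte Carlo estimates of the two metrics.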

VIII-B Implemented Solutions

1- Source Coding: We test three different measurement decoding methods, which we integrate with our coding-based solution. All three methods are used to solve each of the sub-problems depicted in Fig. 5. The first is the “search” method of [18]. The second and third decoding methods are based on DNNs, but they differ in the way they are trained, i.e., with or without measurement errors. We explain them as follows:

• DNN: Here, DNN models are trained using pure measurements, with no added noise components. Since models are not trained with errors, only one model can be used at all SNR and ADC resolution levels.

• DNN-sd: Since measurement errors tend to degrade the performance of path discovery, we try to overcome this impediment by training DNN models with error-corrupted measurements. Since errors are dependent on ADC resolution and SNR (see Eq. (11)), we train multiple DNNs for different values of each. We call such a model “DNN with selective defense” or “DNN-sd”. Note that the DNN-sd model is not dependent on specific path gain values since our SNR definition does not include the effect of individual path gains $\alpha_{l}$ and because training is done using a wide range of uniformly distributed gain values.

The DNN model parameters, including the number of layers, the number of nodes (neurons) per layer and the activation function, are selected using cross-validation. We also select the DNN model’s size such that we have a reasonably good input-output mapping performance while keeping the processing speed fairly fast. We used TensorFlow [36] for creating and using DNN models. Both types of DNN models are trained offline and stored in memory.

2- Compressed Sensing: We use a similar formulation for the mmWave channel estimation problem as in [21, 22]. The tx-precoders and rx-combiners are obtained using random, uniformly distributed phase shift values. That is, the components of all $\boldsymbol{f_{j}}$ ’s and $\boldsymbol{w_{i}}$ ’s are of the form $\exp(j\vartheta)$ where $\vartheta\sim[0,2\pi)$ . Fig. 6 shows antenna pattern examples for random beamforming with $15$ and $31$ antennas. For measurement decoding, we use the “search” method, which is the optimal $\ell_{0}$ -norm minimization solution [9] for solving each of the sub-problems of channel decoding. While this may still be too computationally complex to be of practical use, it provides an upper bound on the performance of other sparse recovery algorithms like OMP, $\ell_{1}$ and $\ell_{2}$ -norm minimization, etc.

3- 802.11ad: We only consider the Sector Level Sweep stage of the channel estimation scheme of 802.11ad. At this stage, the TX starts by sequentially transmitting packets in all possible transmit AoDs (sectors) while the receiver performs quasi omni-directional reception. Then, the TX and RX switch modes, where TX forms a quasi omni-directional pattern while RX sweeps through all possible receive AoAs (sectors). The directions that reveal the highest received signal strength are taken as the AoA and AoD of the strongest channel path.

VIII-C Equating Energy Consumption

Various channel estimation solutions may require different numbers of measurements and may have different beamforming gains. Thus, it would not be fair to compare them at a fixed transmission power. Instead, it is fairer to fix the total energy consumption for the whole channel estimation process of each solution. For comparison purposes, we therefore adjust the transmit power of each scheme such that the total energy consumed by the entire measurement process remains the same.

Energy Calculation: The energy consumption, denoted by $E_{T}$ , is given by $E_{T}=m\times P_{T}\times\tau$ , where $m$ is the number of measurements, $P_{T}$ is the total transmit power per measurement and $\tau$ is the time duration of one measurement. Since the antenna patterns at TX/RX of our proposed scheme consist of multiple overlapped beams (recall Fig. 2), the total power $P_{T}$ is an integer multiple of the transmit power per direction/beam $P$ , which depends on the number of overlapped beams at TX and RX. Let $o_{t}$ and $o_{r}$ denote the number of overlapped beams at the TX and RX, respectively. Hence, we have that $P_{T}=o_{t}{\times}o_{r}{\times}P$ . We can further write the transmit power per beam as $P=\text{SNR}\frac{N_{0}}{\mu}$ (recall Eq. (11)). This gives us a total energy consumption (in millijoules (mJ)) for the entire measurement process as: $E_{T}=m{\times}o_{t}{\times}o_{r}{\times}\text{SNR}\frac{N_{0}}{\mu}{\times}\tau$ . Let $\mu{=}-88$ dB and $N_{0}{=}-88$ dBm. Finally, let $\tau\approx 23\;\mu$ s (from IEEE 802.11ad).

To find $\mu$ and $N_{0}$ , we assume a channel operating at a carrier frequency $f_{c}{=}60$ GHz with bandwidth $B{=}100$ MHz and distance between TX/RX of $d{=}10$ m. Further, we assume a receiver system with noise figure NF ${=}6$ dB and temperature $T_{0}{=}293$ K. The path loss constitutes both the free space path loss (FSPL) and atmospheric absorption. FSPL is given as: $\text{FSPL}{=}-10\log_{10}\left(\frac{4\pi}{c}df_{c}\right)^{n_{p}}$ , where $n_{p}{=}2$ is the path loss exponent. Atmospheric absorption, however, can be ignored for small distances ( $\leq 50$ m) [37]. Hence, $\mu{=}\text{FSPL}{=}-88$ dB. The noise power (in dBm) can be given as $N_{0}{=}10\log_{10}\left(k_{B}T_{0}B{\times}1000\right){+}\text{NF}$ , where $k_{B}$ is the Boltzmann constant, which evaluates to $N_{0}{\approx}-88$ dBm.
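The link-budget numbers and the energy expression can be reproduced with a few lines of arithmetic. The sketch below evaluates the FSPL and noise-power formulas for the stated parameters and then applies $E_{T}=m{\times}o_{t}{\times}o_{r}{\times}\text{SNR}\frac{N_{0}}{\mu}{\times}\tau$ in the dB domain. The function names are hypothetical, and the values $o_{t}{=}o_{r}{=}1$ and SNR $=0$ dB in the example call are placeholders rather than values taken from the paper.

```python
import math

C = 299_792_458.0     # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K


def path_gain_db(d_m, f_hz, n_p=2.0):
    """mu = FSPL in dB (a negative number); atmospheric absorption neglected."""
    return -10.0 * n_p * math.log10(4.0 * math.pi * d_m * f_hz / C)


def noise_power_dbm(bandwidth_hz, temp_k, nf_db):
    """N0 = 10 log10(k_B * T0 * B * 1000) + NF, in dBm."""
    return 10.0 * math.log10(K_B * temp_k * bandwidth_hz * 1000.0) + nf_db


def total_energy_mj(m, o_t, o_r, snr_db, mu_db, n0_dbm, tau_s):
    """E_T = m * o_t * o_r * SNR * (N0 / mu) * tau; N0 in mW gives E_T in mJ."""
    p_beam_mw = 10.0 ** ((snr_db + n0_dbm - mu_db) / 10.0)
    return m * o_t * o_r * p_beam_mw * tau_s


mu_db = path_gain_db(d_m=10.0, f_hz=60e9)                               # close to -88 dB
n0_dbm = noise_power_dbm(bandwidth_hz=100e6, temp_k=293.0, nf_db=6.0)   # close to -88 dBm
energy = total_energy_mj(m=20, o_t=1, o_r=1, snr_db=0.0,
                         mu_db=mu_db, n0_dbm=n0_dbm, tau_s=23e-6)
```

Holding `energy` fixed across schemes and solving for `snr_db` is one way to equalize energy between schemes that use different values of $m$ , $o_{t}$ and $o_{r}$ .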

VIII-D Results

$15{\times}31$ single-path channels: For this scenario, we choose ADCs of resolution $b{=}6$ bits. We provide results for our coding-based solution with both search and DNN-sd decoding. We also provide results for compressed sensing with search-based decoding, and IEEE 802.11ad beam alignment. We plot the results against the consumed energy $E_{T}$ . Source code selection: We choose codes which satisfy the requirements in Theorem 1 as follows: At the TX side, we choose the $(31,26)$ Hamming code to design tx-precoders, while at the RX side we choose the $(15,11)$ Hamming code to design rx-combiners. Both operate as syndrome source codes with generator matrices $\boldsymbol{G_{t}}$ and $\boldsymbol{G_{r}}$ of sizes $5{\times}31$ and $4{\times}15$ , respectively. Hence, we have $m_{t}{=}5$ and $m_{r}{=}4$ , which gives us a total number of $20$ measurements. For compressed sensing, we use the same number of measurements (i.e., $m=20$ ). An exhaustive search, on the other hand, requires $465$ measurements (our solution represents a $95\%$ measurement reduction), while the IEEE 802.11ad scheme requires $46$ measurements ( $57\%$ reduction).

For the source coding method, both the normalized MSE and the probability of path misdetection, shown in Fig. 7, indicate that DNN-sd decoding performs slightly worse than the search method. This is a small sacrifice in performance that is traded for a large advantage in processing speed. The IEEE 802.11ad method shows superior performance at low $E_{T}$ (below $0.1$ mJ). As $E_{T}$ increases, however, its performance ceases to improve, while our source coding solution keeps approaching perfect channel discovery. When examining the outage rate, in Fig. 7(c), we see that the relatively high MSE and probability of path misdetection of 802.11ad do not result in a significant degradation in $R_{\text{out}}$ . In fact, its value is very close to the perfect CSI capacity. Recall that 802.11ad requires more than twice the number of channel measurements. The compressed sensing method, on the other hand, has the lowest $R_{\text{out}}$ and the highest MSE and probability of path misdetection among all schemes.

$23\times 23$ multi-path channels: This is a more challenging multi-path scenario where, in addition to the previous solutions, we also investigate the performance of DNN decoding for which training is done with pure uncorrupted measurements. Source code selection: For this channel, since $n_{r}=n_{t}$ , the same source code works for designing both tx-precoders and rx-combiners. The perfect binary Golay code used as a syndrome source code is a suitable choice for this problem. It has a generator matrix of size $11{\times}23$ ; hence, we have $m_{t}{=}m_{r}{=}11$ , and the total number of required channel measurements is $m_{t}{\times}m_{r}{=}121$ . Compared to the exhaustive search approach, which requires scanning all $529$ combinations of TX/RX angular directions, this represents a measurement reduction of approximately $77\%$ ( $1-121/529\approx 0.77$ ). For compressed sensing, we use the same number of precoders and combiners as well.

First, for the probability of path detection, shown in Fig. 8, we notice very close performance for all three measurement decoding methods (search, DNN and DNN-sd) when integrated with our source coding solution. This suggests that DNN-based decoding gives up little path-detection accuracy relative to the search method. While DNN-sd has a slight edge over DNN, the improvement is not significant. Hence, it is possible to use fewer DNNs trained over larger ranges of error components. Interestingly, however, in Fig. 10, we observe that the search decoding of our source-coding-based measurements tends to have significantly larger MSE. This indicates that DNNs tend to suppress the error components in the estimated channel values, even though the correct channel paths may not be efficiently discovered. This is an artifact of DNN training, which suggests that we may be able to improve the DNNs’ decoding performance if they were directly trained to discover the channel paths rather than just to decrease the MSE of the channel estimates.

Compressed sensing, however, has significantly worse performance in terms of path discovery, MSE and outage rate. At very low $E_{T}$ , CS’s performance improves as $E_{T}$ gets higher, but at $E_{T}\geq 0.7$ the improvement stops. This suggests that the CS-based solution is more sensitive to quantization error. This is verified by using a higher resolution ADC ( $b{=}7$ bits), which shows a significant improvement in performance.

$15\times 32$ multi-path channel: The performance results under this scenario are very similar to those of the $23{\times}23$ channel setting shown above. Specifically, Fig. 11 shows the probability of path discovery, where our coding method with search decoding shows superior performance compared to DNN-sd decoding. In contrast, search has worse MSE compared to DNN-sd (as shown in Fig. 13). Compressed sensing has close performance to our proposed solution at low $E_{T}$ . At high $E_{T}$ , however, its performance only sees marginal improvement, unlike our proposed solution, which keeps approaching perfect channel discovery. Code selection: We choose codes for which $m_{r}=11$ and $m_{t}=16$ . The total number of required channel measurements is $176$ , which constitutes a $63.3\%$ measurement reduction compared to exhaustive search.
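The measurement counts for the three scenarios follow directly from $m_{t}{\times}m_{r}$ versus the $n_{r}{\times}n_{t}$ combinations scanned by an exhaustive search. The following check recomputes them; the dictionary layout and the helper name `reduction` are illustrative choices only.

```python
def reduction(n_r, n_t, m_r, m_t):
    """Return (coded measurements, exhaustive measurements, fractional saving)."""
    exhaustive = n_r * n_t
    coded = m_r * m_t
    return coded, exhaustive, 1.0 - coded / exhaustive


# (n_r, n_t, m_r, m_t) for the three simulated channel settings.
scenarios = {
    "15x31 single-path": (15, 31, 4, 5),
    "23x23 multi-path": (23, 23, 11, 11),
    "15x32 multi-path": (15, 32, 11, 16),
}

for name, dims in scenarios.items():
    coded, exhaustive, saving = reduction(*dims)
    print(f"{name}: {coded} of {exhaustive} measurements, {100 * saving:.1f}% fewer")
```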

VIII-E Effect of ADC resolution on performance:

Now, we provide and compare results for $23{\times}23$ channels with ADCs of $b{=}3,5$ , and $7$-bit resolution, as well as the ideal $b={\infty}$ . We only show results for DNN-sd decoding since the performance of the other two methods compares similarly to the trends of the ideal ADC case shown above. In Fig. 14(a), we plot the MSE. As expected, we see that the MSE decreases as the resolution increases. We can also see that as SNR increases, higher resolution is required to keep the MSE close to that of ideal ADCs. For instance, $b{=}5$ is reasonably good up to SNR $=-5$ dB, while $b{=}7$ is very close to $b{=}\infty$ up to SNR $=5$ dB. Even at high SNR, the $b{=}7$ curve has a gap with ideal ADCs that is smaller than $5\times 10^{-3}$ . A similar performance trend is exhibited for the probability of path discovery, shown in Fig. 14(b). Finally, the outage rate is depicted in Fig. 14(c). We see that $b{=}7$ results in an $R_{\text{out}}$ that is almost identical to that of the ideal ADC, and that both are very close to the perfect CSI capacity. We also see that at low SNR, there is a considerable gap between the perfect CSI capacity and the outage rate even for ideal ADCs.

IX Conclusion

In this work, we treat the mmWave channel estimation problem as a source compression problem. Our goal is to reconstruct the channel matrix using a small number of measurements. We exploit linear binary source codes for encoding the channel (i.e., taking measurements) and a deep-neural-network-based algorithm for measurement decoding (channel reconstruction). Using a small number of measurements, we are able to obtain high quality channel estimates. The lower bound on the achievable number of measurements is accurately characterized. Through simulation, the superiority of our proposed solution is demonstrated in comparison to compressed-sensing-based solutions and IEEE 802.11ad’s beam alignment.

Appendix A Proof Of Lemma 2

Proof.

Consider a set of $n$-dimensional linearly independent vectors, $\boldsymbol{v_{1}},\dots,\boldsymbol{v_{m}}$ defined over $\mathbb{F}_{2}$ . Then, construct a matrix $\boldsymbol{M}_{\mathbb{F}_{2}}$ whose columns are $\boldsymbol{v_{1}},\dots,\boldsymbol{v_{m}}$ . Since the $\boldsymbol{v_{i}}$ ’s are independent, $\boldsymbol{M}_{\mathbb{F}_{2}}$ has full column rank, i.e., $\boldsymbol{M}_{\mathbb{F}_{2}}$ is left-invertible over $\mathbb{F}_{2}$ ( $m{\leq}n$ ). Thus $\boldsymbol{M}_{\mathbb{F}_{2}}$ has an $m{\times}m$ submatrix, call it $\boldsymbol{A}_{\mathbb{F}_{2}}$ , whose determinant is non-zero. Now consider the matrix $\boldsymbol{M}$ , defined over $\mathbb{C}$ , whose elements are the $0$ and $1$ real scalars corresponding to $\mathrm{0}_{\mathbb{F}_{2}}$ and $\mathrm{1}_{\mathbb{F}_{2}}$ values of $\boldsymbol{M}_{\mathbb{F}_{2}}$ . Let $\boldsymbol{A}$ be its submatrix corresponding to $\boldsymbol{A}_{\mathbb{F}_{2}}$ of $\boldsymbol{M}_{\mathbb{F}_{2}}$ . By Lemma 5 (in Appendix B), we have $\det\left(\boldsymbol{A}\right){\neq}0$ . Thus, $\boldsymbol{M}$ is also left-invertible, hence its columns are linearly independent. ∎

Appendix B Lemma 5

Lemma 5 .

Let $\boldsymbol{A}_{\mathbb{F}}$ and $\boldsymbol{A}$ be $n{\times}n$ matrices defined over $\mathbb{F}_{2}$ and $\mathbb{R}$ , respectively. Let ${a_{\mathbb{F}}}_{i,j}$ , the elements of $\boldsymbol{A}_{\mathbb{F}}$ , be scalars in $\{\mathsf{0}_{\mathbb{F}_{2}},\mathsf{1}_{\mathbb{F}_{2}}\}$ , while $a_{i,j}$ , the elements of $\boldsymbol{A}$ , are scalars in $\{0,1\}{\subseteq}\mathbb{R}$ . Suppose that $\boldsymbol{A}_{\mathbb{F}}$ has non-zero determinant, i.e., $\det\left(\boldsymbol{A}_{\mathbb{F}}\right){\neq}\mathsf{0}_{\mathbb{F}_{2}}$ . If $\boldsymbol{A}$ is defined such that $a_{i,j}=0$ if ${a_{\mathbb{F}}}_{i,j}=\mathsf{0}_{\mathbb{F}_{2}}$ , and $a_{i,j}=1$ if ${a_{\mathbb{F}}}_{i,j}=\mathsf{1}_{\mathbb{F}_{2}}$ for all $1{\leq}i,j{\leq}n$ , then $\det\left(\boldsymbol{A}\right)\neq 0$ .

Proof.

Recall that the determinant of a square matrix defined over a commutative ring is given by the Leibniz formula [38]. Since $\mathbb{F}_{2}$ is a finite field (with $2$ elements), it constitutes a commutative ring. Moreover, $\mathbb{R}$ is a commutative ring [38]. Therefore, the determinants of both $\boldsymbol{A}_{\mathbb{F}}$ and $\boldsymbol{A}$ can be computed using the exact same formula. Since finite field arithmetic over the prime field $\mathbb{Z}_{2}$ is integer arithmetic modulo $2$ , we can write $\det(\boldsymbol{A}_{\mathbb{F}})=\det(\boldsymbol{A})\mod 2$ . Thus, $\exists q\in\mathbb{Z}$ (the set of integers), such that $\det(\boldsymbol{A})=q\times 2+\det(\boldsymbol{A}_{\mathbb{F}})=q\times 2+1$ , where the latter equation follows from the fact that $\det(\boldsymbol{A}_{\mathbb{F}})\neq\mathsf{0}_{\mathbb{F}_{2}}\Longleftrightarrow\det(\boldsymbol{A}_{\mathbb{F}}){=}\mathsf{1}_{\mathbb{F}_{2}}$ . Therefore, $\det(\boldsymbol{A})$ is an odd integer, which implies that $\det(\boldsymbol{A}){\neq}0$ , concluding our proof. ∎
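The parity argument can be checked numerically on small 0/1 matrices. In the sketch below, the integer determinant is computed with the Leibniz formula, while the determinant over $\mathbb{F}_{2}$ is computed independently by Gaussian elimination with XOR row operations; the function names and the two example matrices are illustrative assumptions. The second matrix also shows that the converse of the lemma does not hold: a matrix can be singular over $\mathbb{F}_{2}$ yet non-singular over $\mathbb{R}$ .

```python
from itertools import permutations


def det_leibniz(A):
    """Exact integer determinant via the Leibniz formula (fine for small n)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total


def det_f2(A):
    """Determinant of a 0/1 matrix over F_2 via Gaussian elimination (XOR rows)."""
    M = [row[:] for row in A]
    n = len(M)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c]), None)
        if pivot is None:
            return 0
        M[c], M[pivot] = M[pivot], M[c]   # row swaps do not change det over F_2
        for r in range(c + 1, n):
            if M[r][c]:
                M[r] = [x ^ y for x, y in zip(M[r], M[c])]
    return 1


A = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]   # invertible over F_2, integer determinant 1
B = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]   # singular over F_2, integer determinant 2
```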

Appendix C Proof of Proposition 4

Proof.

We will prove that adding an extra column $\boldsymbol{p}\in\mathbb{R}^{m}$ to any full rank matrix $\boldsymbol{M}$ of size $m{\times}k$ with $m\leq k$ (i.e., $\text{rank}\left(\boldsymbol{M}\right){=}m$ ) does not reduce its singular values.

Let $\boldsymbol{M_{p}}=\begin{pmatrix}[c|c]\boldsymbol{M}&\boldsymbol{p}\end{pmatrix}$ be an $m{\times}(k{+}1)$ matrix. Then, we can obtain the singular values of $\boldsymbol{M_{p}}$ as the positive square roots of the eigenvalues of $\boldsymbol{M_{p}}\boldsymbol{M}^{T}_{\boldsymbol{p}}$ , which can be written as:

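$$\boldsymbol{M_{p}}\boldsymbol{M}^{T}_{\boldsymbol{p}}=\boldsymbol{M}\boldsymbol{M}^{T}+\boldsymbol{p}\boldsymbol{p}^{T}.$$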

Since $\boldsymbol{p}\boldsymbol{p}^{T}{\succeq}0$ (i.e., positive semidefinite), then we must have $\boldsymbol{M_{p}}\boldsymbol{M}^{T}_{\boldsymbol{p}}-\boldsymbol{M}\boldsymbol{M}^{T}\succeq 0$ . Let $\sigma_{i}\left(\cdot\right)$ denote the $i^{\text{th}}$ largest singular value of a matrix. Then, we have $\sigma_{i}\left(\boldsymbol{M_{p}}\boldsymbol{M}^{T}_{\boldsymbol{p}}\right)\geq\sigma_{i}\left(\boldsymbol{M}\boldsymbol{M}^{T}\right)\;\forall i=1,\dots,m$ , which implies

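$$\sigma_{i}\left(\boldsymbol{M_{p}}\right)\geq\sigma_{i}\left(\boldsymbol{M}\right)\;\forall i=1,\dots,m.$$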
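The claim that appending a column cannot reduce any singular value is easy to sanity-check numerically. The NumPy sketch below draws a random full-row-rank $\boldsymbol{M}$ and an extra column $\boldsymbol{p}$ (both arbitrary test data, not matrices from the paper), verifies the Gram identity $\boldsymbol{M_{p}}\boldsymbol{M}^{T}_{\boldsymbol{p}}=\boldsymbol{M}\boldsymbol{M}^{T}+\boldsymbol{p}\boldsymbol{p}^{T}$ , and compares the sorted singular values. A single random instance illustrates the statement but is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 4, 6
M = rng.standard_normal((m, k))   # full row rank with probability 1
p = rng.standard_normal((m, 1))   # extra column to append
M_p = np.hstack([M, p])           # m x (k + 1)

# Singular values are returned in descending order, so entries align by index i.
s = np.linalg.svd(M, compute_uv=False)
s_p = np.linalg.svd(M_p, compute_uv=False)

# Residual of the Gram identity M_p M_p^T = M M^T + p p^T (should be ~0).
gram_gap = M_p @ M_p.T - (M @ M.T + p @ p.T)
never_smaller = bool(np.all(s_p >= s - 1e-12))
```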

Define $\boldsymbol{G}^{(i)}{\triangleq}\begin{pmatrix}[c|c|c|c]\boldsymbol{g}_{1}&\boldsymbol{g}_{2}&\dots&\boldsymbol{g}_{i}\end{pmatrix}$ , where $\boldsymbol{g}_{j}$ is the $j^{th}$ column of $\boldsymbol{G}$ . Then, by sequentially applying the result shown in Eq. (37) (by adding columns of $\boldsymbol{P}$ in Eq. (33)), we obtain

[TABLE]

Bibliography

[1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter Wave Mobile Communications for 5G Cellular: It Will Work!” IEEE Access, vol. 1, pp. 335–349, 2013.
[2] Z. Pi and F. Khan, “An Introduction to Millimeter-Wave Mobile Broadband Systems,” IEEE Communications Magazine, vol. 49, no. 6, 2011.
[3] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter Wave Channel Modeling and Cellular Capacity Evaluation,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, 2014.
[4] “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2016 - 2021 White Paper,” Mar 2017. [Online]. Available: https://goo.gl/U 1e QNM
[5] G. R. MacCartney, J. Zhang, S. Nie, and T. S. Rappaport, “Path Loss Models for 5G Millimeter Wave Propagation Channels in Urban Microcells,” in Globecom, 2013, pp. 3948–3953.
[6] S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter-Wave Cellular Wireless Networks: Potentials and Challenges,” Proceedings of the IEEE, vol. 102, no. 3, pp. 366–385, 2014.
[7] K. Haneda, J. Zhang, L. Tan, G. Liu, Y. Zheng, H. Asplund, J. Li, Y. Wang, D. Steer, C. Li et al., “5G 3GPP-Like Channel Models for Outdoor Urban Microcellular and Macrocellular Environments,” in 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring). IEEE, 2016, pp. 1–7.
[8] S.-E. Chiu, N. Ronquillo, and T. Javidi, “Active learning and csi acquisition for mmwave initial alignment,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 11, pp. 2474–2489, 2019.
