Massive MIMO Performance - TDD Versus FDD: What Do Measurements Say?

Jose Flordelis, Fredrik Rusek, Fredrik Tufvesson, Erik G. Larsson, and Ove Edfors

arXiv:1704.00623·cs.IT·April 4, 2017

Open Access

TL;DR

This paper compares TDD and FDD Massive MIMO downlink beamforming strategies using real-world measurements, highlighting the practical performance differences and addressing ongoing industry debates.

Contribution

It provides an empirical comparison of TDD and FDD Massive MIMO beamforming methods based on actual channel measurements at 2.6 GHz.

Findings

1. TDD outperforms FDD in measured channel data.
2. Real-world performance differences are significant.
3. Results inform practical deployment choices.

Abstract

Downlink beamforming in Massive MIMO either relies on uplink pilot measurements - exploiting reciprocity and TDD operation, or on the use of a predetermined grid of beams with user equipments reporting their preferred beams, mostly in FDD operation. Massive MIMO in its originally conceived form uses the first strategy, with uplink pilots, whereas there is currently significant commercial interest in the second, grid-of-beams. It has been analytically shown that in isotropic scattering (independent Rayleigh fading) the first approach outperforms the second. Nevertheless there remains controversy regarding their relative performance in practice. In this contribution, the performances of these two strategies are compared using measured channel data at 2.6 GHz.

Tables (2)

Table 1. TABLE I: Summary of measured scenarios.

Table 2. TABLE II: Sum-capacity (in bits/s/Hz) for TDD at $\rho=0$ dB.

Scenario	1	2	3	4	5	6
Number of UEs	4	4	4	4	8	16
Sum-capacity	19.9	20.0	16.7	20.0	32.2	49.0

Equations (72)

$$\boldsymbol{H}(\ell)=\begin{bmatrix}\boldsymbol{h}_{1}(\ell)&\cdots&\boldsymbol{h}_{K}(\ell)\end{bmatrix}^{\operatorname{T}}$$

$$\boldsymbol{y}(\ell)=\sqrt{\rho}\,\boldsymbol{H}(\ell)\boldsymbol{s}(\ell)+\boldsymbol{n}(\ell),$$

$$\mathbb{E}\left\{\boldsymbol{s}^{\operatorname{H}}(\ell)\boldsymbol{s}(\ell)\right\}=1,$$

$$\underset{\boldsymbol{\Lambda}(\ell)}{\operatorname{maximize}}$$

$$\mathrm{tr}\left(\boldsymbol{\Lambda}(\ell)\right)\leq\rho,\quad\boldsymbol{\Lambda}(\ell)\succeq\boldsymbol{0},$$

$$\bar{\mathcal{C}}_{\text{TDD}}(\rho)=\frac{1}{L}\sum_{\ell=1}^{L}\mathcal{C}_{\text{TDD}}(\boldsymbol{H}(\ell),\rho).$$

$$\boldsymbol{c}_{i}=\frac{1}{\sqrt{M}}\begin{bmatrix}1&e^{\jmath\pi\psi_{i}}&\cdots&e^{\jmath\pi\psi_{i}(M-1)}\end{bmatrix}^{\operatorname{T}},$$

$$\boldsymbol{C}=\begin{bmatrix}\boldsymbol{c}_{1}&\cdots&\boldsymbol{c}_{M^{\prime}}\end{bmatrix}.$$

$$\boldsymbol{g}_{k}(\ell)=\boldsymbol{C}^{\operatorname{T}}\boldsymbol{h}_{k}(\ell).$$

$$\breve{\boldsymbol{g}}_{k}(\ell)=\boldsymbol{B}_{k}^{\operatorname{T}}(\ell)\boldsymbol{h}_{k}(\ell),$$

$$\hat{\boldsymbol{h}}_{k}(\ell)$$

$$\hat{\boldsymbol{H}}(\ell)=\begin{bmatrix}\hat{\boldsymbol{h}}_{1}(\ell)&\cdots&\hat{\boldsymbol{h}}_{K}(\ell)\end{bmatrix}^{\operatorname{T}}.$$

$$\boldsymbol{p}_{k}(\ell)=\boldsymbol{z}_{k}(\ell)/\lVert\boldsymbol{z}_{k}(\ell)\rVert,$$

$$\operatorname{SINR}_{k}(\boldsymbol{H}(\ell),\rho)=\frac{\frac{\rho}{K}\left|\boldsymbol{h}_{k}^{\operatorname{T}}(\ell)\boldsymbol{p}_{k}(\ell)\right|^{2}}{1+\frac{\rho}{K}\sum_{i\neq k}\left|\boldsymbol{h}_{k}^{\operatorname{T}}(\ell)\boldsymbol{p}_{i}(\ell)\right|^{2}},$$

$$\mathcal{C}_{\text{D-GOB}}(\boldsymbol{H}(\ell),\rho)=\sum_{k=1}^{K}\log_{2}\Big(1+\operatorname{SINR}_{k}\left(\boldsymbol{H}(\ell),\rho\right)\Big).$$

$$\underset{\mathcal{Q}_{k}(\ell)}{\arg\min}$$

$$\mathcal{Q}_{k}(\ell)\subset\{1,\ldots,M^{\prime}\},\quad|\mathcal{Q}_{k}(\ell)|=N,$$

$$\underset{\boldsymbol{\Lambda}(\ell)}{\operatorname{maximize}}$$

$$\boldsymbol{\Lambda}(\ell)\succeq\boldsymbol{0},\quad\mathrm{tr}\left(\boldsymbol{\Lambda}(\ell)\right)\leq\rho,$$

$$\underset{\boldsymbol{B}(\ell)=[\boldsymbol{C}]_{\mathcal{Q}(\ell)}}{\operatorname{maximize}}$$

$$\mathcal{Q}(\ell)\subset\{1,\ldots,M^{\prime}\},\quad|\mathcal{Q}(\ell)|=N.$$

$$\boldsymbol{s}(\ell)=\boldsymbol{B}\boldsymbol{P}(\ell)\boldsymbol{x}(\ell),\quad\ell=1,\ldots,L,$$

$$\underset{\boldsymbol{B}=[\boldsymbol{C}]_{\mathcal{Q}}}{\operatorname{maximize}}$$

$$\mathcal{Q}\subset\{1,\ldots,M^{\prime}\},\quad|\mathcal{Q}|=N.$$

$$\underset{\{\boldsymbol{\Lambda}(\ell)\}_{\ell=1}^{L}}{\operatorname{maximize}}$$

$$\boldsymbol{\Lambda}(\ell)\succeq\boldsymbol{0},\quad\mathrm{tr}\left(\boldsymbol{\Lambda}(\ell)\right)\leq\rho,$$

$$\underset{\mathcal{Q}_{k}}{\arg\min}$$

$$\mathcal{Q}_{k}\subset\{1,\ldots,M^{\prime}\},\quad|\mathcal{Q}_{k}|=N,$$

$$\delta_{\rho}:=\rho^{*}/\rho.$$

$$\Gamma_{\beta}=\{(r(m),m):K\leq m\leq M\},$$

$$r(m)=\underset{n:\,K\leq n\leq m,\ \delta_{\rho}(n,m)\geq\beta}{\arg\min}\ \delta_{\rho}(n,m).$$

$$N_{\mathrm{p}}(N)=\begin{cases}KN&\text{for D-GOB, H-GOB}\\N&\text{for D-SUB, H-SUB,}\end{cases}$$

$$\tilde{\mathcal{C}}_{\text{A}}(\rho,T_{\mathrm{c}})=\left(1-\frac{N_{\mathrm{p}}(N^{*})}{T_{\mathrm{c}}}\right)\bar{\mathcal{C}}_{\text{A}}(\rho,N^{*}),$$

$$N^{*}=\underset{1\leq N\leq 128}{\arg\max}\ \left(1-\frac{N_{\mathrm{p}}(N)}{T_{\mathrm{c}}}\right)\bar{\mathcal{C}}_{\text{A}}(\rho,N).$$

Taxonomy

Topics: Advanced MIMO Systems Optimization · Millimeter-Wave Propagation and Modeling · Antenna Design and Analysis

Full text

Massive MIMO Performance—TDD Versus FDD:

What Do Measurements Say?

Jose Flordelis, Fredrik Rusek, Fredrik Tufvesson, Erik G. Larsson, and Ove Edfors

This work was supported by the Seventh Framework Programme (FP7) of the European Union under grant agreement no. 619086 (MAMMOET), ELLIIT (an Excellence Center at Linköping-Lund in Information Technology), the Swedish Research Council (VR), and the Swedish Foundation for Strategic Research (SSF). Jose Flordelis, Fredrik Rusek, Fredrik Tufvesson, and Ove Edfors are with the Department of Electrical and Information Technology, Lund University, SE-221 00 Lund, Sweden (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). E. G. Larsson is with the Department of Electrical Engineering (ISY), Linköping University, SE-581 83 Linköping, Sweden (e-mail: [email protected]).

Index Terms:

Massive MIMO, FDD, TDD, performance, channel measurements.

I Introduction

The idea behind Massive MIMO is to equip base stations (BS) in wireless networks with large arrays of phase-coherently cooperating antennas. The use of such arrays facilitates spatial multiplexing of many user equipments (UEs) in the same time-frequency resource, and yields a coherent beamforming gain that translates directly into reduced interference and improved cell-edge coverage.

The original Massive MIMO concept [1, 2, 3, 4] assumes time-division duplexing (TDD) and exploits reciprocity for the acquisition of channel state information (CSI) at the BS. UEs send pilots on the uplink (UL); all UE-to-BS channels are estimated, and each antenna has its own RF electronics. The concept has, since its introduction a decade ago [1, 3], matured significantly: rigorous information-theoretic analyses are available [2], field-trials have demonstrated its performance in high-mobility scenarios [5, 6, 7], and circuit prototypes have shown the true practicality of implementations [8].

Concurrently, motivated by spectrum regulation issues, there is significant interest in developing frequency-division duplexing (FDD) versions of Massive MIMO [9, 10, 11, 12, 13]. There is also interest in hybrid beamforming architectures that rely on the use of analog phase shifters and signal combiners [14, 15, 16, 17], somewhat reminiscent of phased-array implementations of radar. With hybrid beamforming, the number of actual antennas may substantially exceed the number of RF chains.

FDD operation and hybrid beamforming solutions both bring the same difficulty – albeit for different reasons: significant assumptions on the structure of propagation must be made for the techniques to work efficiently. Specifically:

• FDD operation requires CSI feedback from the UEs to the BS. Efficient encoding of this CSI is only possible if side information on the propagation is exploited. The resulting techniques are often called “grid-of-beams”, and have similarities to existing forms of multiuser (MU) MIMO in LTE [18].

• Hybrid-beamforming architectures inherently rely on beamforming into predetermined spatial directions, as defined by the angle-of-arrival or angle-of-departure, seen from the array. Such directions only have a well-defined operational meaning when the propagation environment offers strong direct or specular paths [19].

There has been a long-standing debate on the relative performance between reciprocity-based (TDD) Massive MIMO and that of solutions based on grid-of-beams or hybrid-beamforming architectures. The matter was, for example, the subject of a heated debate in the 2015 Globecom industry panel “Massive MIMO vs FD-MIMO: Defining the next generation of MIMO in 5G” where on the one hand, the commercial arguments for grid-of-beams solutions were clear, but on the other hand, their real potential for high-performance spatial multiplexing was strongly contested [20]. It is known that grid-of-beams solutions perform poorly in isotropic scattering [21], but no prior experimental results are known to the authors.

The object of this paper is to conclusively answer this performance question through the analysis of real Massive MIMO channel measurement data obtained in the 2.6 GHz band. The conclusion, summarized in detail in Sec. VI, is that, except in certain line-of-sight (LOS) environments, the original reciprocity-based TDD Massive MIMO of [3, 1] represents the only feasible implementation of Massive MIMO at the frequency bands under consideration.

I-A Notation

We use the following notation throughout the paper: Boldface lowercase letters represent column vectors, and boldface uppercase letters represent matrices. Also, $\boldsymbol{I}$ is the identity matrix, $\lVert\boldsymbol{a}\rVert$ the Euclidean norm of vector $\boldsymbol{a}$ , $\mathrm{tr}\left({\boldsymbol{A}}\right)$ the trace of matrix $\boldsymbol{A}$ , $\operatorname{span}(\boldsymbol{A})$ its column space, ${\boldsymbol{A}}^{\operatorname{T}}$ denotes the transpose, ${\boldsymbol{A}}^{\operatorname{H}}$ the Hermitian transpose, $\left|\boldsymbol{A}\right|$ stands for the determinant, and $\boldsymbol{A}\succeq\boldsymbol{0}$ means that $\boldsymbol{A}$ is positive semidefinite. $\operatorname{diag}(\boldsymbol{a})$ builds a matrix having $\boldsymbol{a}$ along its diagonal and all other elements set to zero, $\begin{bmatrix}\boldsymbol{A}\mid\boldsymbol{b}\end{bmatrix}$ denotes the matrix resulting from appending $\boldsymbol{b}$ to $\boldsymbol{A}$ , and $[\boldsymbol{A}]_{\mathcal{I}}$ is the submatrix of $\boldsymbol{A}$ formed by choosing the columns of the index set $\mathcal{I}$ . The imaginary unit is denoted by $\jmath$ , $\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Lambda})$ denotes the complex Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Lambda}$ , $\mathbb{E}[\cdot]$ is the expectation operator, and $|\mathcal{I}|$ denotes the number of elements in the set $\mathcal{I}$ .

II System Model

We consider the downlink (DL) of a single-cell Massive MIMO system in which an $M$ -antenna BS communicates with $K$ single-antenna UEs in the same time-frequency resource. Orthogonal Frequency Division Multiplexing (OFDM) with $L$ subcarriers is assumed [22]. Let $\boldsymbol{h}_{k}(\ell)\in\operatorname{\mathbb{C}}^{M\times 1}$ , for $k=1,\ldots,K$ , and $\ell=1,\ldots,L$ , denote the channel vector between the BS and the $k^{\operatorname{th}}$ UE at the $\ell^{\operatorname{th}}$ subcarrier, and let

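$$\boldsymbol{H}(\ell)=\begin{bmatrix}\boldsymbol{h}_{1}(\ell)&\cdots&\boldsymbol{h}_{K}(\ell)\end{bmatrix}^{\operatorname{T}}\tag{1}$$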

denote the corresponding $K\times M$ channel matrix. Then, the normalized input-output relation of the channel can be written as

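$$\boldsymbol{y}(\ell)=\sqrt{\rho}\,\boldsymbol{H}(\ell)\boldsymbol{s}(\ell)+\boldsymbol{n}(\ell),\tag{2}$$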

where $\boldsymbol{y}(\ell)\in\operatorname{\mathbb{C}}^{K\times 1}$ is the vector containing the received signals of all the UEs, $\boldsymbol{s}(\ell)\in\operatorname{\mathbb{C}}^{M\times 1}$ the vector of precoded transmit signals satisfying

$$\mathbb{E}\left\{\boldsymbol{s}^{\operatorname{H}}(\ell)\boldsymbol{s}(\ell)\right\}=1,\tag{3}$$

$\rho$ the signal-to-noise ratio (SNR), and $\boldsymbol{n}(\ell)$ is a vector of $\mathcal{CN}(0,1)$ receiver noise at the UEs.
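
As a minimal sketch of the input-output relation (2) and the power constraint (3), the following NumPy snippet generates one received vector. It assumes that the SNR enters as an amplitude factor $\sqrt{\rho}$ (so that $\rho$ is a power ratio against unit-variance noise), and it draws a synthetic i.i.d. Rayleigh channel purely for illustration, whereas the paper works with measured channels. The function name `downlink_received` and the seed are hypothetical.

```python
import numpy as np

def downlink_received(H, s, rho, rng):
    """Return y = sqrt(rho) * H @ s + n for one subcarrier, with n ~ CN(0, I)."""
    K = H.shape[0]
    n = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    return np.sqrt(rho) * H @ s + n

rng = np.random.default_rng(0)
M, K, rho = 128, 4, 1.0  # rho = 1.0 corresponds to 0 dB

# Synthetic stand-in for a measured K x M channel matrix (rows are h_k^T).
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# Any transmit vector, scaled so that s^H s = 1 for this realization.
s = rng.standard_normal(M) + 1j * rng.standard_normal(M)
s /= np.linalg.norm(s)

y = downlink_received(H, s, rho, rng)  # one complex sample per UE
```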

III Transmission Techniques

This section outlines the beamforming techniques included in the comparison—first, fully-digital reciprocity-based (TDD) beamforming in Sec. III-A, and then, four flavors of FDD beamforming based on feedback of CSI in Sec. III-B.

III-A Fully-Digital Reciprocity-Based (TDD) Beamforming

With fully-digital beamforming, no a priori assumptions are made on the propagation environment. There are no predetermined beams, but CSI is measured at the BS by observing UL pilots transmitted by the UEs. By virtue of TDD operation and reciprocity of propagation, the so-obtained UL CSI is also valid for the DL, assuming proper reciprocity calibration [23]. All signal processing takes place in the digital domain. A TDD beamforming system is schematically depicted in Fig. 1 for $K=2$ UEs.

With full CSI at the BS, TDD performs optimally and can achieve the DL sum-capacity by dirty-paper coding (DPC) [24]. For given $\rho$ , the sum-capacity of the $\ell^{\operatorname{th}}$ subcarrier, $\mathcal{C}_{\text{TDD}}(\boldsymbol{H}(\ell),\rho)$ , is given by the solution to the following optimization problem [25, 26, 27, 28]:

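$$\underset{\boldsymbol{\Lambda}(\ell)}{\operatorname{maximize}}\ \ \log_{2}\left|\boldsymbol{I}+\boldsymbol{H}^{\operatorname{H}}(\ell)\boldsymbol{\Lambda}(\ell)\boldsymbol{H}(\ell)\right|\quad\text{subject to}\quad\mathrm{tr}\left(\boldsymbol{\Lambda}(\ell)\right)\leq\rho,\ \ \boldsymbol{\Lambda}(\ell)\succeq\boldsymbol{0},\tag{4}$$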

where $\boldsymbol{\Lambda}(\ell)=\operatorname{diag}(\lambda_{1}(\ell),\ldots,\lambda_{K}(\ell))$ is a diagonal power allocation matrix. The sum-capacity averaged over all the subcarriers is then

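$$\bar{\mathcal{C}}_{\text{TDD}}(\rho)=\frac{1}{L}\sum_{\ell=1}^{L}\mathcal{C}_{\text{TDD}}(\boldsymbol{H}(\ell),\rho).\tag{5}$$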

Problem (4) is convex and can be efficiently solved by a simple gradient search, or via a technique known as sum-power iterative waterfilling [29, 30].
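
One possible way to solve problem (4) numerically is sketched below. It assumes the standard dual-uplink form of the objective, $\log_{2}\left|\boldsymbol{I}+\boldsymbol{H}^{\operatorname{H}}\boldsymbol{\Lambda}\boldsymbol{H}\right|$ with $\mathrm{tr}(\boldsymbol{\Lambda})\leq\rho$, and uses an averaged variant of sum-power iterative waterfilling for single-antenna UEs. This is an illustrative sketch, not necessarily the solver used by the authors; the function names, the iteration count, and the "keep the best iterate" safeguard are assumptions.

```python
import numpy as np

def sum_rate(H, lam):
    """Dual-uplink objective log2|I + H^H diag(lam) H| for a K x M channel H."""
    M = H.shape[1]
    A = np.eye(M) + H.conj().T @ (lam[:, None] * H)
    return np.linalg.slogdet(A)[1] / np.log(2)

def waterfill(gains, power):
    """Classic waterfilling: lam_k = max(mu - 1/gains_k, 0) with sum(lam) = power."""
    inv = np.sort(1.0 / gains)
    for n in range(len(inv), 0, -1):      # try the n strongest channels
        mu = (power + inv[:n].sum()) / n  # water level if exactly n are active
        if mu > inv[n - 1]:
            break
    return np.maximum(mu - 1.0 / gains, 0.0)

def tdd_sum_capacity(H, rho, iters=200):
    """Approximate the solution of (4); returns (rate in bits/s/Hz, powers)."""
    K, M = H.shape
    lam = np.full(K, rho / K)             # start from equal power
    best_lam, best_rate = lam.copy(), sum_rate(H, lam)
    for _ in range(iters):
        A = np.eye(M) + H.conj().T @ (lam[:, None] * H)
        gains = np.empty(K)
        for k in range(K):
            a = H[k].conj()               # dual uplink channel of UE k
            A_k = A - lam[k] * np.outer(a, a.conj())  # noise plus other UEs
            gains[k] = np.real(a.conj() @ np.linalg.solve(A_k, a))
        # Averaged update: stays on sum(lam) == rho and damps oscillations.
        lam = waterfill(gains, rho) / K + lam * (K - 1) / K
        rate = sum_rate(H, lam)
        if rate > best_rate:
            best_lam, best_rate = lam.copy(), rate
    return best_rate, best_lam

rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))) / np.sqrt(2)
rate, lam = tdd_sum_capacity(H, rho=1.0)
```

Averaging `tdd_sum_capacity` over all subcarriers then gives the band-averaged quantity in (5). When the UE channels are mutually orthogonal, the effective gains no longer depend on the other UEs and the procedure reduces to ordinary waterfilling over the channel norms.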

III-B Feedback-Based FDD Beamforming with Predetermined Beams

Feedback-based beamforming relies on the reporting of quantized CSI from the UEs to the BS. Typically, CSI quantization is obtained by using a predetermined codebook consisting of $M^{\prime}$ beams, which imposes a certain structure on the precoded signals $\boldsymbol{s}(\ell)$ . These techniques may be applied when reliance on reciprocity is undesirable or impossible, notably in FDD operation.

We represent the $M^{\prime}$ beams through the set of $M$ -vectors $\{\boldsymbol{c}_{i}\}_{i=1}^{M^{\prime}}$ . Throughout this article, we assume that these beams are given by Vandermonde vectors comprising the array response in $M^{\prime}$ directions uniformly spaced in the sine-angle domain. More precisely, we define

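$$\boldsymbol{c}_{i}=\frac{1}{\sqrt{M}}\begin{bmatrix}1&e^{\jmath\pi\psi_{i}}&\cdots&e^{\jmath\pi\psi_{i}(M-1)}\end{bmatrix}^{\operatorname{T}},\tag{6}$$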

where $\psi_{i}=-1+\frac{2i-1}{M^{\prime}}$ , for $i=1,\ldots,M^{\prime}$ . We also define the $M\times M^{\prime}$ codebook matrix

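$$\boldsymbol{C}=\begin{bmatrix}\boldsymbol{c}_{1}&\cdots&\boldsymbol{c}_{M^{\prime}}\end{bmatrix}.\tag{7}$$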

A special case of the codebook is when $M^{\prime}=M$ and the beams are orthonormal; then ${\boldsymbol{C}}^{\operatorname{H}}\boldsymbol{C}=\boldsymbol{I}$ . In this case, the vectors $\boldsymbol{c}_{i}$ are the columns of an $M\times M$ IDFT matrix, up to a constant shift of the origin of the phase angle $\psi_{i}$ .
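
The codebook can be built in a few lines. The sketch below assumes unit-norm Vandermonde columns with a positive sign in the exponent, $\boldsymbol{c}_{i}=\frac{1}{\sqrt{M}}[1,e^{\jmath\pi\psi_{i}},\ldots,e^{\jmath\pi\psi_{i}(M-1)}]^{\operatorname{T}}$, and checks the orthonormality claim for $M^{\prime}=M$. The function name `beam_codebook` is hypothetical.

```python
import numpy as np

def beam_codebook(M, M_prime):
    """M x M' codebook of unit-norm Vandermonde beams, uniform in sine-angle."""
    i = np.arange(1, M_prime + 1)
    psi = -1.0 + (2.0 * i - 1.0) / M_prime        # psi_i in (-1, 1)
    m = np.arange(M)                              # antenna index 0..M-1
    return np.exp(1j * np.pi * np.outer(m, psi)) / np.sqrt(M)

C_square = beam_codebook(8, 8)      # M' = M: columns are orthonormal
C_over = beam_codebook(128, 512)    # oversampled codebook, as used in Sec. V
gram = C_square.conj().T @ C_square # should be the 8 x 8 identity
```

For $M^{\prime}>M$ the columns still have unit norm, but neighboring beams overlap, so ${\boldsymbol{C}}^{\operatorname{H}}\boldsymbol{C}\neq\boldsymbol{I}$.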

The UEs report their preferred beams to the BS. There are several ways that this may be done, and we consider two cases:

1) Each UE individually reports the indices and complex gains of a predetermined number, $N\leq M^{\prime}$ , of beams.

2) The BS, possibly based on interaction with the UEs, decides on a common set of $N$ beams that are simultaneously used for all the UEs. Then, each UE reports the complex gains of these $N$ beams.

The structure imposed by the predetermined codebook of beams may be implemented either in the digital domain, or in the analog domain:

(a) If implemented in the digital domain, the selection of the beams may be performed individually for each subcarrier.

(b) In contrast, if implemented in the analog domain, the same set of beams must be used for the entire band.

Combining reporting options 1 and 2 with implementation options (a) and (b) above yields four cases of interest, illustrated in Fig. 2 for $K=2$ single-antenna UEs and $N=2$ reported beams. These four cases are described in detail in the next four subsections. Throughout this article, we assume that for every subcarrier each UE can acquire its vector of complex gains perfectly. We further assume that feedback channels are delay- and error-free.

Digital Grid-of-Beams (D-GOB)

Each UE individually reports the indices and complex gains of a number, $N$ , of beams. The selection and reporting of the beams is done independently for each subcarrier. This corresponds to combination 1a above.

Let us compute the achievable sum-rate of D-GOB, $\bar{\mathcal{C}}_{\text{D-GOB}}(\rho)$ , averaged over all the subcarriers. Each UE learns the vector of complex gains

$$\boldsymbol{g}_{k}(\ell)=\boldsymbol{C}^{\operatorname{T}}\boldsymbol{h}_{k}(\ell)\tag{8}$$

of the $M^{\prime}$ predetermined beams. It then selects $N$ beams, according to some criterion that will be shortly explained, and forms the set $\mathcal{Q}_{k}(\ell)$ of selected beam indices. Then, each UE reports $\mathcal{Q}_{k}(\ell)$ and the vector $\breve{\boldsymbol{g}}_{k}(\ell)$ of associated complex gains to the BS. By construction, we have that

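$$\breve{\boldsymbol{g}}_{k}(\ell)=\boldsymbol{B}_{k}^{\operatorname{T}}(\ell)\boldsymbol{h}_{k}(\ell),\tag{9}$$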

where the $M\times N$ matrix $\boldsymbol{B}_{k}(\ell)$ is obtained by extracting the relevant beams from $\boldsymbol{C}$ , as dictated by $\mathcal{Q}_{k}(\ell)$ . Accordingly, the BS may produce a quantized version $\hat{\boldsymbol{h}}_{k}(\ell)$ of $\boldsymbol{h}_{k}(\ell)$ , as given by the expression

[TABLE]

With D-GOB, multiuser interference is only partially known. Given $i\neq j$ , the sets $\mathcal{Q}_{i}(\ell)$ and $\mathcal{Q}_{j}(\ell)$ produced by UEs $i$ and $j$ may be different, but the BS can only deal with interference in $\mathcal{Q}_{i}(\ell)\cap\mathcal{Q}_{j}(\ell)$ . It follows that DPC is not feasible in this setting. Instead, zero-forcing (ZF) based on the quantized channels $\hat{\boldsymbol{h}}_{k}(\ell)$ is commonly used as the multiuser transmission strategy [31, 32]. To apply ZF, one can define the quantized channel matrix

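$$\hat{\boldsymbol{H}}(\ell)=\begin{bmatrix}\hat{\boldsymbol{h}}_{1}(\ell)&\cdots&\hat{\boldsymbol{h}}_{K}(\ell)\end{bmatrix}^{\operatorname{T}}.$$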

Then, from [33], the columns of the ZF precoding matrix, $\boldsymbol{P}(\ell)$ in Fig. 2, can be computed as

$$\boldsymbol{p}_{k}(\ell)=\boldsymbol{z}_{k}(\ell)/\lVert\boldsymbol{z}_{k}(\ell)\rVert,$$

where $\boldsymbol{z}_{k}(\ell)$ are the columns of the Moore-Penrose pseudoinverse ${\hat{\boldsymbol{H}}}^{\dagger}(\ell)$ of $\hat{\boldsymbol{H}}(\ell)$ . If equal power $\rho/K$ is allocated to each UE, the receive SINR of the $k^{\operatorname{th}}$ UE can be written as

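$$\operatorname{SINR}_{k}(\boldsymbol{H}(\ell),\rho)=\frac{\frac{\rho}{K}\left|\boldsymbol{h}_{k}^{\operatorname{T}}(\ell)\boldsymbol{p}_{k}(\ell)\right|^{2}}{1+\frac{\rho}{K}\sum_{i\neq k}\left|\boldsymbol{h}_{k}^{\operatorname{T}}(\ell)\boldsymbol{p}_{i}(\ell)\right|^{2}},\tag{12}$$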

from which the achievable sum-rate is computed as [33]

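$$\mathcal{C}_{\text{D-GOB}}(\boldsymbol{H}(\ell),\rho)=\sum_{k=1}^{K}\log_{2}\Big(1+\operatorname{SINR}_{k}\left(\boldsymbol{H}(\ell),\rho\right)\Big).\tag{13}$$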

The sum-rate averaged over all the subcarriers, $\bar{\mathcal{C}}_{\text{D-GOB}}(\rho)$, is then defined similarly to (5). Note that even though the precoders $\boldsymbol{P}(\ell)$ are designed according to the ZF principle, the multiuser cross-talk terms $\left|{\boldsymbol{h}}^{\operatorname{T}}_{k}(\ell)\boldsymbol{p}_{i}(\ell)\right|^{2}$, $i\neq k$, in the denominator of (12) do not vanish in general. In fact, precoding that completely suppresses interference is impossible here since complete CSI cannot be obtained at the BS, unless $N=\min(M^{\prime},M)$.
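
The ZF precoding and rate computation described above can be sketched as follows for a single subcarrier. The precoder is designed from the quantized channel matrix, while the SINR is evaluated on the true channel, with equal power $\rho/K$ per UE and unit noise power as in the text. The function names and the synthetic Rayleigh channel are assumptions made for illustration.

```python
import numpy as np

def zf_precoder(H_hat):
    """Unit-norm ZF precoding vectors: normalized columns of pinv(H_hat)."""
    Z = np.linalg.pinv(H_hat)                    # M x K
    return Z / np.linalg.norm(Z, axis=0, keepdims=True)

def gob_sum_rate(H, H_hat, rho):
    """Sum-rate with ZF designed on H_hat but evaluated on the true H."""
    K = H.shape[0]
    P = zf_precoder(H_hat)
    G = np.abs(H @ P) ** 2                       # G[k, i] = |h_k^T p_i|^2
    signal = np.diag(G)
    crosstalk = G.sum(axis=1) - signal           # sum over i != k
    sinr = (rho / K) * signal / (1.0 + (rho / K) * crosstalk)
    return np.log2(1.0 + sinr).sum(), sinr

rng = np.random.default_rng(0)
K, M = 4, 16
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# Perfect CSI: cross-talk vanishes. Perturbed CSI: residual interference remains.
rate_perfect, _ = gob_sum_rate(H, H, rho=1.0)
H_hat = H + 0.3 * (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M)))
rate_partial, _ = gob_sum_rate(H, H_hat, rho=1.0)
```

The perturbation used for `H_hat` is only a placeholder for the beam-domain quantization; it is there to show that the cross-talk terms are nonzero whenever the BS designs the precoder from an inexact channel.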

Next, we briefly discuss the problem of beam selection by the UEs, which we formulate as the solution to the following optimization problem [29, 32, 16]:

[TABLE]

where $\hat{\boldsymbol{h}}_{k}(\ell)$ depends on $\mathcal{Q}_{k}(\ell)$ through $\boldsymbol{B}_{k}(\ell)$ as given by (10). Generally, (14) is a hard combinatorial problem, and can be solved exactly only for fairly small values of $N$ . (A special case is when ${\boldsymbol{C}}^{\operatorname{H}}\boldsymbol{C}=\boldsymbol{I}$ , in which case one simply needs to pick the $N$ strongest entries in the vector $\boldsymbol{g}_{k}(\ell)$ defined by (8).) Because of this, a heuristic rather than optimal algorithm to solve (14) is favored in this work. For the particulars on the algorithm, the reader is referred to Appendix -A.
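
For the orthonormal special case mentioned in parentheses above, beam selection reduces to picking the $N$ largest-magnitude entries of $\boldsymbol{g}_{k}(\ell)$. The sketch below illustrates this under two loudly stated assumptions: the selection criterion is the squared error between the true and quantized channels, and the BS rebuilds the channel as $\hat{\boldsymbol{h}}_{k}=\boldsymbol{B}_{k}^{*}\breve{\boldsymbol{g}}_{k}$, which is the minimum-norm vector consistent with the reported gains when the selected beams are orthonormal. Neither is taken from expression (10) or from the heuristic in the appendix; all names are hypothetical.

```python
import numpy as np

def beam_codebook(M, M_prime):
    """Unit-norm Vandermonde beams (positive exponent sign assumed)."""
    psi = -1.0 + (2.0 * np.arange(1, M_prime + 1) - 1.0) / M_prime
    return np.exp(1j * np.pi * np.outer(np.arange(M), psi)) / np.sqrt(M)

def report_strongest_beams(C, h, N):
    """UE side: select the N strongest beams; BS side: rebuild the channel."""
    g = C.T @ h                                  # gains of all M' beams
    Q = np.sort(np.argsort(np.abs(g))[-N:])      # indices of the N strongest
    g_sel = g[Q]                                 # reported complex gains
    h_hat = C[:, Q].conj() @ g_sel               # ASSUMED reconstruction rule
    return Q, g_sel, h_hat

rng = np.random.default_rng(0)
M = 16
C = beam_codebook(M, M)                          # M' = M, so C^H C = I
h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

# Quantization error as a function of the number of reported beams.
errors = [np.linalg.norm(h - report_strongest_beams(C, h, N)[2])
          for N in range(1, M + 1)]
```

With an orthonormal codebook the error is the energy in the unreported beams, so it shrinks monotonically and reaches zero at $N=M$.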

Digital Subspace Beamforming (D-SUB)

The BS, possibly based on interaction with the UEs, decides on a common set of $N$ beams that are used for all the UEs. Beams are selected independently for each subcarrier. Thus, we have combination 2a.

We seek to find a beamforming matrix $\boldsymbol{B}(\ell)$, formed from the columns of $\boldsymbol{C}$, such that the resulting channel $\boldsymbol{H}(\ell){\boldsymbol{B}}(\ell)$ maximizes the sum-rate for given $\rho$. Let $\mathcal{C}_{\text{D-SUB}}(\boldsymbol{H}(\ell),\rho)$ denote the optimal sum-rate. The structure of D-SUB beamforming is shown in Fig. 2. Clearly, the precoder $\boldsymbol{P}(\ell)$ needs to be designed jointly with $\boldsymbol{B}(\ell)$. For this, we adopt a two-step approach. First, we address the problem of designing $\boldsymbol{P}(\ell)$ when $\boldsymbol{B}(\ell)$ and $\rho$ are given. Then, we return to the original problem of jointly designing $\boldsymbol{P}(\ell)$ and $\boldsymbol{B}(\ell)$ for given $\rho$, and apply the results of the first step.

For given $\boldsymbol{B}(\ell)$ and $\rho$ , let $\mathcal{C}_{\text{BC}}(\boldsymbol{H}(\ell)\boldsymbol{B}(\ell),\rho)$ denote the maximum sum-rate over $\boldsymbol{H}(\ell)\boldsymbol{B}(\ell)$ . It is shown in Appendix -B that $\mathcal{C}_{\text{BC}}(\boldsymbol{H}(\ell)\boldsymbol{B}(\ell),\rho)$ can be found as the solution to the optimization problem

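$$\underset{\boldsymbol{\Lambda}(\ell)}{\operatorname{maximize}}\ \ \log_{2}\left|\boldsymbol{I}+\boldsymbol{U}^{\operatorname{H}}(\ell)\boldsymbol{H}^{\operatorname{H}}(\ell)\boldsymbol{\Lambda}(\ell)\boldsymbol{H}(\ell)\boldsymbol{U}(\ell)\right|\quad\text{subject to}\quad\boldsymbol{\Lambda}(\ell)\succeq\boldsymbol{0},\ \ \mathrm{tr}\left(\boldsymbol{\Lambda}(\ell)\right)\leq\rho,\tag{15}$$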

where $\boldsymbol{\Lambda}(\ell)=\operatorname{diag}\left(\lambda_{1}(\ell),\ldots,\lambda_{K}(\ell)\right)$ is a diagonal power allocation matrix, and $\boldsymbol{U}(\ell)$ is an $M\times N$ matrix such that $\boldsymbol{B}(\ell)=\boldsymbol{U}(\ell)\boldsymbol{L}(\ell)$ with ${\boldsymbol{U}}^{\operatorname{H}}(\ell)\boldsymbol{U}(\ell)=\boldsymbol{I}$ , and $\boldsymbol{L}(\ell)$ an invertible matrix. If one defines the effective channel matrix $\tilde{\boldsymbol{H}}(\ell)=\boldsymbol{H}(\ell)\boldsymbol{U}(\ell)$ , problem (15) is formally identical to (4), and hence can be solved efficiently. The optimal precoder $\boldsymbol{P}(\ell)$ for given $\boldsymbol{B}(\ell)$ and $\rho$ is defined by the set of covariance matrices $\left\{\boldsymbol{Q}_{i}\right\}_{i=1}^{K}$ , which are found by (i) obtaining the effective covariance matrices $\{\tilde{\boldsymbol{Q}_{i}}(\ell)\}_{i=1}^{K}$ from the power allocations $\left\{\lambda_{i}(\ell)\right\}_{i=1}^{K}$ in (15) via the so-called “MAC-to-BC” transformation (described in, e.g., [26, 29]); and (ii) computing $\boldsymbol{Q}_{i}(\ell)=\boldsymbol{L}^{-1}(\ell)\tilde{\boldsymbol{Q}_{i}}(\ell)\left({\boldsymbol{L}}^{\operatorname{H}}(\ell)\right)^{-1}$ , $i=1,\ldots,K$ .
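
The factorization $\boldsymbol{B}=\boldsymbol{U}\boldsymbol{L}$ and the covariance mapping in step (ii) can be checked numerically. The sketch below uses a QR decomposition as one convenient choice of $\boldsymbol{U}$ and $\boldsymbol{L}$ (any factorization with orthonormal $\boldsymbol{U}$ and invertible $\boldsymbol{L}$ would do), three columns of an oversampled Vandermonde codebook as $\boldsymbol{B}$, and an arbitrary positive semidefinite matrix as a stand-in for an effective covariance. The beam indices and sizes are hypothetical.

```python
import numpy as np

# Three (non-orthogonal) beams from an oversampled codebook with M' = 32.
M, M_prime = 8, 32
idx = np.array([5, 6, 9])
psi = -1.0 + (2.0 * idx - 1.0) / M_prime
B = np.exp(1j * np.pi * np.outer(np.arange(M), psi)) / np.sqrt(M)   # M x N

# Reduced QR: B = U L with U^H U = I and L (upper triangular) invertible.
U, L = np.linalg.qr(B)

# Stand-in for an effective covariance obtained in the U-domain.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q_eff = X @ X.conj().T

# Map back to the beam domain: Q = L^{-1} Q_eff (L^H)^{-1}.
L_inv = np.linalg.inv(L)
Q_beam = L_inv @ Q_eff @ L_inv.conj().T

# Both descriptions give the same covariance at the antennas.
antenna_cov_from_beams = B @ Q_beam @ B.conj().T
antenna_cov_from_basis = U @ Q_eff @ U.conj().T
```

The equality of the two antenna-domain covariances is what allows the problem to be solved with the orthonormal basis $\boldsymbol{U}$ and then translated to the physical beams in $\boldsymbol{B}$.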

Returning to our original problem, we can now express $\mathcal{C}_{\text{D-SUB}}(\boldsymbol{H}(\ell),\rho)$ as the solution to the optimization problem

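$$\underset{\boldsymbol{B}(\ell)=[\boldsymbol{C}]_{\mathcal{Q}(\ell)}}{\operatorname{maximize}}\ \ \mathcal{C}_{\text{BC}}(\boldsymbol{H}(\ell)\boldsymbol{B}(\ell),\rho)\quad\text{subject to}\quad\mathcal{Q}(\ell)\subset\{1,\ldots,M^{\prime}\},\ \ |\mathcal{Q}(\ell)|=N.\tag{16}$$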

Put in words, for each subcarrier, the sum-rate as given by (15) is maximized over all $M\times N$ beamformers $\boldsymbol{B}(\ell)$ generated by codebook $\boldsymbol{C}$. The sum-rate averaged over all the subcarriers, $\bar{\mathcal{C}}_{\text{D-SUB}}(\rho)$, is then defined similarly to (5).

Although, in principle, one could attempt the maximization in (16) by exhaustive search, solving (15) at each step, the number of beamformers $\boldsymbol{B}(\ell)$ that needs to be checked with this approach is $\binom{M^{\prime}}{N}$. Thus, for values of $M^{\prime}$ in the hundreds or larger, the above direct approach appears intractable, except for very small $N$. Therefore, alternative methods for solving (16) are needed. An efficient algorithm for approximate solution of (16) is presented in Appendix -C.
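
To give a feel for the size of the search space, the short calculation below counts the candidate beam subsets for the codebook size $M^{\prime}=512$ used later in the evaluation. The chosen values of $N$ are illustrative only.

```python
from math import comb

M_prime = 512
# Number of candidate beamformers B per subcarrier for a few values of N.
search_space = {N: comb(M_prime, N) for N in (1, 2, 4, 8)}

for N, count in search_space.items():
    print(f"N = {N}: {count:.3e} candidate subsets")
# comb(512, 2) = 130816 and comb(512, 4) = 2829877120, so even N = 4
# already requires billions of solves of (15) per subcarrier.
```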

Hybrid Subspace Beamforming (H-SUB)

The BS, possibly based on interaction with the UEs, decides on a common set of $N$ beams to service all the UEs. In contrast to D-SUB, this choice is applied across all subcarriers, thereby facilitating the implementation of the beamforming in analog hardware. This corresponds to combination 2b above.

The hybrid beamforming architecture is shown in Fig. 2. The vector of precoded transmit signals, $\boldsymbol{s}(\ell)$ , has the form

$$\boldsymbol{s}(\ell)=\boldsymbol{B}\boldsymbol{P}(\ell)\boldsymbol{x}(\ell),\quad\ell=1,\ldots,L,$$

where $\boldsymbol{x}(\ell)$ is a vector containing the information bits from the UEs satisfying $\mathbb{E}\left\{{\boldsymbol{x}(\ell){\boldsymbol{x}(\ell)}^{\operatorname{H}}}\right\}=\boldsymbol{I}$ . Importantly, the precoder $\boldsymbol{P}(\ell)$ is frequency-selective, but the beamforming matrix $\boldsymbol{B}$ is not. Hence, $\boldsymbol{B}$ can be realized entirely by analog hardware. An important consequence is that the number of required RF chains at the BS can be reduced from $M$ (i.e., one RF chain per antenna element) to $N$ (i.e., one RF chain per selected beam).

To obtain a cost-effective analog beamforming network, a certain structure is typically enforced on the matrix $\boldsymbol{B}$. In this work, we require that $\boldsymbol{B}$ be formed from the columns of the codebook matrix $\boldsymbol{C}$ defined by (7). Under this constraint, the analog beamforming network defined by $\boldsymbol{B}$ can be realized by using $N$ phase shifters, and $M$ $N$-input signal combiners, as depicted in Fig. 2. Other constraints on $\boldsymbol{B}$ leading to simplifications of the analog hardware are possible; the reader is referred to [19, 34] for a comprehensive survey of the field.

Optimal beam selection for H-SUB is analogous to D-SUB, except that beams are reused for all subcarriers. For given $\rho$ , the sum-capacity averaged over all subcarriers, $\bar{\mathcal{C}}_{\text{H-SUB}}\left(\rho\right)$ , can be found as the solution to the optimization problem

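$$\underset{\boldsymbol{B}=[\boldsymbol{C}]_{\mathcal{Q}}}{\operatorname{maximize}}\ \ \bar{\mathcal{C}}_{\text{H-SUB}}\left(\{\boldsymbol{H}(\ell)\boldsymbol{B}\}_{\ell=1}^{L},\rho\right)\quad\text{subject to}\quad\mathcal{Q}\subset\{1,\ldots,M^{\prime}\},\ \ |\mathcal{Q}|=N.\tag{17}$$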

where $\bar{\mathcal{C}}_{\text{H-SUB}}\left(\{\boldsymbol{H}(\ell)\boldsymbol{B}\}_{\ell=1}^{L},\rho\right)$ is in turn defined as the solution to

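$$\underset{\{\boldsymbol{\Lambda}(\ell)\}_{\ell=1}^{L}}{\operatorname{maximize}}\ \ \frac{1}{L}\sum_{\ell=1}^{L}\log_{2}\left|\boldsymbol{I}+\boldsymbol{U}^{\operatorname{H}}\boldsymbol{H}^{\operatorname{H}}(\ell)\boldsymbol{\Lambda}(\ell)\boldsymbol{H}(\ell)\boldsymbol{U}\right|\quad\text{subject to}\quad\boldsymbol{\Lambda}(\ell)\succeq\boldsymbol{0},\ \ \mathrm{tr}\left(\boldsymbol{\Lambda}(\ell)\right)\leq\rho,\ \ \ell=1,\ldots,L,\tag{18}$$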

where, as usual, $\boldsymbol{\Lambda}(\ell)=\operatorname{diag}(\lambda_{1}(\ell),\ldots,\lambda_{K}(\ell))$ are power allocation matrices, and $\boldsymbol{B}=\boldsymbol{U}\boldsymbol{L}$ with ${\boldsymbol{U}}^{\operatorname{H}}\boldsymbol{U}=\boldsymbol{I}$ , and $\boldsymbol{L}$ an invertible matrix. Again, the efficient algorithm proposed in Appendix -C can be used to solve (17).
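
The difference between per-subcarrier beam selection (D-SUB) and a common selection for the whole band (H-SUB) can be illustrated on a toy problem small enough for exhaustive search. The sketch below is not the algorithm of Appendix -C. As a simplification it scores each beam subset with an equal-power allocation $\boldsymbol{\Lambda}=(\rho/K)\boldsymbol{I}$, which is a feasible point of the power-allocation problem and therefore only a lower bound on the optimized rate. Channels are synthetic, and all names and sizes are hypothetical.

```python
import itertools
import numpy as np

def equal_power_rate(H, B, rho):
    """log2|I + (rho/K) U^H H^H H U|, with U an orthonormal basis of span(B)."""
    K = H.shape[0]
    U, _ = np.linalg.qr(B)
    H_eff = H @ U                                  # effective K x N channel
    A = np.eye(U.shape[1]) + (rho / K) * H_eff.conj().T @ H_eff
    return np.linalg.slogdet(A)[1] / np.log(2)

def compare_beam_reuse(H_all, C, N, rho):
    """Exhaustive search over all N-subsets of the codebook columns."""
    subsets = list(itertools.combinations(range(C.shape[1]), N))
    rates = np.array([[equal_power_rate(H, C[:, list(Q)], rho) for Q in subsets]
                      for H in H_all])             # shape: L x number of subsets
    per_subcarrier = rates.max(axis=1).mean()      # best subset per subcarrier
    common = rates.mean(axis=0).max()              # one subset for the whole band
    return per_subcarrier, common

rng = np.random.default_rng(0)
L, K, M, M_prime, N = 4, 2, 8, 16, 2
psi = -1.0 + (2.0 * np.arange(1, M_prime + 1) - 1.0) / M_prime
C = np.exp(1j * np.pi * np.outer(np.arange(M), psi)) / np.sqrt(M)
H_all = (rng.standard_normal((L, K, M)) + 1j * rng.standard_normal((L, K, M))) / np.sqrt(2)

d_sub, h_sub = compare_beam_reuse(H_all, C, N, rho=1.0)
```

Because the average of per-subcarrier maxima can never be smaller than the maximum of the band average, the per-subcarrier figure is at least as large as the common-beam figure for any channel data.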

Hybrid Grid-of-Beams (H-GOB)

Last, we have combination 1b, wherein similar to D-GOB, each UE individually reports the indices and complex gains of $N$ beams, but wherein the choice of the beams is applied across all subcarriers. This strategy enables the implementation of the beamforming in analog hardware, as illustrated in Fig. 2. A special case is when $N=1$ , and additionally one dispenses with all the digital signal processing. This case is sometimes referred to as analog-only beamforming, and is used in communication standards such as IEEE 802.11ad [35]. The problem of beam selection can be posed as the following optimization problem:

[TABLE]

where $\hat{\boldsymbol{h}}_{k}(\ell)$ is given by (10). The heuristic algorithm in Appendix -A (with minor modifications) is proposed for solving (19).
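
A simple wideband selection rule for one UE is sketched below. It assumes an orthonormal codebook and a criterion based on total captured energy across the band, in which case the best common set consists of the $N$ beams with the largest energy summed over subcarriers. This is an illustrative stand-in and not the heuristic of Appendix -A; the beam gains are synthetic and all names are hypothetical.

```python
import numpy as np

def wideband_beams(G, N):
    """G: L x M' array of beam gains for one UE. Returns N common beam indices."""
    energy = (np.abs(G) ** 2).sum(axis=0)          # energy per beam over the band
    return np.sort(np.argsort(energy)[-N:])

def energy_common(G, Q):
    """Energy captured when the same beams Q are used on every subcarrier."""
    return (np.abs(G[:, Q]) ** 2).sum()

def energy_per_subcarrier(G, N):
    """Energy captured when the N strongest beams are re-picked per subcarrier."""
    return np.sort(np.abs(G) ** 2, axis=1)[:, -N:].sum()

rng = np.random.default_rng(0)
L, M_prime, N = 16, 32, 4
G = (rng.standard_normal((L, M_prime)) + 1j * rng.standard_normal((L, M_prime))) / np.sqrt(2)

Q = wideband_beams(G, N)
captured_hybrid = energy_common(G, Q)              # H-GOB-like selection
captured_digital = energy_per_subcarrier(G, N)     # D-GOB-like selection
```

The gap between the two captured energies is one way to see the price of reusing beams across a frequency-selective channel.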

IV Measured Channels

The measured channels were obtained in two different measurement campaigns conducted at the Faculty of Engineering (LTH) of Lund University, Lund, Sweden. At the BS side, a virtual uniform linear array (ULA) with 128 elements was used. The ULA spans 7 meters, and uses vertically-polarized, omnidirectional-in-azimuth antenna elements [36]. At the UE side, vertically-polarized omnidirectional antennas of the same type were used. The measurements were acquired at a carrier frequency of 2.6 GHz, and a bandwidth of 50 MHz. A brief description of the two campaigns and the scenarios follows:

•

Campaign A. The UEs were located at the parking place outside the E-building of LTH, with the ULA mounted on top of the E-building, three floors above ground level. We consider five UE sites, denoted MS 1, …, MS 5. Sites MS 1 to MS 4 have mainly LOS propagation conditions to the BS, while site MS 5 experiences NLOS. At each site, several UE locations are measured. In this work, we consider three propagation scenarios, which are summarized in Table I as scenarios 1, 2, and 3. For further details on Campaign A, the reader is referred to [5].

•

Campaign B. The UEs were located in a courtyard of the E-building. The ULA was on a roof two floors above ground, while the 16 UEs were spread out at various positions in the courtyard. In this environment, the UEs experience LOS propagation conditions to the array, along with a number of strong scattered components caused by interactions with the walls, outdoor furniture, and vegetation. (The Ricean $K$ -factor [37, 38] is low compared to scenarios 1 and 3.) In this work, we consider three propagation scenarios, which are summarized in Table I as scenarios 4, 5, and 6. For further details on Campaign B, the reader is referred to [39].

We should also mention that, prior to applying DL beamforming as described in Sec. III, the measured channels are normalized to have unit average gain. This normalization step removes differences in path loss among UEs, while preserving variations across frequencies and antenna positions.
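
One reading of this normalization is sketched below: each UE's channel is scaled by a single constant so that its gain, averaged over all subcarriers and antenna positions, equals one. The exact averaging used by the authors is not spelled out here, so the axes chosen are an assumption; the array layout, path-loss values, and function name are hypothetical.

```python
import numpy as np

def normalize_unit_gain(H_all):
    """H_all: L x K x M array. Scale each UE to unit average gain."""
    # Average |h|^2 over subcarriers (axis 0) and antennas (axis 2), per UE.
    avg_gain = (np.abs(H_all) ** 2).mean(axis=(0, 2), keepdims=True)
    return H_all / np.sqrt(avg_gain)

rng = np.random.default_rng(0)
L, K, M = 8, 3, 16
path_loss = np.array([1.0, 0.1, 0.01]).reshape(1, K, 1)   # synthetic values
H_raw = np.sqrt(path_loss) * (rng.standard_normal((L, K, M))
                              + 1j * rng.standard_normal((L, K, M)))
H_norm = normalize_unit_gain(H_raw)

per_ue_gain = (np.abs(H_norm) ** 2).mean(axis=(0, 2))      # all ones
```

Because only one real scale factor is applied per UE, the relative variation across frequency and across the array is left untouched.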

V Results and Discussion

Based on the measured channels obtained from Campaign A and Campaign B, we compare the performance of the five beamforming techniques described in Sec. III, namely, TDD beamforming, and four flavors of FDD beamforming: D-GOB, H-GOB, D-SUB, and H-SUB. Because TDD performs optimally, it serves as baseline. First, in Sec. V-A, we study how much one can reduce the number, $N$ , of reported beams in FDD beamforming while still retaining a prescribed fraction of the sum-capacity. Next, in Sec. V-B, we fix the FDD sum-rate to a desired value and address the following question: “Given $N$ , what is the average SNR loss relative to optimal TDD?” Then, in Sec. V-C, we investigate the tradeoff between RF chains and BS antennas in FDD beamforming, subject to a sum-rate constraint. In Sec. V-D, we reevaluate the findings of Sec. V-A, but now including the overhead of DL training. Last, in Sec. V-E, we make a remark about analog-only beamforming.

In the preparation of the results reported below, the following parameter settings were used. There are $M=128$ antennas at the BS, which communicate with $K=4,8$, and $16$ single-antenna UEs, depending on the particular scenario. Evaluations are done based on $L=71$ subcarriers equispaced over a 50 MHz bandwidth, for which frequency-flat fading can be assumed. For each of the four considered FDD beamforming schemes, the “best” $N$ beams (in the sense described in Sec. III-B) are selected from a codebook of size $M^{\prime}=512$, with $N$ in the range from $K$ to 128. In Sec. V-A and Sec. V-D, we choose $\rho=0$ dB. With this choice the per-UE spectral efficiencies are in the range 0.5–5.0 bits/s/Hz, which is representative of several wireless standards [18, 40]. Additionally for Sec. V-C, $m$-antenna subarrays, $K\leq m\leq M$, are considered. For each $m$, several $m$-antenna subarrays are selected so as to span the full length of the original $M$-antenna array.
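
As a quick check on the upper end of this range, the TDD sum-capacities listed in Table II can be divided by the number of UEs in each scenario. The snippet below does only that arithmetic; the variable names are hypothetical.

```python
# Values copied from Table II (TDD at rho = 0 dB).
scenarios = [1, 2, 3, 4, 5, 6]
num_ues = [4, 4, 4, 4, 8, 16]
sum_capacity = [19.9, 20.0, 16.7, 20.0, 32.2, 49.0]   # bits/s/Hz

per_ue = {s: c / k for s, k, c in zip(scenarios, num_ues, sum_capacity)}
for s, value in per_ue.items():
    print(f"scenario {s}: {value:.2f} bits/s/Hz per UE")
```

The per-UE TDD figures come out between roughly 3 and 5 bits/s/Hz, consistent with the top of the stated range; the lower values in the range would then correspond to FDD schemes with few reported beams.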

V-A Relative Sum-rate as a Function of $N$

First, we examine scenarios 1, 2, and 3, for which $K=4$ UEs. Fig. 3 (left half) shows the relative sum-rates $\bar{c}_{\text{A}}(\rho,N)=\bar{\mathcal{C}}_{\text{A}}(\rho,N)/\bar{\mathcal{C}}_{\text{TDD}}(\rho)$ , where A is one of “D-GOB”, “H-GOB”, “D-SUB”, and “H-SUB”. The sum-capacities $\bar{\mathcal{C}}_{\text{TDD}}(\rho)$ are given in Table II. For fixed $N$ , we say that A outperforms B if $\bar{c}_{\text{A}}(\rho,N)>\bar{c}_{\text{B}}(\rho,N)$ , where A and B may be applied to different scenarios.

With one exception (in scenario 3, the relative sum-rate of H-GOB decreases slightly when $N$ goes from 1 to 2, which can happen because ZF is used based on partial CSI), the relative sum-rates $\bar{c}_{\text{A}}(\rho,N)$ increase with increasing values of $N$ . At $N=128$ , D-SUB and H-SUB reach the sum-capacity, and D-GOB and H-GOB attain the sum-rate of ZF with perfect CSI. In general, D-GOB extracts a larger share of the sum-capacity than H-GOB, and D-SUB extracts a larger share than H-SUB. This must be so since with D-GOB and D-SUB, beams are selected individually for each subcarrier, while with H-GOB and H-SUB, the same set of beams is used for the entire band. The horizontal gap between the curves of D-GOB and H-GOB, and between those of D-SUB and H-SUB, represents the penalty due to the frequency selectivity of the channel, in terms of the number of additional beams needed. At 70% of the sum-capacity, this penalty is at most one beam for scenarios 1 and 3, and between 4 and 17 beams for scenario 2. These penalties are significantly larger for NLOS scenarios than for LOS ones, which can be explained by the larger frequency selectivity of NLOS channels [41].

Looking at scenarios 1 and 2, we note that D-GOB outperforms D-SUB. With $N=4$ , D-GOB can reach 82% of the sum-capacity, but D-SUB can only reach 72%; with $N=10$ , the relative sum-rates are 90% and 86%, respectively. In fact, this holds for all $N$ , although the gap closes as $N$ increases. This is somewhat surprising as one would expect that DPC should outperform ZF. The explanation is as follows. With D-GOB, beams are individually selected by each UE, with the goal of maximizing the channel gain. With D-SUB, however, channel beamforming gains are traded off against lower multiuser interference. When the channel propagation conditions are somewhat favorable (e.g., distinct LOS directions as in scenario 1, or NLOS propagation as in scenario 2), maximizing the channel beamforming gain is the better strategy. The relative performance of D-GOB and D-SUB depends in general on $\rho$ : For all $N$ , $\bar{c}_{\text{D-SUB}}(\rho,N)$ goes to 1 in the limit $\rho\to\infty$ , with the difference between $\mathcal{C}_{\text{TDD}}(\rho)$ and $\mathcal{C}_{\text{D-SUB}}(\rho,N)$ constant [42, 43]. Meanwhile, for interference-limited D-GOB we have that $\bar{c}_{\text{D-GOB}}(\rho,N)$ must go to 0 as $\rho\to\infty$ , if $N<128$ , and to 1, if $N=128$ .

The situation is more involved with regard to H-GOB and H-SUB: H-GOB beats H-SUB in scenario 1, and the opposite is true in scenario 2. This hints at a larger sensitivity of ZF to frequency selectivity compared to DPC.

Turning to scenario 3, we observe that D-SUB and H-SUB vastly outperform D-GOB and H-GOB. Addressing multiuser interference is crucial in this case, where the UEs are co-located and have LOS, and failure to do so leads to large performance losses. An interesting conclusion thus far is that there is no single FDD beamforming technique, D-GOB or D-SUB, H-GOB or H-SUB, that is “best” in all cases; rather, which technique is most appropriate depends largely on the propagation scenario. We also make the obvious remark that if one desires to operate with $N<K$ beams, then D-GOB and H-GOB are the only available choices.

We now move on to scenarios 4, 5, and 6, with $K=4,8$ , and 16 UEs, respectively, facing LOS propagation conditions with strong scattered components. Shown in Fig. 3 (right half) are the relative sum-rates $\bar{c}_{\text{A}}(\rho,N)$ . The sum-capacities $\bar{\mathcal{C}}_{\text{TDD}}(\rho)$ are given in Table II.

An important observation is that the presence of significant scatterers in the propagation environment has a notable impact on the performance of D-GOB, H-GOB, D-SUB, and H-SUB. To see this, compare in Fig. 3 the reported values of $N$ for scenario 1 with those of scenario 4. In addition to the LOS component, a substantial part of the received power in scenario 4 originates from scattered components, and more beams are needed to achieve a prescribed fraction of the sum-capacity.

We also note that the required number of beams, $N$ , increases with the number of active UEs, $K$ . That $N$ should grow with $K$ is consistent with the conventional Massive MIMO wisdom that the number of BS antennas (here, beams) should grow proportionally to $K$ [44]; this is also necessary for D-SUB and H-SUB, for which $N\geq K$ must hold. The scalability of FDD Massive MIMO as $K$ grows is ultimately limited by the number of beams that can be learnt and reported, regardless of how many antennas are added to the system. In practical systems, where this number is typically small, the usefulness of FDD beamforming is limited to serving a small number of UEs.

From the above discussion, it should be clear that the performance of D-GOB, H-GOB, H-SUB and D-SUB is greatly influenced by the characteristics of the propagation scenario. In particular, LOS propagation conditions with large Ricean factors seem necessary to achieve reasonably good performance for small $N$ . By contrast, TDD Massive MIMO offers high performance across a variety of propagation scenarios. In particular, LOS propagation is not required. This distinguishing feature of TDD beamforming underlines the value of fully-digital precoding and reciprocity-based CSI acquisition: With measured channels and no structural limitations on the precoded signals, NLOS channels are as good as LOS channels (cf. scenarios 1 and 2 in Table II).

V-B Required $N$ for a Maximum SNR Loss

To obtain additional insights, we fix the sum-rate to a desired value, $C^{\ast}$ , and investigate the impact of varying $N$ , the number of reported beams. The required $N$ will depend on $C^{\ast}$ , and on the system SNR, $\rho$ . Given $C^{\ast}>0$ and $N\leq 128$ , it is immediate that one must use $\rho\geq\rho^{\ast}$ , with $\rho^{\ast}$ being the required SNR of TDD at $C^{\ast}$ . We define the SNR loss $\delta_{\rho}$ by the expression

[TABLE]

Shown in Fig. 4 (left half) is the required number of beams, $N$ , as a function of the maximum allowable SNR loss, for $C^{\ast}=12$ bits/s/Hz, and for scenarios 1, 2 and 3. In general, $N$ increases sharply with decreasing SNR loss. In scenario 1, D-GOB is more efficient than D-SUB, and H-GOB is more efficient than H-SUB. At 3 dB SNR loss, D-GOB, H-GOB, D-SUB and H-SUB require 3, 4, 6, and 7 beams, respectively. If 6 dB SNR loss is allowed, D-GOB can operate with $N=1$ beam, and similarly for H-GOB. On the other hand, in scenario 3, D-SUB and H-SUB greatly outperform D-GOB and H-GOB. In fact, neither D-GOB nor H-GOB can operate at less than 3 dB SNR loss, regardless of $N$ . In scenario 2, none of the four investigated techniques can operate at low SNR loss with small $N$ : At 3 dB SNR loss, all of them require $N>20$ .

Shown in Fig. 4 (right half) is $N$ versus the allowable SNR loss, for $C^{\ast}=12,24$ and 48 bits/s/Hz and $K=4,8,$ and 16 UEs as obtained from scenarios 4, 5 and 6, respectively. The required $N$ increases rapidly with $K$ . For a large range of the SNR loss, D-GOB outperforms D-SUB, and H-GOB outperforms H-SUB.

V-C Tradeoff of Antennas versus RF Chains

We next address the following question: Given a system with $N^{\prime}$ RF chains and $M$ antennas, $M\geq N^{\prime}$ , to what extent can one compensate for a reduction of $N^{\prime}$ by increasing $M$ ? For that, we consider the level curves $\Gamma_{\beta}$ of the SNR loss function $\delta_{\rho}$ for some fixed sum-rate $C^{\ast}$ given by (20). The parameter $\beta$ represents the maximum allowable SNR loss. More explicitly, we define

[TABLE]

with the mapping

[TABLE]

Here, $\delta_{\rho}(n,m)$ is the SNR loss, as defined by (20), of a system with $n$ RF chains and $m$ antennas with respect to TDD with 128 antennas.

Let $\beta\in\{1,3,6,9,12\}$ dB. Fig. 5 shows the corresponding level curves for H-SUB, $C^{\ast}=12$ bits/s/Hz and scenarios 1, 2 and 3. For each $\beta$ , there exists a fully-digital system of minimal size $m^{\ast}$ (thus fulfilling $N^{\prime}=m=m^{\ast}$ ). The system is minimal in the sense that $m$ cannot be further reduced without violating the SNR loss requirement, $\beta$ . For example, in scenario 1, if $\beta=1$ dB, then $m^{\ast}=100$ ; but if $\beta=3$ dB, then $m^{\ast}=69$ . From Fig. 5, there exists a multiplicity of hybrid systems for which $\beta$ is upheld (thus fulfilling $N^{\prime}<m^{\ast}\leq m$ ). Furthermore, all those systems can be reached by starting from $(m^{\ast},m^{\ast})$ and moving to the left along the relevant level curve. For example, in scenario 1 and under $\beta=1$ dB, it is possible to travel from the point $(100,100)$ to the point $(76,100)$ , essentially reducing the number of RF chains by 24 at no additional cost. To further reduce $N^{\prime}$ , one must traverse the segment $(76,100)-(76,103)-(56,103)$ , which implies that 20 RF chains can additionally be saved by spending another 3 antennas. One can proceed in this way until the point $(21,128)$ is reached. Observe that saving RF chains becomes more and more expensive along the way, i.e., as $N^{\prime}$ turns smaller.

The situation looks quite different for propagation scenario 2. In particular, the level curves are notably steeper. The level curve under $\beta=1$ dB is given by the segment $(100,100)-(91,100)-(91,103)-(88,103)-\ldots-(71,128)$ : A maximal saving of 29 RF chains can be obtained by spending 28 antennas. It is not obvious that the resulting $(71,128)$ hybrid system is cheaper to realize than the original $(100,100)$ system. In stark contrast, the level curves of scenario 3 are close to horizontal, suggesting that drastic reductions in the number of RF chains are possible. For example, the level curve under $\beta=1$ dB starts at $(116,116)$ and ends at $(6,128)$ . In other words, 110 RF chains can be saved by merely adding 12 antennas.
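The bookkeeping behind these level-curve walks is simple arithmetic, sketched below. Each point is written as (RF chains, antennas), following the text; only waypoints quoted in the text are listed, and the intermediate points it elides are not filled in. The function name `rf_chain_tradeoff` and the dictionary layout are illustrative assumptions.

```python
# Level-curve waypoints for H-SUB at beta = 1 dB, as (RF chains, antennas).
# Only points quoted in the text appear; elided intermediate points are omitted.
LEVEL_CURVES_1DB = {
    1: [(100, 100), (76, 100), (76, 103), (56, 103), (21, 128)],
    2: [(100, 100), (91, 100), (91, 103), (88, 103), (71, 128)],
    3: [(116, 116), (6, 128)],
}


def rf_chain_tradeoff(start, end):
    """Return (RF chains saved, antennas spent) when moving from start to end."""
    chains_saved = start[0] - end[0]
    antennas_spent = end[1] - start[1]
    return chains_saved, antennas_spent


if __name__ == "__main__":
    for scenario, path in LEVEL_CURVES_1DB.items():
        saved, spent = rf_chain_tradeoff(path[0], path[-1])
        print(f"scenario {scenario}: {saved} RF chains saved for {spent} antennas")
```

Dividing the two numbers gives a rough exchange rate: about one RF chain per added antenna in scenario 2 (29 for 28), against roughly nine per antenna in scenario 3 (110 for 12), which is the contrast between steep and nearly horizontal level curves.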

V-D The Impact of DL Training Overhead

We next illustrate the performances of the different transmission schemes when the training overhead is taken into account. We assume a simple block-fading model, where the channel is constant for $T_{\text{c}}$ samples. Typically, $T_{\text{c}}$ is the length (time-bandwidth product) of the coherence interval of the channel, and ranges from just above one to a few hundred, depending on the carrier frequency, the richness of the channel (multipath), and the relative motion of the BS, UEs, and scatterers (Doppler). As an illustrative value, $T_{\text{c}}=200$ corresponds to, e.g., a coherence time of 1 ms and a coherence bandwidth of 200 kHz. We assume that $N_{\text{p}}$ DL pilot symbols are inserted within each coherence interval, leaving $T_{\text{c}}-N_{\text{p}}$ symbols available for data. For D-SUB and H-SUB, $N_{\text{p}}\geq N$ pilot symbols are needed to learn the channel. (This is true after the $N$ beams have been selected. Optimal beam selection requires that the entire “beam space” is observed, implying $N_{\text{p}}=128$ . Nonetheless, $N_{\text{p}}=N$ holds approximately if one assumes that the structure of the beam space changes much more slowly than the particular coefficients of the beams, that is, if one assumes that the length of the stationarity regions of the channel is much larger than the length of the coherence interval [45].) For D-GOB and H-GOB, we have that $N_{\text{p}}\geq\alpha N$ , where $\alpha$ ranges from $\alpha=1$ , if all the UEs report the same beams, to $\alpha=K$ , if the UEs report distinct beams. Here, we consider the worst case $\alpha=K$ . Thus, we let

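$N_{\text{p}}(N)=\begin{cases}N,&\text{for D-SUB and H-SUB},\\ KN,&\text{for D-GOB and H-GOB},\end{cases}$ (23)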

and compute the sum-rate $\tilde{\mathcal{C}}_{\text{A}}(\rho,T_{\text{c}})$ achievable over a large number of fading blocks (see [46]) by the formula

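$\tilde{\mathcal{C}}_{\text{A}}(\rho,T_{\text{c}})=\left(1-\frac{N_{\text{p}}(N^{\ast})}{T_{\text{c}}}\right)\bar{\mathcal{C}}_{\text{A}}(\rho,N^{\ast}),$ (24)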

where the average sum-rates $\bar{\mathcal{C}}_{\text{A}}(\rho,N)$ can be inferred from Fig. 3, and the quantity $N^{\ast}$ is defined by

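$N^{\ast}=\arg\max_{N}\left(1-\frac{N_{\text{p}}(N)}{T_{\text{c}}}\right)\bar{\mathcal{C}}_{\text{A}}(\rho,N).$ (25)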

From (25), $N^{\ast}$ is the optimal number of beams to be activated: If $N<N^{\ast}$ , the degrees of freedom of the channel are underused, whereas if $N>N^{\ast}$ , too few symbols are left available for data.
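The search for $N^{\ast}$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind Figs. 6 to 8: the pilot counts follow the worst-case rule described above ( $N_{\text{p}}=N$ for the SUB schemes, $N_{\text{p}}=KN$ for the GOB schemes), the net rate is the pre-log factor $1-N_{\text{p}}(N)/T_{\text{c}}$ times the average sum-rate, and the sum-rate curve passed in is a made-up placeholder, since the measured curves of Fig. 3 are not reproduced here. All function names are hypothetical.

```python
def pilot_symbols(n_beams, scheme, n_users):
    """DL pilots per coherence interval (worst case alpha = K for grid-of-beams)."""
    if scheme in ("D-SUB", "H-SUB"):
        return n_beams
    if scheme in ("D-GOB", "H-GOB"):
        return n_users * n_beams
    raise ValueError(f"unknown scheme: {scheme}")


def best_beam_count(sum_rate, scheme, n_users, t_c):
    """Return (N*, net sum-rate) maximizing (1 - N_p(N) / T_c) * C(N).

    sum_rate maps each candidate N to an average sum-rate in bits/s/Hz.
    """
    best_n, best_net = None, 0.0
    for n_beams in sorted(sum_rate):
        n_p = pilot_symbols(n_beams, scheme, n_users)
        if n_p >= t_c:
            continue  # no symbols would be left for data
        net = (1.0 - n_p / t_c) * sum_rate[n_beams]
        if net > best_net:
            best_n, best_net = n_beams, net
    return best_n, best_net


if __name__ == "__main__":
    # Placeholder saturating curve, NOT measured data: C(N) = 20 * (1 - 0.5 ** (N / 8)).
    curve = {n: 20.0 * (1.0 - 0.5 ** (n / 8.0)) for n in range(4, 129)}
    for scheme in ("D-SUB", "D-GOB"):
        print(scheme, best_beam_count(curve, scheme, n_users=4, t_c=200))
```

Because the pilot cost of the GOB schemes grows $K$ times faster with $N$ under the worst-case assumption, the same sum-rate curve yields a smaller $N^{\ast}$ for D-GOB than for D-SUB in this sketch.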

The following example demonstrates that when the overhead of DL training is properly accounted for, D-GOB and D-SUB can nevertheless extract a sizable share of the sum-capacity of LOS channels. In NLOS conditions, however, these techniques do not work as well.

Example 1

Let $\rho=0$ dB, and let $T_{\text{c}}=1,2,\ldots,200$ . Fig. 6 shows $\tilde{\mathcal{C}}_{\text{A}}(\rho,T_{\text{c}})$ relative to optimal TDD, and the sum-rate of $4\times 4$ MU-MIMO. D-GOB and D-SUB perform several times better than conventional MU-MIMO, with D-SUB consistently outperforming D-GOB. D-GOB performs poorly if UEs are co-located with LOS, and none of them works well in NLOS. The associated values of $N^{\ast}$ are shown in Fig. 7. Observe that as $T_{\operatorname{c}}$ increases, more beams should be activated. As the UEs may report distinct beams, DL training with D-GOB is more expensive, and $N^{\ast}$ is thus pushed towards zero.

In the next example, we examine the optimal number of active beams, $N^{\ast}$ , with H-GOB and H-SUB. It is shown that, in LOS conditions, H-GOB and especially H-SUB perform reasonably well when operated with a small excess of RF chains, i.e., $N=K+2$ , or so.

Example 2

Let $\rho=0$ dB, and let $T_{\text{c}}=200$ . Fig. 8 shows $\left(1-\frac{N_{\text{p}}(N)}{T_{\text{c}}}\right)\bar{\mathcal{C}}_{\text{A}}(\rho,N)$ relative to optimal TDD as a function of $N$ . For H-GOB, it is optimal to activate 4, 15, and 9 beams in scenarios 1, 2 and 3, respectively. For H-SUB, the numbers are 18, 38, and 10. In fact, in LOS scenarios activating $K+2=6$ beams results in losses smaller than 10% of the relative sum-rate at $N^{\ast}$ . In NLOS scenarios, losses at $K+2$ beams surge to 20–40% of an already much diminished peak relative sum-rate.
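To make the cost of training at these operating points concrete, the fraction of each coherence interval spent on DL pilots can be computed directly from the numbers quoted in Example 2. The sketch assumes the worst-case pilot counts used in this section ( $N_{\text{p}}=KN$ for H-GOB, $N_{\text{p}}=N$ for H-SUB) with $K=4$ and $T_{\text{c}}=200$ ; the names below are illustrative only.

```python
T_C = 200   # coherence interval length used in Example 2
K = 4       # number of UEs in scenarios 1, 2 and 3

# Optimal beam counts N* quoted in Example 2, indexed by scenario.
OPTIMAL_BEAMS = {
    "H-GOB": {1: 4, 2: 15, 3: 9},
    "H-SUB": {1: 18, 2: 38, 3: 10},
}


def pilot_fraction(scheme, n_beams, n_users=K, t_c=T_C):
    """Fraction of the coherence interval used by DL pilots (worst case for H-GOB)."""
    n_p = n_users * n_beams if scheme == "H-GOB" else n_beams
    return n_p / t_c


if __name__ == "__main__":
    for scheme, per_scenario in OPTIMAL_BEAMS.items():
        for scenario, n_star in per_scenario.items():
            share = pilot_fraction(scheme, n_star)
            print(f"{scheme}, scenario {scenario}: N* = {n_star}, pilots = {share:.0%}")
```

Under these assumptions, H-GOB in the NLOS scenario 2 spends 30% of the interval on pilots at its optimum, whereas operating H-SUB with $K+2=6$ beams costs only 3%.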

V-E On the Performance of Analog-Only Beamforming

The main remark we shall make here is that analog-only beamforming does not offer a sum-rate advantage over conventional, small-aperture MU-MIMO systems, except for the very special case of well-separated UEs with LOS. For that, recall that analog-only beamforming is the same as H-GOB with $N=1$ , but wherein baseband processing has been suppressed. In fact, analysis of the measured channels shows that the sum-rates of analog-only beamforming, and those of regular H-GOB (thus with baseband processing) differ by less than 1%, in all scenarios. The claim follows by direct inspection of Fig. 3, in Sec. V-A.

VI Conclusions

Using measured channels at 2.6 GHz, we have compared the performance of five techniques for DL beamforming in Massive MIMO, namely, fully-digital reciprocity-based (TDD) beamforming, and four flavors of FDD beamforming based on feedback of CSI (D-GOB, H-GOB, D-SUB, and H-SUB). The central result is that, while FDD beamforming with predetermined beams may achieve a hefty share of the DL sum-rate of TDD beamforming, performance depends critically on the existence of advantageous propagation conditions, namely, LOS with high Ricean factors. In other considered scenarios, the performance loss is significant for the non-reciprocity-based beamforming solutions. Therefore, if robust operation across a wide variety of propagation conditions is required, reciprocity-based TDD beamforming is the only feasible alternative.

-A Efficient Algorithm for Approximate Solution of (14)

As noted in Sec. III-B, solving problem (14) exactly becomes computationally intractable for moderately large values of $M^{\prime}$ . Instead, we present an algorithmic solution based on the concept of greedy pursuit. The algorithm is summarized in Alg. 1. (Note that, for simplicity of notation, the indices $\ell$ and $k$ have been omitted.) In short, the procedure starts by obtaining (steps 3 and 4) the index $j^{\ast}$ such that $\boldsymbol{h}$ has the largest projection along $\boldsymbol{c}_{j^{\ast}}$ . It then stores $\boldsymbol{c}_{j^{\ast}}$ and $j^{\ast}$ in steps 5 and 6 to form $\boldsymbol{B}^{(1)}$ and $\mathcal{Q}^{(1)}$ , respectively. In the next iteration, a new beam $\boldsymbol{c}_{j^{\ast}}$ is selected so as to maximize the projection of $\boldsymbol{h}$ on the subspace spanned by the columns of $\begin{bmatrix}\boldsymbol{B}^{(1)}\mid\boldsymbol{c}_{j^{\ast}}\end{bmatrix}$ . (Note that the desired projection is given as the result of the multiplication ${\begin{bmatrix}\boldsymbol{B}^{(i-1)}\mid\boldsymbol{c}_{j}\end{bmatrix}}^{\dagger}{\begin{bmatrix}\boldsymbol{B}^{(i-1)}\mid\boldsymbol{c}_{j}\end{bmatrix}}^{\operatorname{H}}\boldsymbol{h}$ in step 4.) It then repeats steps 5 and 6. The algorithm continues until steps 3 to 6 have been executed exactly $N$ times, at which point $\boldsymbol{B}^{(N)}$ contains the $N$ selected beams, and $\mathcal{Q}^{(N)}$ their indices. Computationally, Alg. 1 can be efficiently implemented by sequential Gram-Schmidt orthogonalization of the beamforming matrices $\boldsymbol{B}^{(1)},\ldots,\boldsymbol{B}^{(N)}$ .
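The greedy procedure described above can be sketched in plain Python. This is a hedged illustration rather than a transcription of Alg. 1, which is not reproduced in this text: at each iteration it picks the codebook vector that most increases the energy of $\boldsymbol{h}$ captured by the span of the selected beams, using sequential Gram-Schmidt orthogonalization. The DFT-like codebook and all function names are assumptions made for the example; the codebook actually used in the paper is the one defined in Sec. III-B.

```python
import cmath
import math


def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def remove_component(v, q):
    """Subtract from v its component along the unit-norm vector q."""
    coeff = inner(q, v)
    return [vi - coeff * qi for vi, qi in zip(v, q)]


def dft_codebook(m, size):
    """Hypothetical codebook: `size` unit-norm DFT-like beams of length m."""
    return [[cmath.exp(-2j * math.pi * b * n / size) / math.sqrt(m)
             for n in range(m)] for b in range(size)]


def greedy_beam_selection(h, codebook, n_beams, tol=1e-12):
    """Greedily select beam indices; return them with the uncaptured part of h."""
    basis, selected = [], []
    residual = list(h)
    for _ in range(n_beams):
        best = None  # (gain, index, orthonormalized beam)
        for j, c in enumerate(codebook):
            if j in selected:
                continue
            # Gram-Schmidt: keep only the part of c outside the current span.
            r = list(c)
            for q in basis:
                r = remove_component(r, q)
            norm_sq = inner(r, r).real
            if norm_sq < tol:
                continue  # c already lies in the span of the selected beams
            # Extra energy of h captured if beam j is added to the span.
            gain = abs(inner(r, residual)) ** 2 / norm_sq
            if best is None or gain > best[0]:
                best = (gain, j, [x / math.sqrt(norm_sq) for x in r])
        if best is None:
            break
        _, j_star, q_new = best
        selected.append(j_star)
        basis.append(q_new)
        residual = remove_component(residual, q_new)
    return selected, residual


if __name__ == "__main__":
    codebook = dft_codebook(m=8, size=16)
    h = [2 * a + 0.5 * b for a, b in zip(codebook[3], codebook[10])]
    selected, residual = greedy_beam_selection(h, codebook, n_beams=2)
    print(selected, inner(residual, residual).real)
```

Working with the residual of $\boldsymbol{h}$ avoids recomputing a pseudoinverse for every candidate: the energy captured by the enlarged span equals the energy already captured plus the gain term evaluated in the inner loop.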

-B The Sum-Capacity of the MIMO-BC with Beamforming

For ease of notation, we will drop the index $\ell$ . For given $\boldsymbol{B}$ and $\rho$ , $\mathcal{C}_{\text{BC}}(\boldsymbol{H}\boldsymbol{B},\rho)$ is the sum-rate of the MIMO broadcast channel (BC) $\boldsymbol{H}\boldsymbol{B}$ , and is given by the solution to [27, 26]:

[TABLE]

where $\boldsymbol{Q}_{1},\ldots,\boldsymbol{Q}_{K}$ are covariance matrices. The objective function of (26) is nonconcave in $\boldsymbol{Q}_{1},\ldots,\boldsymbol{Q}_{K}$ , and hence finding the maximum is a nontrivial problem. One would like to apply the BC-multiple access channel (MAC) duality theorem [26] so as to transform the nonconcave problem (26) into an equivalent, concave one, for which efficient solvers are known to exist [47]. However, the presence of $\boldsymbol{B}$ in the constraint $\sum_{i=1}^{K}\mathrm{tr}\left({{\boldsymbol{B}}\boldsymbol{Q}_{i}{\boldsymbol{B}}^{\operatorname{H}}}\right)\leq\rho$ prevents us from invoking the BC-MAC duality theorem. Fortunately, we have the following useful result.

Lemma 1

For given $\boldsymbol{B}=\boldsymbol{U}\boldsymbol{L}$ with ${\boldsymbol{U}}^{\operatorname{H}}\boldsymbol{U}=\boldsymbol{I}$ , and $\boldsymbol{L}$ an invertible matrix, and for given $\rho$ , we have that

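$\mathcal{C}_{\text{BC}}(\boldsymbol{H}\boldsymbol{B},\rho)=\mathcal{C}_{\text{MAC}}\left({\boldsymbol{U}}^{\operatorname{H}}{\boldsymbol{H}}^{\operatorname{H}},\rho\right),$ (27)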

where $\mathcal{C}_{\text{MAC}}\left({\boldsymbol{U}}^{\operatorname{H}}{\boldsymbol{H}}^{\operatorname{H}},\rho\right)$ is the sum-capacity of the MIMO-MAC ${\boldsymbol{U}}^{\operatorname{H}}{\boldsymbol{H}}^{\operatorname{H}}$ [48].

Proof:

By inserting $\boldsymbol{B}=\boldsymbol{U}\boldsymbol{L}$ into equation (26), we obtain the optimization problem

[TABLE]

where we have used that $\mathrm{tr}\left({\boldsymbol{U}\boldsymbol{L}\boldsymbol{Q}_{i}{\boldsymbol{L}}^{\operatorname{H}}{\boldsymbol{U}}^{\operatorname{H}}}\right)=\mathrm{tr}\left({\boldsymbol{L}\boldsymbol{Q}_{i}{\boldsymbol{L}}^{\operatorname{H}}}\right)$ by the cyclic property of the trace operator and the fact that ${\boldsymbol{U}}^{\operatorname{H}}\boldsymbol{U}=\boldsymbol{I}$ , by assumption.

Define the effective covariance matrices $\tilde{\boldsymbol{Q}}_{i}=\boldsymbol{L}\boldsymbol{Q}_{i}{\boldsymbol{L}}^{\operatorname{H}}$ , $i=1,\ldots,K$ , and the effective channel $\tilde{\boldsymbol{H}}=\boldsymbol{H}\boldsymbol{U}$ . Using these definitions, and the fact that $\boldsymbol{L}$ is invertible, (28) can be rewritten as

[TABLE]

Crucially, because $\boldsymbol{L}$ is invertible, the mapping $\tilde{\boldsymbol{Q}}_{i}=\boldsymbol{L}\boldsymbol{Q}_{i}{\boldsymbol{L}}^{\operatorname{H}}$ is an isomorphism. Thus, for every $\{\tilde{\boldsymbol{Q}}_{i}\}_{i=1}^{K}$ satisfying the constraints in (29) we can find $\{\boldsymbol{Q}_{i}\}_{i=1}^{K}$ fulfilling the constraints in (28), and the converse is also true. We may now apply the BC-MAC duality theorem [26] to (29), from which the desired result follows. ∎
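The trace identity that drives the proof is easy to check numerically. The sketch below uses small real-valued matrices chosen by hand (so the Hermitian transpose reduces to the ordinary transpose); $\boldsymbol{U}$ , $\boldsymbol{L}$ and $\boldsymbol{Q}$ are arbitrary illustrative values, not quantities from the measurements.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


# Illustrative values: U has orthonormal columns, L is invertible, Q is a covariance.
U = [[0.6, 0.0], [0.8, 0.0], [0.0, 1.0]]
L = [[2.0, 1.0], [0.0, 1.0]]
Q = [[1.0, 0.5], [0.5, 2.0]]

B = matmul(U, L)                                            # beamformer B = U L
power_with_b = trace(matmul(matmul(B, Q), transpose(B)))    # tr(B Q B^H)
power_effective = trace(matmul(matmul(L, Q), transpose(L)))  # tr(L Q L^H)

if __name__ == "__main__":
    print(power_with_b, power_effective)
```

Both traces evaluate to 10 (up to rounding), so the power constraint on $\boldsymbol{Q}_{i}$ through $\boldsymbol{B}$ is the same as a plain trace constraint on the effective covariance $\tilde{\boldsymbol{Q}}_{i}$ , which is what allows the duality theorem to be applied.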

-C Efficient Algorithm for Approximate Solution of (16)

An algorithmic solution for beam selection in multiuser MIMO systems is presented in Alg. 2. For ease of notation, the index $\ell$ has been omitted. Alg. 2 is again based on the concept of greedy pursuit, and proceeds analogously to Alg. 1, although with a different objective function. In particular, the objective function in Alg. 2 needs to depend on the channel matrix $\boldsymbol{H}$ , rather than on a single channel vector $\boldsymbol{h}_{k}$ . Also, the selection of the beams depends now on the system SNR $\rho$ . Once the $N$ beams (that is, the columns of the beamformer $\boldsymbol{B}$ ) have been selected, the optimal covariance matrices $\boldsymbol{Q}_{1},\ldots,\boldsymbol{Q}_{K}$ may be computed by first solving (15), and then applying the MAC-to-BC transformation; see, e.g., [27, 26, 29]. The selection of the beams along with the computation of the MIMO-BC covariance matrices is done independently for each subcarrier.

Acknowledgment

The presented investigations are based on data obtained in measurement campaigns performed by Xiang Gao, Fredrik Tufvesson, Ove Edfors, Tommy Hult, and Meifang Zhu, as well as Sohail Payami, and Fredrik Tufvesson.

Bibliography

[1] T. L. Marzetta, “Noncooperative cellular wireless with unlimited number of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[2] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge: Cambridge University Press, 2016.
[3] T. L. Marzetta, “How much training is required for multiuser MIMO?” in Proc. ASILOMAR 2006 - 40th Conf. on Sig., Syst. and Comput. (ACSSC), Pacific Grove, CA, USA, Nov.–Dec. 2006, pp. 359–363.
[4] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.
[5] X. Gao, O. Edfors, F. Rusek, and F. Tufvesson, “Massive MIMO performance evaluation based on measured propagation data,” IEEE Trans. Wireless Commun., vol. 14, no. 7, pp. 3899–3911, 2015.
[6] J. Flordelis, X. Gao, G. Dahman, F. Rusek, O. Edfors, and F. Tufvesson, “Spatial separation of closely-spaced users in measured massive multi-user MIMO channels,” in Proc. ICC 2015 - IEEE Int. Conf. Commun., London, UK, Jun. 2015, pp. 1441–1446.
[7] P. Harris, S. Malkowsky, J. Vieira, F. Tufvesson, W. B. Hasan, L. Liu, M. Beach, S. Armour, and O. Edfors, “Performance characterization of a real-time massive MIMO system with LOS mobile channels,” accepted for publication in IEEE J. Sel. Areas Commun., Mar. 2017.
[8] H. Prabhu, J. Rodrigues, L. Liu, and O. Edfors, “A 60 pJ/b 300 Mb/s 128 × 8 massive MIMO precoder-detector in 28 nm FD-SOI,” in Proc. ISSCC 2017 - Int. Solid-State Circuits Conf., San Francisco, CA, Feb. 2017, pp. 171–176.

TABLE I: Summary of measured scenarios.

Campaign	Scenario	Description
A	1	$K = 4$ well-separated UEs in LOS, in which one UE from each of the sites MS 1 to MS 4 is selected. The minimum UE separation is 10 m.
A	2	$K = 4$ co-located UEs in NLOS, in which four UEs are selected from site MS 5. The minimum UE separation is 0.5 m.
A	3	$K = 4$ co-located UEs in LOS, in which four UEs are selected from site MS 2. The minimum UE separation is 0.5 m.
B	4	$K = 4$ separated UEs in LOS and strong scattered components. We considered four sets of UEs (in different colors). The minimum UE separation is 3 m.
B	5	$K = 8$ separated UEs in LOS and strong scattered components. We considered four sets of UEs (in different colors). The minimum UE separation is 3 m.
B	6	$K = 16$ separated UEs in LOS and strong scattered components. The minimum UE separation is 3 m.

