A Deep Learning Framework for Optimization of MISO Downlink Beamforming

Wenchao Xia; Gan Zheng; Yongxu Zhu; Jun Zhang; Jiangzhou Wang; Athina P. Petropulu

arXiv:1901.00354·cs.IT·January 15, 2020

TL;DR

This paper introduces a deep learning framework using convolutional neural networks to optimize downlink beamforming in MISO systems, significantly reducing computational delay while maintaining near-optimal performance.

Contribution

It develops three neural network models for key beamforming optimization problems, integrating expert knowledge and hybrid learning methods for improved efficiency.

Findings

01. Achieves near-optimal solutions for SINR balancing and power minimization.

02. Attains performance close to traditional algorithms for sum rate maximization.

03. Reduces computational complexity significantly compared to iterative methods.

Abstract

Beamforming is an effective means to improve the quality of the received signals in multiuser multiple-input-single-output (MISO) systems. Traditionally, finding the optimal beamforming solution relies on iterative algorithms, which introduces high computational delay and is thus not suitable for real-time implementation. In this paper, we propose a deep learning framework for the optimization of downlink beamforming. In particular, the solution is obtained based on convolutional neural networks and exploitation of expert knowledge, such as the uplink-downlink duality and the known structure of optimal solutions. Using this framework, we construct three beamforming neural networks (BNNs) for three typical optimization problems, i.e., the signal-to-interference-plus-noise ratio (SINR) balancing problem, the power minimization problem, and the sum rate maximization problem. For the former…

Tables

Table I. Parameters of the neural network modules.

Layer	Parameter
Layer 1 (input)	Input of size $2 \times NK$, batch of size 200, 100 epochs
Layer 2 (convolutional)	8 kernels of $3 \times 3$, zero padding 1, stride 1
Layer 3 (batch normalization)	Momentum = 0.99, $\epsilon = 0.001$
Layer 4 (activation)	ReLU
Layer 5 (convolutional)	8 kernels of $3 \times 3$, zero padding 1, stride 1
Layer 6 (batch normalization)	Momentum = 0.99, $\epsilon = 0.001$
Layer 7 (activation)	ReLU
Layer 8 (flatten)
Layer 9 (fully-connected)	$K$ or $2K$ neurons
Layer 10 (activation)	Sigmoid
Layer 11 (output)	Adam optimizer, learning rate of 0.001, MSE metric

Table II. I/Q transformation versus P/M transformation.

K/N	Metric	4	6	8	10	12
I/Q transformation	MSE	0.084	0.038	0.022	0.014	0.010
I/Q transformation	MAE	0.223	0.147	0.111	0.088	0.075
P/M transformation	MSE	0.086	0.039	0.022	0.014	0.010
P/M transformation	MAE	0.225	0.149	0.111	0.087	0.073

Equations

$$y_{k}=\mathbf{h}_{k}^{H}\sum_{k'=1}^{K}\mathbf{w}_{k'}x_{k'}+n_{k},$$

$$\gamma_{k}^{dl}=\frac{|\mathbf{h}_{k}^{H}\mathbf{w}_{k}|^{2}}{\sum_{k'=1,k'\neq k}^{K}|\mathbf{h}_{k}^{H}\mathbf{w}_{k'}|^{2}+\sigma^{2}}.$$

$$\textbf{P1}:\ \max_{\mathbf{W}}\ \min_{1\leq k\leq K}\ \frac{\gamma_{k}^{dl}}{\rho_{k}},\quad \text{s.t.}\ \sum_{k=1}^{K}\|\mathbf{w}_{k}\|^{2}\leq P_{max},$$

$$\textbf{P2}:\ \min_{\mathbf{W}}\ \sum_{k=1}^{K}\|\mathbf{w}_{k}\|^{2},\quad \text{s.t.}\ \gamma_{k}^{dl}\geq\Gamma_{k},\ \forall k,$$

$$\textbf{P3}:\ \max_{\mathbf{W}}\ \sum_{k=1}^{K}\alpha_{k}\log_{2}(1+\gamma_{k}^{dl}),\quad \text{s.t.}\ \sum_{k=1}^{K}\|\mathbf{w}_{k}\|^{2}\leq P_{max},$$

$$\bm{O}_{\text{conv},l}=\textsf{Conv}(\bm{I}_{\text{conv},l},\bm{\Xi}_{l},\bm{\xi}_{l}),\ l\in\mathcal{L},$$

$$\bm{Z}_{\text{bn},l,c}[i,j]=\frac{\bm{O}_{\text{conv},l,c}[i,j]-\mu_{l,c}}{\sqrt{\text{Var}_{l,c}+\epsilon_{l,c}}},\ l\in\mathcal{L},\ c=1,\dots,c_{l},\ i=1,\dots,b_{l}^{(1)},\ j=1,\dots,b_{l}^{(2)}$$

$$\text{ReLU}(z)=\max(0,z)\ \text{and}\ \text{sigmoid}(z)=\frac{1}{1+e^{-z}},$$

$$\mathbf{o}_{\text{fc}}=\bm{\Pi}\bm{i}_{\text{fc}}+\bm{\pi},$$

$$\text{MAE}=\frac{1}{FK}\sum_{f=1}^{F}\|\mathbf{q}^{(f)}-\hat{\mathbf{q}}^{(f)}\|_{1},$$

$$\text{MSE}=\frac{1}{FK}\sum_{f=1}^{F}\|\mathbf{q}^{(f)}-\hat{\mathbf{q}}^{(f)}\|_{2}^{2},$$

$$C^{dl}(\tilde{\mathbf{W}},P_{max})=C^{ul}(\tilde{\mathbf{W}},P_{max}),$$

$$C^{dl}(\tilde{\mathbf{W}},P_{max})=$$

$$\|\mathbf{p}\|_{1}\leq P_{max},$$

$$\|\tilde{\mathbf{w}}_{k}\|_{2}=1,\ \forall k,$$

$$C^{ul}(\tilde{\mathbf{W}},P_{max})=$$

$$\|\mathbf{q}\|_{1}\leq P_{max},$$

$$\|\tilde{\mathbf{w}}_{k}\|_{2}=1,\ \forall k,$$

$$\gamma_{k}^{dl}(\tilde{\mathbf{W}},\mathbf{p})=\frac{p_{k}|\mathbf{h}_{k}^{H}\tilde{\mathbf{w}}_{k}|^{2}}{\sum_{k'=1,k'\neq k}^{K}p_{k'}|\mathbf{h}_{k}^{H}\tilde{\mathbf{w}}_{k'}|^{2}+\sigma^{2}},$$

$$\gamma_{k}^{ul}(\tilde{\mathbf{W}},\mathbf{q})=\frac{q_{k}|\mathbf{h}_{k}^{H}\tilde{\mathbf{w}}_{k}|^{2}}{\sum_{k'=1,k'\neq k}^{K}q_{k'}|\mathbf{h}_{k'}^{H}\tilde{\mathbf{w}}_{k}|^{2}+\sigma^{2}}.$$

$$\bm{\Upsilon}(\tilde{\mathbf{W}}^{\ast},P_{max})=\left[\begin{array}{cc}\mathbf{D}\mathbf{U}&\mathbf{D}\bm{\sigma}\\ \frac{1}{P_{max}}\mathbf{1}^{T}\mathbf{D}\mathbf{U}&\frac{1}{P_{max}}\mathbf{1}^{T}\mathbf{D}\bm{\sigma}\end{array}\right],$$

$$[\mathbf{U}]_{kk'}=\begin{cases}|(\tilde{\mathbf{w}}_{k'}^{\ast})^{H}\mathbf{h}_{k}|^{2},&\text{if }k'\neq k,\\ 0,&\text{else}.\end{cases}$$

$$\hat{\mathbf{q}}^{\ast}=\frac{P_{max}}{\|\hat{\mathbf{q}}\|_{1}}\hat{\mathbf{q}}.$$

$$\min_{\mathbf{q},\tilde{\mathbf{W}}}\ \sum_{k=1}^{K}q_{k},\quad \text{s.t.}\ \gamma_{k}^{ul}(\tilde{\mathbf{W}},\mathbf{q})\geq\Gamma_{k},\ \|\tilde{\mathbf{w}}_{k}\|_{2}=1,\ \forall k,$$

$$\mathbf{p}^{\ast}=\sigma^{2}\bm{\Psi}^{-1}\mathbf{1},$$

$$[\bm{\Psi}]_{kk'}=\begin{cases}\frac{1}{\Gamma_{k}}|\mathbf{h}_{k}^{H}\tilde{\mathbf{w}}_{k}^{\ast}|^{2},&\text{if }k=k',\\ -|\mathbf{h}_{k}^{H}\tilde{\mathbf{w}}_{k'}^{\ast}|^{2},&\text{else}.\end{cases}$$

$$\tilde{\mathbf{w}}_{k}=\frac{\mathbf{T}^{-1}\mathbf{h}_{k}}{\|\mathbf{T}^{-1}\mathbf{h}_{k}\|_{2}},\ \forall k,$$

$$\mathbf{w}_{k}^{\ast}=\sqrt{p_{k}}\,\frac{(\mathbf{I}_{N}+\sum_{k=1}^{K}\frac{\lambda_{k}}{\sigma^{2}}\mathbf{h}_{k}\mathbf{h}_{k}^{H})^{-1}\mathbf{h}_{k}}{\|(\mathbf{I}_{N}+\sum_{k=1}^{K}\frac{\lambda_{k}}{\sigma^{2}}\mathbf{h}_{k}\mathbf{h}_{k}^{H})^{-1}\mathbf{h}_{k}\|_{2}},\ \forall k,$$

$$\text{Loss}=\frac{1}{2LK}\sum_{l=1}^{L}\left(\|\underline{\mathbf{p}}^{(l)}-\hat{\mathbf{p}}^{(l)}\|_{2}^{2}+\|\underline{\bm{\lambda}}^{(l)}-\hat{\bm{\lambda}}^{(l)}\|_{2}^{2}\right),$$

$$\text{Loss}=-\frac{1}{2KL}\sum_{l=1}^{L}\sum_{k=1}^{K}\alpha_{k}^{(l)}\log_{2}(1+\gamma_{k}^{ul,(l)}).$$

$$\hat{\mathbf{p}}^{\ast}=\frac{P_{max}}{\|\hat{\mathbf{p}}\|_{1}}\hat{\mathbf{p}}\ \text{and}\ \hat{\bm{\lambda}}^{\ast}=\frac{P_{max}}{\|\hat{\bm{\lambda}}\|_{1}}\hat{\bm{\lambda}}.$$

$$\hat{\mathbf{w}}_{k}^{\ast}=\sqrt{\hat{p}_{k}^{\ast}}\,\frac{(\mathbf{I}_{N}+\sum_{k=1}^{K}\frac{\hat{\lambda}_{k}^{\ast}}{\sigma^{2}}\mathbf{h}_{k}\mathbf{h}_{k}^{H})^{-1}\mathbf{h}_{k}}{\|(\mathbf{I}_{N}+\sum_{k=1}^{K}\frac{\hat{\lambda}_{k}^{\ast}}{\sigma^{2}}\mathbf{h}_{k}\mathbf{h}_{k}^{H})^{-1}\mathbf{h}_{k}\|_{2}},\ \forall k.$$

Full text

Wenchao Xia, Gan Zheng, Yongxu Zhu, Jun Zhang, Jiangzhou Wang, and Athina P. Petropulu

W. Xia and J. Zhang are with the Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (e-mail: [email protected], [email protected]). G. Zheng and Y. Zhu are with the Wolfson School of Mechanical, Electrical and Manufacturing Engineering, Loughborough University, Leicestershire, LE11 3TU, UK (e-mail: [email protected], [email protected]). J. Wang is with the School of Engineering and Digital Arts at the University of Kent, Kent, CT2 7NT, UK (e-mail: [email protected]). A. P. Petropulu is with the Department of Electrical & Computer Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 (e-mail: [email protected]).

Abstract

Beamforming is an effective means to improve the quality of the received signals in multiuser multiple-input-single-output (MISO) systems. Traditionally, finding the optimal beamforming solution relies on iterative algorithms, which introduce high computational delay and are thus not suitable for real-time implementation. In this paper, we propose a deep learning framework for the optimization of downlink beamforming. In particular, the solution is obtained based on convolutional neural networks and exploitation of expert knowledge, such as the uplink-downlink duality and the known structure of optimal solutions. Using this framework, we construct three beamforming neural networks (BNNs) for three typical optimization problems, i.e., the signal-to-interference-plus-noise ratio (SINR) balancing problem, the power minimization problem, and the sum rate maximization problem. For the former two problems the BNNs adopt the supervised learning approach, while for the sum rate maximization problem a hybrid method of supervised and unsupervised learning is employed. Simulation results show that the BNNs can achieve near-optimal solutions to the SINR balancing and power minimization problems, and a performance close to that of the weighted minimum mean squared error algorithm for the sum rate maximization problem, while in all cases enjoying significantly reduced computational complexity. In summary, this work paves the way for fast realization of optimal beamforming in multiuser MISO systems.

Index Terms:

Deep learning, beamforming, MISO, beamforming neural network.

I Introduction

Downlink beamforming techniques have attracted much attention in the past decades for their ability to realize the performance gain of multiple antennas. Beamforming has been formulated in various ways, i.e., as a signal-to-interference-plus-noise ratio (SINR) balancing problem (also known as interference balancing problem) under a total power constraint [2, 3, 4], as a power minimization problem under quality of service (QoS) constraints [5, 6, 7, 8], or as a sum rate maximization problem under a total power constraint [2, 9, 10, 11]. Existing approaches to finding the optimal beamforming solutions heavily rely on tailor-made iterative algorithms and convex optimization, which is in turn solved by general iterative algorithms such as the interior point method. For instance, the SINR balancing problem can be solved by the iterative algorithm of [12]. The power minimization problem can be reformulated as a second-order cone programming (SOCP) [8, 7] or semidefinite programming (SDP) problem [13, 14], which can be solved directly by an optimization software package such as CVX [15]. Its optimal solution can also be obtained using iterative algorithms such as Algorithm A of [16] and the dual algorithm of [5, 12]. However, the optimal solution to the sum rate maximization problem is usually hard to obtain because the problem is nonconvex. Locally optimal solutions are obtained via iterative algorithms, such as the weighted minimum mean squared error (WMMSE) algorithm [9, 10], and asymptotically optimal solutions are obtained using the water filling algorithm combined with zero-forcing (ZF) beamforming [11].

The main drawbacks of existing iterative algorithms are the high computational complexity and the resulting latency. As a result, the beamforming technique is unable to meet the demands of real-time applications in the fifth-generation (5G) system and beyond, such as autonomous vehicles and mission critical communications. Even in non-real-time applications, where the small-scale fading varies on the order of milliseconds, the latency introduced by the iterative process renders the beamforming solution outdated. To address this challenge, researchers have proposed simple heuristic beamforming schemes which admit closed-form solutions, such as the maximum-ratio transmission beamforming, the ZF beamforming, and the regularized ZF (RZF) beamforming. These heuristic beamforming solutions are directly computed based on the channel state information (CSI) without iteration, and thus involve low computational delay. However, the reduction of delay is achieved at the cost of performance loss. The tradeoff between delay and performance seems to restrict the potential of beamforming techniques and their applications in practice.

Thanks to the recent advances in deep learning (DL) techniques, it becomes possible to find the optimal beamforming in real time by taking into account both performance and computational delay simultaneously. This is because the DL technique trains neural networks offline and then deploys the trained neural networks for online optimization. The computational complexity is transferred from the online optimization to the offline training, and only simple linear and nonlinear operations are needed when the trained neural network is used to find the optimal beamforming solution, thus greatly reducing the computational complexity and delay.

Benefiting from the development of specialized hardware, such as graphic processing units and field programmable gate arrays, DL can be implemented using these hardware resources conveniently. Accordingly, DL techniques have been widely used in many applications including wireless communications. A lot of research has attempted to use DL to address physical layer issues, including channel decoding [17, 18], detection [19, 20, 21], channel estimation [22, 23, 24], and resource management [25, 26, 27, 28, 29, 30, 31, 32]. Among these efforts, the autoencoder based on unsupervised DL, investigated in [33, 34], is an ambitious attempt to learn an end-to-end communications system [35]. DL can also facilitate resource management [25, 26], including power allocation [27, 28, 29, 30, 31]. Finally, [36, 37] provide an overview on the recent advances in DL-based physical layer communications and [38] suggests potential applications of DL to the physical layer.

However, with the exception of [39, 40, 41, 42], there are no works focusing on beamforming design in multi-antenna communications based on DL. A common method used in the related literature is codebook-based beam selection. For example, [39] designed a decentralized robust precoding scheme based on DNN in a network MIMO configuration. However, while the projection over a finite dimensional subspace reduces the difficulty, it also results in performance loss. [40] used a DL model to predict the beamforming matrix directly from the signals received at distributed BSs in millimeter wave systems. The sum rate performance in [40] was restricted by the quantized codebook constraint. Different from [39, 40] which predicted the beamforming matrix in the finite solution space, [41, 42] directly estimated the beamforming matrix; in that case the number of variables to predict increases significantly as the numbers of transmit antennas and users increase, leading to high training complexity of the neural networks. Furthermore, we note that none of the aforementioned works addressed the SINR balancing problem under a total power constraint, or the power minimization problem under SINR constraints.

Motivated by the above facts and the universal approximation theorem [43, 44], we propose a general DL framework that not only achieves a near-optimal beamforming matrix, but also reduces complexity and latency compared to the iterative methods. Based on the proposed framework, we develop beamforming neural networks (BNNs) to solve the three aforementioned optimization problems. Learning the optimal beamforming solution is highly nontrivial, and there are still challenges that need to be overcome in designing the BNNs. Firstly, popular neural network software packages such as Keras and TensorFlow currently (March 2019) do not support complex numbers as input or output [35]. However, both channel and beamforming vectors are inherently complex. Naively using a black-box DL model to predict beamforming vectors based on CSI matrices (with a suitable real-valued representation) will not only lead to high prediction complexity, but also lose the specific structures of the problems of interest. Secondly, the power minimization problem has strict QoS constraints, and guaranteeing a feasible solution using neural networks is a challenge. Thirdly, different from the SINR balancing and power minimization problems, there is no practically useful algorithm that can achieve the optimal solution to the sum rate maximization problem (and other nonconvex beamforming problems), and thus a supervised learning method based on locally optimal solutions cannot achieve good performance. In this paper, we tackle these challenges, and our main contributions are summarized as follows:

• We provide a DL-based framework for the beamforming optimization in the multiple-input-single-output (MISO) downlink, where the BS has multiple antennas while each user terminal has a single antenna. The proposed framework is designed based on the CNN structure. Different from existing works where the CNN was applied to power control [29, 30], resource allocation [45], and wireless scheduling [46], the proposed framework combines a signal processing module with the neural network module by exploiting expert knowledge such as the uplink-downlink duality and the known structure of the optimal solutions, so as to improve learning efficiency by specifying the best parameters to be learned; those parameters are typically not the direct beamforming matrix. This framework can deal with three types of beamforming optimization problems: 1) problems whose optimal solutions are easy to find and the constraints are easy to meet; 2) problems whose optimal solutions are easy to find but the constraints are hard to meet; and 3) problems which have no practically useful algorithm that can achieve optimal solutions efficiently. Under this framework, we propose three BNNs for solving three typical optimization problems in MISO systems, i.e., the SINR balancing problem under a total power constraint, the power minimization problem under QoS constraints, and the sum rate maximization problem under a total power constraint.

• In the proposed supervised BNNs for the SINR balancing and power minimization problems, instead of estimating the beamforming matrix with $NK$ elements, where $N$ is the number of the transmit antennas at the BS and $K$ is the number of users, we exploit the uplink-downlink duality of solutions [6, 12, 5] and predict the virtual uplink power allocation vector with only $K$ elements. Thus, the demand on the prediction capability of the BNNs in terms of network neurons and layers is significantly reduced. Also, the training and prediction complexity and cost are reduced. In the proposed BNN for the sum rate maximization problem, we exploit the known structure of the optimal solutions and predict two power allocation vectors with a total of $2K$ elements. This approach still has advantages as compared to predicting the beamforming matrix directly.

• We propose a hybrid two-stage BNN with both supervised and unsupervised learning to find the beamforming solution to the sum rate maximization problem [29], since no practically useful algorithm can find the global optimum. In the first stage, we use the supervised learning method with a mean squared error (MSE)-based loss function to make the predictions as close as possible to the WMMSE algorithm, which is known to achieve the locally optimal solution. In the second stage, we modify the metric in the loss function to be the sum rate, and update the network parameters according to the unsupervised learning method, which achieves a performance close to that of the WMMSE algorithm.

The remainder of this paper is organized as follows. Section II introduces the system model and formulates three beamforming optimization problems in the MISO downlink. Section III provides the framework for the beamforming optimization, and Sections IV, V, and VI then propose the BNNs under the framework for the SINR balancing problem, the power minimization problem, and the sum rate maximization problem, respectively. Numerical results are presented in Section VII. Finally, conclusions are drawn in Section VIII.

Notations: The notations are given as follows. Matrices and vectors are denoted by bold capital and lowercase symbols, respectively. $(\mathbf{A})^{T}$ and $(\mathbf{A})^{H}$ stand for transpose and conjugate transpose of $\mathbf{A}$ , respectively. The notations $||\bullet||_{1}$ and $||\bullet||_{2}$ are $l_{1}$ and $l_{2}$ norm operators, respectively. The operator $\text{diag}(\mathbf{a})$ denotes the operation to diagonalize the vector $\mathbf{a}$ into a matrix whose main diagonal elements are from $\mathbf{a}$ . Finally, $\mathbf{a}\sim\mathcal{CN}(\mathbf{0},\bm{\Sigma})$ represents a complex Gaussian vector with zero-mean and covariance matrix $\bm{\Sigma}$ .

II System Model

We consider a downlink transmission scenario where a BS equipped with $N$ antennas serves $K$ single-antenna users. The channel between user $k$ and the BS is denoted as ${\bf h}_{k}\in\mathbb{C}^{N\times 1}$ . The received signal at user $k$ is given by

$$y_{k}=\mathbf{h}_{k}^{H}\sum_{k'=1}^{K}\mathbf{w}_{k'}x_{k'}+n_{k},$$

where $\mathbf{w}_{k}$ represents the beamforming vector for user $k$ , $x_{k}\sim\mathcal{CN}(0,1)$ is the transmitted symbol from the BS to user $k$ , and $n_{k}\sim\mathcal{CN}(0,\sigma^{2})$ denotes the additive Gaussian white noise (AWGN) with zero mean and variance $\sigma^{2}$ . The received SINR of user $k$ equals

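$$\gamma_{k}^{dl}=\frac{|\mathbf{h}_{k}^{H}\mathbf{w}_{k}|^{2}}{\sum_{k'=1,k'\neq k}^{K}|\mathbf{h}_{k}^{H}\mathbf{w}_{k'}|^{2}+\sigma^{2}}.$$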
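As an illustration of the SINR expression above, the following is a minimal NumPy sketch, not the authors' code. The function name `downlink_sinr`, the column-wise storage of $\mathbf{h}_{k}$ and $\mathbf{w}_{k}$ in matrices `H` and `W`, and the maximum-ratio transmission directions used in the example are all assumptions made for this sketch.

```python
import numpy as np

def downlink_sinr(H, W, sigma2):
    """Per-user downlink SINR for a MISO system.

    H: (N, K) complex array, column k is the channel h_k (assumed layout).
    W: (N, K) complex array, column k is the beamforming vector w_k.
    sigma2: noise variance.
    """
    # G[k, j] = |h_k^H w_j|^2, the power user k receives from stream j
    G = np.abs(H.conj().T @ W) ** 2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return signal / (interference + sigma2)

# Hypothetical example: Rayleigh channels and unit-power
# maximum-ratio transmission beams.
rng = np.random.default_rng(0)
N, K = 4, 3
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
W = H / np.linalg.norm(H, axis=0)
print(downlink_sinr(H, W, sigma2=1.0))
```

Computing the full gain matrix once and reading the desired signal from its diagonal avoids an explicit loop over users.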

One conventional optimization problem seeks to maximize $\text{min}_{k}\gamma^{dl}_{k}/\rho_{k}$ subject to a transmit power constraint, where $\rho_{k}$ ’s are constant weights denoting the importance of the sub-streams. Such an optimization problem is referred to as interference or SINR balancing, and has been investigated in many works [2, 3, 4]. The SINR balancing problem is formulated as:

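$$\textbf{P1}:\ \max_{\mathbf{W}}\ \min_{1\leq k\leq K}\ \frac{\gamma_{k}^{dl}}{\rho_{k}},\quad \text{s.t.}\ \sum_{k=1}^{K}\|\mathbf{w}_{k}\|^{2}\leq P_{max},$$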

where $\mathbf{W}=[\mathbf{w}_{1},\mathbf{w}_{2},\ldots,\mathbf{w}_{K}]$ is a set of beamforming vectors and $P_{max}$ is the power budget.

Another important problem is the power minimization problem under a set of SINR constraints [6, 7]. A network operator may be more interested in how to minimize the transmit power while fulfilling the demands for QoS, i.e.,

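$$\textbf{P2}:\ \min_{\mathbf{W}}\ \sum_{k=1}^{K}\|\mathbf{w}_{k}\|^{2},\quad \text{s.t.}\ \gamma_{k}^{dl}\geq\Gamma_{k},\ \forall k,$$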

where $\Gamma_{k}$ is the SINR constraint of user $k$ . For ease of reference, we define $\bm{\Gamma}=[\Gamma_{1},\cdots,\Gamma_{K}]^{T}$ as the SINR constraint vector.

Finally, the weighted sum rate maximization problem under a total power constraint has also attracted a lot of attention [2, 10, 9]. It can be formulated as:

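$$\textbf{P3}:\ \max_{\mathbf{W}}\ \sum_{k=1}^{K}\alpha_{k}\log_{2}(1+\gamma_{k}^{dl}),\quad \text{s.t.}\ \sum_{k=1}^{K}\|\mathbf{w}_{k}\|^{2}\leq P_{max},$$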

where $\alpha_{k}$ is a constant weight of user $k$ .
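To make the three formulations concrete, the sketch below evaluates the objective of each problem for a given beamforming matrix. It only evaluates the objectives and does not solve P1, P2, or P3. The function names, the column-wise matrix layout, and the equal-power maximum-ratio beams in the example are assumptions of this sketch.

```python
import numpy as np

def sinr(H, W, sigma2):
    """Downlink SINR per user; columns of H and W are h_k and w_k (assumed layout)."""
    G = np.abs(H.conj().T @ W) ** 2
    signal = np.diag(G)
    return signal / (G.sum(axis=1) - signal + sigma2)

def sinr_balancing_objective(H, W, sigma2, rho):
    """P1 objective: min_k gamma_k / rho_k (to be maximized)."""
    return np.min(sinr(H, W, sigma2) / rho)

def total_power(W):
    """P2 objective and P1/P3 constraint: sum_k ||w_k||^2."""
    return np.sum(np.abs(W) ** 2)

def weighted_sum_rate(H, W, sigma2, alpha):
    """P3 objective: sum_k alpha_k log2(1 + gamma_k) (to be maximized)."""
    return np.sum(alpha * np.log2(1.0 + sinr(H, W, sigma2)))

# Hypothetical example: maximum-ratio beams with the power budget split equally.
rng = np.random.default_rng(1)
N, K, P_max, sigma2 = 4, 3, 1.0, 1.0
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
W = np.sqrt(P_max / K) * H / np.linalg.norm(H, axis=0)
print(sinr_balancing_objective(H, W, sigma2, np.ones(K)))
print(total_power(W))  # equals P_max up to rounding
print(weighted_sum_rate(H, W, sigma2, np.ones(K)))
```

For P2, feasibility would additionally require `sinr(H, W, sigma2) >= Gamma` elementwise for the chosen SINR constraint vector.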

We choose the above problems as representative examples to demonstrate the effectiveness of our proposed DL beamforming framework. Practical algorithms to find optimal solutions are available for P1 [12, 47, 8] and P2 [5, 8, 7, 12, 13], thus supervised learning can be adopted for those problems. In this work, for simplicity, we assume that the optimal solution to problem P2 always exists and do not consider the infeasibility of QoS constraints. Under this assumption, P2 still has the additional challenge of satisfying strict QoS constraints. P3 is a difficult nonconvex problem and is usually solved using the iterative WMMSE approach [10, 9], therefore, supervised learning alone is insufficient for this case. In the rest of the paper, we will show how the solutions to these three types of problems can be efficiently learned by the proposed DL-based beamforming framework.

III A DL-based Framework for Beamforming Optimization

DL-based neural networks were initially designed for solving classification problems, but they can also achieve satisfactory performance in regression problems. For example, the DNN was used to predict transmit power [28, 27]. Existing works mainly take real data, such as channel gains and transmit power, as input and output, but channel and beamforming matrices are both complex. In addition, predicting the beamforming matrix with $NK$ elements directly may lead to inaccurate results. While we could use wider or deeper neural networks with more neurons to improve the learning ability, such huge networks would lead to high training and implementation complexity and their learning performance could not be guaranteed. For example, too deep or wide neural networks can cause over-fitting.

The proposed DL-based framework for the beamforming optimization in MISO downlink is shown in Fig. 1. We choose the CNN architecture as the base of the framework, because the CNN has strong ability of extracting features as well as approximation ability [43, 44]. In addition, the CNN can reduce the number of learned parameters by sharing weights and biases [30]. The proposed framework, instead of estimating the beamforming matrix directly, only predicts key features extracted from the beamforming matrix according to expert knowledge specific to the problem under consideration. Therefore, the demand for the prediction capability in terms of network neurons and layers, as well as its complexity, is significantly reduced.

III-A Structure of the Proposed Framework

As illustrated in Fig. 1, the proposed framework is a gray-box approach that takes advantages of both the conventional signal processing and the neural network approach. The proposed framework includes two main modules: the neural network module and the beamforming recovery module. The neural network module is composed of an input layer, convolutional layers, batch normalization layers, activation layers, a flatten layer, a fully-connected layer, and an output layer, whereas key features and the functional layers in the beamforming recovery module are specified by the expert knowledge. For ease of clarification, we assume that, besides the input, output, flatten, and fully-connected layers, there are $L=|\mathcal{L}|$ groups of functional layers in the neural network module and each group includes a convolutional layer, a batch normalization layer, and an activation layer. Below we give a brief introduction to these layers.

III-A1 Input Layer

The complex channel coefficients must be fed into the neural network module to predict the key features, but complex-valued inputs are not supported by current neural network software. To deal with this issue, two data transformations are available. One is to separate the complex channel vector, for example ${\bf h}=[{\bf h}_{1}^{T},\cdots,{\bf h}_{K}^{T}]^{T}\in\mathbb{C}^{NK\times 1}$ , into in-phase component $\mathfrak{R}({\bf h})$ and quadrature component $\mathfrak{I}({\bf h})$ , where $\mathfrak{R}({\bf h})$ and $\mathfrak{I}({\bf h})$ contain the real and imaginary parts of each element in ${\bf h}$ , respectively. We call this transformation I/Q transformation. Another transformation, suggested by [48], is to map the complex channel vector ${\bf h}$ into two real vectors $\mathfrak{P}({\bf h})$ and $\mathfrak{M}({\bf h})$ , where the former contains the phase information and the latter includes the magnitude information of ${\bf h}$ . This transformation is referred to as P/M transformation. As far as we know, there is no evidence to show which transformation is better. In this work, we adopt I/Q transformation of complex channels and formulate the input of the first convolutional layer as $[\mathfrak{R}({\bf h}),\mathfrak{I}({\bf h})]^{T}\in\mathbb{R}^{2\times NK}$ . Note that the samples are fed into the neural network module in batches during the training process.
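The two input transformations can be sketched in NumPy as follows. This is an illustrative sketch only: the function names `iq_transform` and `pm_transform` and the storage of the channels as columns of an $N\times K$ matrix are assumptions, not part of the original work.

```python
import numpy as np

def stack_channels(H):
    """Stack the columns h_1, ..., h_K of H (N, K) into one vector of length NK."""
    return H.T.reshape(-1)

def iq_transform(H):
    """I/Q transformation: real parts in row 0, imaginary parts in row 1, shape (2, NK)."""
    h = stack_channels(H)
    return np.stack([h.real, h.imag])

def pm_transform(H):
    """P/M transformation: phases in row 0, magnitudes in row 1, shape (2, NK)."""
    h = stack_channels(H)
    return np.stack([np.angle(h), np.abs(h)])

# Hypothetical example with N = 2 antennas and K = 2 users.
H = np.array([[1 + 2j, 3j],
              [-1 + 0j, 2 + 0j]])
print(iq_transform(H))
print(pm_transform(H))
```

Both transformations are lossless: the complex vector can be rebuilt as `iq[0] + 1j * iq[1]` or as `pm[1] * np.exp(1j * pm[0])`.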

III-A2 Convolutional Layer

Each convolutional layer $l\in\mathcal{L}$ creates $c_{l}$ convolution kernels of size $a_{l}\times a_{l}$ that are convolved with the layer input $\bm{I}_{\text{conv},l}\in\mathbb{R}^{b^{(1)}_{l-1}\times b^{(2)}_{l-1}\times c_{l-1}}$ , where $b^{(1)}_{l-1}$ and $b^{(2)}_{l-1}$ are the height and width of the output of the convolutional layer $l-1$ , respectively. Note that $c_{0}=1$ , $b^{(1)}_{0}=2$ , and $b^{(2)}_{0}=NK$ . The parameters of the convolution kernels, including the weights $\bm{\Xi}_{l}\in\mathbb{R}^{a_{l}\times a_{l}\times c_{l}}$ and a bias vector $\bm{\xi}_{l}\in\mathbb{R}^{c_{l}\times 1}$ , are shared among different elements in $\bm{I}_{\text{conv},l}$ to extract features. More specifically, the output $\bm{O}_{\text{conv},l}\in\mathbb{R}^{b^{(1)}_{l}\times b^{(2)}_{l}\times c_{l}}$ of the convolutional layer $l$ is

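$$\bm{O}_{\text{conv},l}=\textsf{Conv}(\bm{I}_{\text{conv},l},\bm{\Xi}_{l},\bm{\xi}_{l}),\ l\in\mathcal{L},$$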

where the operator $\textsf{Conv}(\cdot,\cdot,\cdot)$ denotes the convolution operation.
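The effect of a convolutional layer with $3\times 3$ kernels, zero padding 1, and stride 1 (the settings listed in Table I) can be sketched with a plain NumPy loop. The sketch assumes the usual deep learning convention, in which "convolution" is a sliding-window cross-correlation and each kernel spans all input channels, so the weight array here has shape $(a, a, c_{l-1}, c_{l})$; the function name `conv2d_same` is hypothetical.

```python
import numpy as np

def conv2d_same(x, kernels, bias):
    """Stride-1 convolution with zero padding that preserves height and width.

    x:       (b1, b2, c_in) input feature map.
    kernels: (a, a, c_in, c_out) weights, a odd (assumed layout).
    bias:    (c_out,) bias vector.
    """
    a = kernels.shape[0]
    pad = (a - 1) // 2
    # Zero-pad only the two spatial axes.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    b1, b2 = x.shape[:2]
    out = np.empty((b1, b2, kernels.shape[-1]))
    for i in range(b1):
        for j in range(b2):
            patch = xp[i:i + a, j:j + a, :]  # (a, a, c_in) window
            out[i, j, :] = np.tensordot(patch, kernels, axes=3) + bias
    return out

# Hypothetical example: input of size 2 x NK with N = 2, K = 3, and 8 kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 1))
y = conv2d_same(x, rng.standard_normal((3, 3, 1, 8)), np.zeros(8))
print(y.shape)  # spatial size 2 x 6 is preserved, with 8 output channels
```

Because the padding keeps the spatial size at $2\times NK$, the only dimension that changes from layer to layer is the number of channels.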

III-A3 Batch Normalization Layer

The batch normalization layers are introduced in the neural network module, which can be put before or after the activation layers [49] according to practical experience. In the proposed framework, we adopt the former where the batch normalization layers normalize the output of the convolutional layers through subtracting the batch mean and dividing by the batch standard deviation, i.e.,

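$$\bm{Z}_{\text{bn},l,c}[i,j]=\frac{\bm{O}_{\text{conv},l,c}[i,j]-\mu_{l,c}}{\sqrt{\text{Var}_{l,c}+\epsilon_{l,c}}},\ l\in\mathcal{L},\ c=1,\dots,c_{l},\ i=1,\dots,b_{l}^{(1)},\ j=1,\dots,b_{l}^{(2)},$$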

where $\bm{X}[i,j]$ denotes the $(i,j)$ -th element of matrix $\bm{X}$ , $\bm{O}_{\text{conv},l,c}\in\mathbb{R}^{b^{(1)}_{l}\times b^{(2)}_{l}}$ is the $c$ -th slice of $\bm{O}_{\text{conv},l}$ , $\mu_{l,c}=\frac{\sum^{F}_{f=1}\sum^{b^{(1)}_{l}}_{i=1}\sum^{b^{(2)}_{l}}_{j=1}\bm{O}^{(f)}_{\text{conv},l,c}[i,j]}{Fb^{(1)}_{l}b^{(2)}_{l}}$ and $\text{Var}_{l,c}=\frac{\sum^{F}_{f=1}\sum^{b^{(1)}_{l}}_{i=1}\sum^{b^{(2)}_{l}}_{j=1}\big(\bm{O}^{(f)}_{\text{conv},l,c}[i,j]-\mu_{l,c}\big)^{2}}{Fb^{(1)}_{l}b^{(2)}_{l}}$ are the batch mean and variance of the $c$ -th slice, respectively, $\epsilon_{l,c}$ is a small float added to the variance to avoid dividing by zero, and $F$ is the batch size. Note that such a simple normalization process may change what the layer can represent. To address this issue, two trainable parameters $\beta_{l,c}$ and $\theta_{l,c}$ are introduced to scale and shift, respectively, the normalized value $\bm{Z}_{\text{bn},l,c}[i,j]$ as $\hat{\bm{Z}}_{\text{bn},l,c}[i,j]=\beta_{l,c}\bm{Z}_{\text{bn},l,c}[i,j]+\theta_{l,c}$ . This “denormalization” process is achieved by changing only these two parameters, instead of changing all parameters, which may lead to instability of the neural network module. Besides, the work in [49] claimed that the batch normalization layer can reduce the probability of over-fitting, enable a higher learning rate, and make the neural network less sensitive to the initialization of weights. Note that the batch normalization layers are element-wise functions, such that they do not change their respective input shapes.
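The training-time behavior of the batch normalization layer can be sketched in NumPy as below. The sketch assumes a batch stored as an array of shape $(F, b^{(1)}_{l}, b^{(2)}_{l}, c_{l})$, uses $\epsilon=0.001$ as in Table I, and follows the text in using $\beta$ for the scale and $\theta$ for the shift; the function name `batch_norm` is hypothetical. At inference time, frameworks typically replace the batch statistics with moving averages, which this sketch does not model.

```python
import numpy as np

def batch_norm(O, beta, theta, eps=1e-3):
    """Normalize each channel over the batch and spatial axes, then scale and shift.

    O:     (F, b1, b2, c) batch of convolutional outputs (assumed layout).
    beta:  (c,) trainable scale.
    theta: (c,) trainable shift.
    """
    mu = O.mean(axis=(0, 1, 2), keepdims=True)   # batch mean per channel
    var = O.var(axis=(0, 1, 2), keepdims=True)   # batch variance per channel
    Z = (O - mu) / np.sqrt(var + eps)            # divide by the standard deviation
    return beta * Z + theta

# Hypothetical example: batch of 200 samples, 2 x 6 feature maps, 8 channels.
rng = np.random.default_rng(0)
O = 3.0 * rng.standard_normal((200, 2, 6, 8)) + 5.0
Z_hat = batch_norm(O, beta=np.ones(8), theta=np.zeros(8))
print(Z_hat.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(Z_hat.std(axis=(0, 1, 2)))   # close to 1 for every channel
```

With `beta` set to ones and `theta` to zeros the output is simply the normalized value; training can move both away from these defaults to restore the representational freedom discussed above.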

III-A4 Activation Layer

Since the predicted variables are continuous and positive real numbers, it is suggested that the activation functions that can generate negative values, such as tanh and linear functions, should not be used in the last activation layer. The rectified linear unit (ReLU) and sigmoid functions are good choices for the last activation layer, which are given as

$$\text{ReLU}(x)=\max(0,x)\quad\text{and}\quad\text{sigmoid}(x)=\frac{1}{1+e^{-x}},$$

respectively. The most common choice for the intermediate activation layers is the ReLU function. Note that the activation layers apply element-wise functions, so their outputs have the same shapes as their inputs.

III-A5 Flatten Layer, Fully-connected Layer, and Output Layer

The flatten layer is only used to change the shape of its input into a vector, for the fully-connected layer to interpret. The output $\mathbf{o}_{\text{fc}}\in\mathbb{R}^{m\times 1}$ of the fully-connected layer is

$$\mathbf{o}_{\text{fc}}=\bm{\Pi}\bm{i}_{\text{fc}}+\bm{\pi},$$

where $\bm{i}_{\text{fc}}\in\mathbb{R}^{2NKc_{L}\times 1}$ is the input vector, $\bm{\Pi}\in\mathbb{R}^{m\times 2NKc_{L}}$ and $\bm{\pi}\in\mathbb{R}^{m\times 1}$ account for the weight matrix and bias vector, respectively, and $m$ is the number of the neurons in the fully-connected layer. The main function of the output layer is to generate the predicted results after the neural network finishes training.

Note that apart from these functional layers, the loss function also plays an important role in the proposed framework, which is marked on the output layer in Fig. 1. The loss function together with the learning rate guides the learning process of the neural network. In other words, the loss function “tells” the neural network how to update its parameters. Since the output values are continuous, it is suggested to utilize the mean absolute error (MAE) or the MSE as a metric. Given the predicted results of the $f$ -th sample in the neural network module is $\hat{{\bf q}}^{(f)}$ and the target result is ${\bf q}^{(f)}$ , the MAE and MSE are defined as

[TABLE]

and

[TABLE]

respectively. Generally speaking, the MAE is more robust because it is less affected by outliers. In contrast, the MSE is highly sensitive to outliers in the dataset because minimizing it adjusts the model to fit these outlier values at the expense of other samples [50]. In this work, the training dataset is generated by simulations and outliers are not an issue. We therefore choose the MSE as the loss metric because its gradient is easier to calculate than that of the MAE.
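The different sensitivity of the two metrics to outliers can be checked with a small numerical sketch. Averaging over all entries of all samples is an assumption of this example, and the data are synthetic.

```python
import numpy as np

def mae(q_hat, q):
    # Mean absolute error, averaged over all samples and entries (assumed normalization).
    return np.mean(np.abs(q_hat - q))

def mse(q_hat, q):
    # Mean squared error, with the same averaging convention.
    return np.mean((q_hat - q) ** 2)

q = np.zeros((100, 4))             # synthetic targets: 100 samples, K = 4
q_hat = np.full((100, 4), 0.1)     # predictions with a uniform error of 0.1
q_out = q_hat.copy()
q_out[0, 0] = 10.0                 # a single outlier prediction

mae_ratio = mae(q_out, q) / mae(q_hat, q)
mse_ratio = mse(q_out, q) / mse(q_hat, q)
print(mae_ratio, mse_ratio)
```

In this toy case, one outlier among 400 entries inflates the MAE by roughly a quarter, whereas the MSE grows by more than an order of magnitude.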

III-A6 Beamforming Recovery Module

The beamforming recovery module is an important component whose aim is to recover the beamforming matrix from the predicted key features at the output layer. The functional layers in the beamforming recovery module are designed according to the expert knowledge of the beamforming optimization which maps/converts the key features to the beamforming matrix. The expert knowledge is problem-dependent and has no unified form, but what is in common is that the expert knowledge can significantly reduce the number of variables to be predicted compared to the beamforming matrix. For example, the uplink-downlink duality and specific solution structures are the typical expert knowledge for beamforming optimization.

The key features should be chosen carefully to meet some constraints required by applying the universal approximation theorem [43, 27], so that a feedforward network exists which can approximate the continuous mapping from the channel coefficients to the key features. More specifically, assume that $\bm{\tau}$ is a vector containing the chosen key features, the mapping function $f(\bullet)$ from ${\bf h}$ to $\bm{\tau}$ , i.e., $\bm{\tau}=f({\bf h})$ , should be a real-valued continuous function over a compact set. The compact set requirement holds whenever the possible values of the input ${\bf h}$ are bounded. However, the continuity of the mapping function depends on the choice of the key features.

In the next three sections, we propose three BNNs under the proposed framework for problems P1, P2, and P3, respectively, and provide implementation details to show how to make use of the expert knowledge and choose the key features.

III-B Computational Complexity

The computational complexity of the proposed framework involves two main tasks: the online prediction and the offline training. To the best of our knowledge, complexity analysis of the offline training is still an open issue mainly because of the complex implementation of the backpropagation process. However, since the training is performed offline, and updated at a much longer time-scale compared to the online prediction, we assume its complexity can be afforded [51]. Thus, we focus on the complexity of the online prediction. In addition, the functional layers are problem-dependent in the beamforming recovery module, so only the complexity of the neural network module is analyzed below.

Big-O notation is a common way to describe the complexity of an algorithm. Given $c_{l}$ kernels of size $a_{l}\times a_{l}$ in the $l$-th convolutional layer, the numbers of multiplication and addition operations of convolutional layer $l$ are the same and equal to $a^{2}_{l}b^{(1)}_{l}b^{(2)}_{l}c_{l-1}c_{l}$. Thus, the total time complexity of all convolutional layers measured by the number of multiplications is $\mathcal{O}\left(\sum_{l\in\mathcal{L}}a^{2}_{l}b^{(1)}_{l}b^{(2)}_{l}c_{l-1}c_{l}\right)$ [52]. Since the batch normalization layers and activation layers are element-wise functions, the computational complexity of all batch normalization layers and all activation layers in the $L$ groups is $\mathcal{O}\left(\sum_{l\in\mathcal{L}}b^{(1)}_{l}b^{(2)}_{l}c_{l}\right)$. The numbers of multiplication and addition operations of the fully-connected layer are also the same and equal to $b^{(1)}_{L}b^{(2)}_{L}c_{L}m$. The time complexity of the fully-connected layer is therefore $\mathcal{O}\left(b^{(1)}_{L}b^{(2)}_{L}c_{L}m\right)$. Besides, the complexity of the input, output, and flatten layers is ignored due to the simplicity of their functions. If all convolutional layers use kernels of size $3\times 3$ with stride 1 and zero padding 1, then $b^{(1)}_{l}=2$ and $b^{(2)}_{l}=NK,\forall l\in\mathcal{L}$. Based on the above analysis and assuming the parameters of the neural network module are fixed, predicting the output of the neural network module needs $2NK\sum_{l\in\mathcal{L}}(9c_{l}c_{l-1}+c_{l})+2NKc_{L}m+2m$ arithmetic operations, including multiplications, divisions, and exponentiations, and has an approximate complexity of $\mathcal{O}\left(NK\right)$.
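The operation count stated above can be evaluated directly, which also makes the linear growth in $NK$ visible. In the sketch below, the function name `prediction_ops` and the channel configuration `[1, 8, 8]` are hypothetical values chosen for illustration; they are not the settings of Table I.

```python
def prediction_ops(N, K, channels, m):
    """Arithmetic operations for one forward pass, following the count in the text.

    channels : [c_0, c_1, ..., c_L], where c_0 is the number of input slices.
    m        : number of neurons in the fully-connected layer.
    Assumes 3x3 kernels, stride 1 and zero padding 1, so every feature map is 2 x NK.
    """
    per_position = sum(
        9 * channels[l] * channels[l - 1] + channels[l]
        for l in range(1, len(channels))
    )
    return 2 * N * K * per_position + 2 * N * K * channels[-1] * m + 2 * m

# Hypothetical configuration: N = 6, K = 4, two convolutional layers with 8 kernels each.
print(prediction_ops(6, 4, [1, 8, 8], 4))   # 33416
print(prediction_ops(12, 4, [1, 8, 8], 4))  # 66824, roughly double
```

Apart from the constant $2m$ term, doubling $N$ (or $K$) doubles the count, which agrees with the approximate $\mathcal{O}(NK)$ complexity when the network parameters are fixed.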

IV BNN for SINR Balancing Problem

As mentioned above, estimating the beamforming matrix directly leads to higher prediction complexity due to the large number of variables. To reduce the prediction complexity, we introduce a scheme that first predicts the power allocation vector as the key feature and then recovers the corresponding beamforming matrix from the predicted results. This scheme is based on the expert knowledge known as the uplink-downlink duality.

IV-A Uplink-Downlink Duality

Before we present the BNN for the SINR balancing problem P1, we first introduce the following lemma to describe the uplink-downlink duality of problem P1 [12].

Lemma 1.

Given $\tilde{{\bf W}}=[\tilde{{\bf w}}_{1},\tilde{{\bf w}}_{2},\ldots,\tilde{{\bf w}}_{K}]$ and $P_{max}$ , we have

[TABLE]

where $C^{dl}(\tilde{{\bf W}},P_{max})$ and $C^{ul}(\tilde{{\bf W}},P_{max})$ are given as

[TABLE]

and

[TABLE]

respectively, with

[TABLE]

and

[TABLE]

Note that ${\bf p}=[p_{1},\ldots,p_{K}]^{T}$ and $\mathbf{q}=[q_{1},\ldots,q_{K}]^{T}$ are the downlink and uplink power vectors, respectively.$^{1}$

$^{1}$Lemma 1 can be easily extended to the case with non-identical noise power levels. See [12] for more details.

Note that problem (13) is an equivalent virtual problem of problem P1 whose optimal solutions are connected by ${\bf W}^{\ast}=\tilde{{\bf W}}^{\ast}{\bf P}^{\ast}$ where ${\bf P}^{\ast}=\text{diag}({\bf p}^{\ast})$ , ${\bf W}^{\ast}$ is the optimal solution to problem P1, and $\tilde{{\bf W}}^{\ast}$ and ${\bf p}^{\ast}$ are the optimal solutions to problem (13). Based on Lemma 1, we find that the uplink and downlink scenarios have the same achievable SINR region and the normalized beamforming designed for the uplink reception immediately carries over to the downlink transmission [12]. Thus we first obtain the optimal power allocation $\mathbf{q}^{\ast}$ and beamforming matrix $\tilde{{\bf W}}^{\ast}$ for the easier-to-solve uplink problem (14) instead of the downlink problem (13). Then given the optimal beamforming $\tilde{{\bf W}}^{\ast}$ , the optimal $\mathbf{p}^{\ast}$ is obtained as the first $K$ components of the dominant eigenvector of the following matrix [53]

[TABLE]

where $\bm{\sigma}=\sigma^{2}\mathbf{1}$ , $\mathbf{1}=[1,1,\ldots,1]^{T}\in\mathbb{R}^{K\times 1}$ , $\mathbf{D}=\text{diag}\{\rho_{1}/|(\tilde{{\bf w}}^{\ast}_{1})^{H}{\bf h}_{1}|^{2},\ldots,\rho_{K}/|(\tilde{{\bf w}}^{\ast}_{K})^{H}{\bf h}_{K}|^{2}\}$ , and

[TABLE]

Finally, the downlink beamforming matrix is derived as ${\bf W}^{\ast}=\tilde{{\bf W}}^{\ast}{\bf P}^{\ast}$ . Thus, instead of predicting ${\bf W}$ directly, we can predict the uplink power allocation vector $\mathbf{q}$ . In the supervised learning method, the prediction performance of the BNN depends on the quality of training samples. To generate the training samples, the optimal ${\bf q}^{\ast}$ and $\tilde{{\bf W}}^{\ast}$ can be found by an iterative optimization algorithm in [12, Table 1].

Note that $\bm{\Upsilon}(\tilde{{\bf W}}^{\ast},P_{max})$ is a non-negative matrix and the optimal objective value of problem P1 is the reciprocal of the largest eigenvalue of $\bm{\Upsilon}(\tilde{{\bf W}}^{\ast},P_{max})$ [53]. According to the Perron-Frobenius theory, for any nonnegative real matrix $\bm{\Omega}$ with spectral radius $\chi(\bm{\Omega})$, there exists a vector $\bm{\delta}\geq 0$ such that $\bm{\Omega}\bm{\delta}=\chi(\bm{\Omega})\bm{\delta}$ [54]. Based on [12, Theorem 3], the sequence of the target value of problem P1 provided by the iterative algorithm in [12, Table 1] is strictly monotonically increasing and the largest eigenvalue of $\bm{\Upsilon}(\tilde{{\bf W}}^{\ast},P_{max})$ is unique. Then the corresponding eigenvector containing ${\bf q}$ is a continuous and bounded function of ${\bf h}$ according to [55, Chapter 3]. Thus, we can use a neural network to approximate the mapping function from ${\bf h}$ to ${\bf q}$ [43].

IV-B BNN Structure

The proposed BNN for problem P1, shown in Fig. 2, is based on the proposed BNN framework in Fig. 1. The functions and operations of the basic layers such as the input, convolutional, batch normalization, and output layers, are the same as those in the proposed framework. Therefore, we do not explain these layers here and readers can refer to Section III for detail. Note that in the proposed BNN for problem P1, the intermediate activation layers are fulfilled with the ReLU function whereas the last activation layer is implemented using the sigmoid function. Besides the existing layers in the framework, a scaling layer and a conversion layer are also introduced in the BNN for problem P1, which belong to the beamforming recovery module. In the following, we give the details of the scaling layer and the conversion layer.

IV-B1 Scaling Layer

Due to the existence of prediction error, it is almost impossible to guarantee that the output of the output layer always meets the power constraint in problem P1. According to [56], the optimal solution is achieved when the equality of the constraint in problem P1 holds. Therefore, we scale the results of the output layer $\hat{\mathbf{q}}$ to meet the power constraint by the following transformation,

[TABLE]

IV-B2 Conversion Layer

After receiving the scaled power allocation vector $\hat{\mathbf{q}}^{\ast}$, the conversion layer obtains the downlink beamforming matrix $\hat{{\bf W}}^{\ast}$ from $\hat{{\bf q}}^{\ast}$ as the final output of the BNN. The beamforming recovery implemented by the conversion layer consists of the following steps:

1. Calculate $\mathbf{T}^{\ast}=\sigma^{2}{\bf I}_{N}+\sum^{K}_{k=1}\hat{q}^{\ast}_{k}{\bf h}_{k}{\bf h}^{H}_{k}$.

2. Calculate $\tilde{{\bf w}}^{\ast}_{k}=(\mathbf{T}^{\ast})^{-1}{\bf h}_{k}$ and normalize it as $\tilde{{\bf w}}^{\ast}_{k}\leftarrow\tilde{{\bf w}}^{\ast}_{k}/||\tilde{{\bf w}}^{\ast}_{k}||_{2},\forall k$.

3. Find the maximal eigenvalue $\psi_{max}^{\ast}$ of $\bm{\Upsilon}(\tilde{{\bf W}}^{\ast},P_{max})$ and the associated eigenvector, i.e., $\bm{\Upsilon}(\tilde{{\bf W}}^{\ast},P_{max})\bigl{[}\begin{smallmatrix}\hat{{\bf p}}^{\ast}\\ 1\end{smallmatrix}\bigr{]}=\psi_{max}^{\ast}\bigl{[}\begin{smallmatrix}\hat{{\bf p}}^{\ast}\\ 1\end{smallmatrix}\bigr{]}$.

4. Output $\hat{{\bf W}}^{\ast}=\tilde{{\bf W}}^{\ast}\hat{{\bf P}}^{\ast}$ as the final result, where $\hat{{\bf P}}^{\ast}=\text{diag}(\hat{{\bf p}}^{\ast})$.

Note that the time complexity of the beamforming recovery module is $\mathcal{O}(KN^{2}+N^{3}+K^{3})$ . In the proposed BNN for the SINR balancing problem P1, the supervised learning with the loss function based on the MSE metric is adopted.
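The scaling and conversion layers for problem P1 can be sketched in NumPy as follows. Since the definition of $\bm{\Upsilon}$ is not reproduced in this excerpt, the sketch assumes the extended coupling matrix form commonly used for SINR balancing, $\bm{\Upsilon}=\bigl[\begin{smallmatrix}\mathbf{D}\bm{\Psi}&\mathbf{D}\bm{\sigma}\\ \frac{1}{P_{max}}\mathbf{1}^{T}\mathbf{D}\bm{\Psi}&\frac{1}{P_{max}}\mathbf{1}^{T}\mathbf{D}\bm{\sigma}\end{smallmatrix}\bigr]$ with $[\bm{\Psi}]_{ik}=|\tilde{{\bf w}}_{k}^{H}{\bf h}_{i}|^{2}$ for $i\neq k$ and zero diagonal; the scaling rule $\hat{\mathbf{q}}^{\ast}=P_{max}\hat{\mathbf{q}}/\|\hat{\mathbf{q}}\|_{1}$ is likewise an assumption. The function names are hypothetical, and the sketch returns the normalized beamformers and the power vector rather than the final product of step 4, treating $p_{k}$ as the power of stream $k$.

```python
import numpy as np

def scale_to_budget(q_hat, p_max):
    # Scaling layer (assumed form): rescale so that the entries sum to p_max.
    q_hat = np.asarray(q_hat, dtype=float)
    return p_max * q_hat / np.sum(q_hat)

def recover_p1(H, q, p_max, sigma2=1.0, rho=None):
    """Conversion layer sketch. H is N x K with column k equal to h_k."""
    N, K = H.shape
    rho = np.ones(K) if rho is None else np.asarray(rho, dtype=float)
    # Step 1: T = sigma^2 I + sum_k q_k h_k h_k^H
    T = sigma2 * np.eye(N) + (H * q) @ H.conj().T
    # Step 2: w_k = T^{-1} h_k, then normalize each column to unit norm
    W = np.linalg.solve(T, H)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    # G[k, i] = |w_k^H h_i|^2
    G = np.abs(W.conj().T @ H) ** 2
    D = np.diag(rho / np.diag(G))
    Psi = G.T.copy()                 # Psi[i, k] = |w_k^H h_i|^2 for i != k
    np.fill_diagonal(Psi, 0.0)
    sig = sigma2 * np.ones(K)
    # Step 3: extended coupling matrix (assumed form) and its dominant eigenpair
    top = np.hstack([D @ Psi, (D @ sig)[:, None]])
    bottom = np.ones(K) @ top / p_max
    Ups = np.vstack([top, bottom[None, :]])
    vals, vecs = np.linalg.eig(Ups)
    idx = np.argmax(vals.real)
    v = (vecs[:, idx] / vecs[-1, idx]).real   # normalize last entry to 1
    return W, v[:K], vals[idx].real

rng = np.random.default_rng(1)
N, K, p_max, sigma2 = 6, 4, 10.0, 1.0
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
q = scale_to_budget(rng.uniform(0.1, 1.0, K), p_max)   # stand-in for the BNN output
W, p, psi_max = recover_p1(H, q, p_max, sigma2)
print(p.sum(), 1.0 / psi_max)   # total downlink power and balanced SINR level
```

Under these assumptions, the recovered powers sum to $P_{max}$ and all users attain the same SINR, equal to the reciprocal of the dominant eigenvalue. The cost is dominated by one $N\times N$ solve, the formation of $\mathbf{T}^{\ast}$, and one $(K+1)\times(K+1)$ eigendecomposition, in line with the $\mathcal{O}(KN^{2}+N^{3}+K^{3})$ complexity stated above.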

V BNN for Power Minimization Problem

Similar to the BNN for the SINR balancing problem P1, the BNN for the power minimization problem P2 obtains the downlink beamforming matrix according to the uplink-downlink duality, i.e., the expert knowledge. Specifically, we first predict the uplink power allocation vector as the key features using the trained neural network, then obtain the normalized beamforming matrix based on the predicted results. Finally, the downlink beamforming matrix is recovered from the normalized beamforming matrix by the uplink-downlink conversion method.

V-A Uplink-Downlink Duality

Note that the conversion method adopted in the BNN for problem P1 cannot be used again, because the power budget $P_{max}$ is unknown in the power minimization problem P2. Instead, we employ the conversion method in the following lemma [47].

Lemma 2.

Given the optimal beamforming matrix $\tilde{{\bf W}}^{\ast}=[\tilde{{\bf w}}^{\ast}_{1},\ldots,\tilde{{\bf w}}^{\ast}_{K}]$ for the uplink problem$^{2}$, i.e.,

[TABLE]

where $\gamma^{ul}_{k}(\tilde{{\bf W}},\mathbf{q})$ is given as in (16).

$^{2}$In this work, for simplicity, we assume the solution to problem P2 always exists. However, it can happen that the wireless network can only satisfy some of the users, in which case user selection is needed. A possible solution is to train another neural network for user selection, and then optimize the beamforming matrix among the selected users.

The optimal beamforming vectors ${\bf w}^{\ast}_{k},\forall k,$ for the downlink problem P2, can be obtained by multiplying the optimal normalized beamforming vector $\tilde{{\bf w}}^{\ast}_{k}$ by a scaling factor, i.e., ${\bf w}^{\ast}_{k}=p^{\ast}_{k}\tilde{{\bf w}}_{k}^{\ast},\forall k$ , where $p^{\ast}_{k}$ is the $k$ -th element of vector ${\bf p}^{\ast}=[p^{\ast}_{1},\ldots,p^{\ast}_{K}]^{T}\in\mathbb{R}^{K\times 1}$ and

[TABLE]

where

[TABLE]

The vector ${\bf p}^{\ast}$ of the scaling factors is the optimal downlink power allocation vector. Given the optimal normalized beamforming matrix $\tilde{{\bf W}}^{\ast}$ , Lemma 2 allows us to achieve the optimal downlink power vector ${\bf p}^{\ast}$ by (21), then ${\bf W}^{\ast}=\tilde{{\bf W}}^{\ast}{\bf P}^{\ast}$ . Actually, if we know the uplink power allocation vector ${\bf q}$ , the normalized beamforming matrix $\tilde{{\bf W}}$ can be inferred as

[TABLE]

where $\mathbf{T}=\sigma^{2}{\bf I}_{N}+\sum^{K}_{k=1}q_{k}{\bf h}_{k}{\bf h}^{H}_{k}$. Therefore, the only quantity that needs to be predicted by the BNN is the uplink power allocation vector ${\bf q}$, which significantly reduces the computational complexity compared to a strategy that attempts to predict the beamforming matrix directly. The iterative algorithm in [5] provides a way to obtain the optimal ${\bf q}^{\ast}$ for the training samples in the supervised learning method. Besides, such an iterative algorithm suggests that the mapping function from ${\bf h}$ to ${\bf q}$ is continuous [27, Theorem 1], so it can be approximated by a neural network.

V-B BNN Structure

The BNN for problem P2 in Fig. 3 is also based on the proposed BNN framework. However, the operations of the conversion layer in Fig. 3 are different from those in the BNN for problem P1. After receiving the uplink power allocation vector $\hat{{\bf q}}^{\ast}$ from the output layer, the beamforming recovery in the conversion layer performs the following operations:

1. Calculate $\mathbf{T}^{\ast}=\sigma^{2}{\bf I}_{N}+\sum^{K}_{k=1}\hat{q}^{\ast}_{k}{\bf h}_{k}{\bf h}^{H}_{k}$.

2. Calculate $\tilde{{\bf w}}^{\ast}_{k}=(\mathbf{T}^{\ast})^{-1}{\bf h}_{k}$ and normalize it as $\tilde{{\bf w}}^{\ast}_{k}\leftarrow\tilde{{\bf w}}^{\ast}_{k}/||\tilde{{\bf w}}^{\ast}_{k}||_{2},\forall k$.

3. Calculate the downlink power allocation vector $\hat{{\bf p}}^{\ast}=\sigma^{2}(\bm{\Psi}^{\ast}(\tilde{{\bf W}}^{\ast},\bm{\Gamma}))^{-1}\mathbf{1}$.

4. Output the downlink beamforming vectors $\hat{{\bf w}}^{\ast}_{k}=\hat{p}^{\ast}_{k}\tilde{{\bf w}}^{\ast}_{k},\forall k,$ as the final results.

Here, the time complexity of the beamforming recovery module is $\mathcal{O}(KN^{2}+N^{3}+K^{3})$ . Note that the predicted power vector $\hat{{\bf q}}^{\ast}$ by the BNN is, in general, not exact. The prediction error will lead to the inaccuracy of power allocation vector $\hat{{\bf p}}^{\ast}$ as well as the downlink beamforming $\hat{{\bf W}}^{\ast}$ . More specifically, if the predicted power vector $\hat{{\bf q}}^{\ast}$ has an acceptable accuracy with respect to the target power vector ${\bf q}^{\ast}$ , i.e., $||{\bf q}^{\ast}-\hat{{\bf q}}^{\ast}||^{2}_{2}<\varepsilon$ where $\varepsilon$ is a small constant, then we can obtain a suboptimal solution whose objective value is larger than that of the optimal solution, i.e., $\sum^{K}_{k=1}||\hat{{\bf w}}^{\ast}_{k}||^{2}_{2}>\sum^{K}_{k=1}||{\bf w}^{\ast}_{k}||^{2}_{2}$ . Intuitively, the extra power consumption $q_{extra}=\sum^{K}_{k=1}||\hat{{\bf w}}^{\ast}_{k}||^{2}_{2}-\sum^{K}_{k=1}||{\bf w}^{\ast}_{k}||^{2}_{2}$ can be regarded as the cost of the prediction error. However, if the predicted vector $\hat{{\bf q}}^{\ast}$ has a significant error, i.e., $||{\bf q}^{\ast}-\hat{{\bf q}}^{\ast}||^{2}_{2}\gg\varepsilon$ , the downlink beamforming $\hat{{\bf W}}^{\ast}$ inferred from the prediction $\hat{{\bf q}}^{\ast}$ may become infeasible since some elements of the vector $\hat{{\bf p}}^{\ast}$ have negative values. This suggests that different from problem P1, there is a certain probability of infeasibility of the BNN prediction for problem P2. However, our experiments show that the failure probability of the proposed BNN for problem P2 is lower than $1\%$ in most settings. More details will be given in Section VII. Moreover, the supervised learning with the loss function based on the MSE metric is adopted in the proposed BNN for problem P2.
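The conversion steps for problem P2 and the feasibility check discussed above can be sketched as follows. The matrix $\bm{\Psi}$ is not reproduced in this excerpt, so the sketch assumes the form obtained by requiring every downlink SINR constraint to hold with equality, namely $[\bm{\Psi}]_{kk}=|\tilde{{\bf w}}_{k}^{H}{\bf h}_{k}|^{2}/\Gamma_{k}$ and $[\bm{\Psi}]_{ik}=-|\tilde{{\bf w}}_{k}^{H}{\bf h}_{i}|^{2}$ for $i\neq k$. The function name `recover_p2` is hypothetical, and $p_{k}$ is treated as the power of stream $k$.

```python
import numpy as np

def recover_p2(H, q, gamma, sigma2=1.0):
    """Conversion layer sketch for P2. H is N x K with column k equal to h_k."""
    N, K = H.shape
    # Steps 1-2: normalized beamformers from the (predicted) uplink powers q
    T = sigma2 * np.eye(N) + (H * q) @ H.conj().T
    W = np.linalg.solve(T, H)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    # G[k, i] = |w_k^H h_i|^2
    G = np.abs(W.conj().T @ H) ** 2
    # Step 3: solve Psi p = sigma^2 1 (assumed form of Psi, see lead-in)
    Psi = -G.T
    Psi[np.diag_indices(K)] = np.diag(G) / gamma
    p = sigma2 * np.linalg.solve(Psi, np.ones(K))
    # A negative entry means the prediction does not yield a feasible solution.
    feasible = bool(np.all(p > 0))
    return W, p, feasible

rng = np.random.default_rng(2)
N, K, sigma2 = 6, 4, 1.0
gamma = 0.5 * np.ones(K)                      # SINR targets (linear scale)
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
q_hat = np.ones(K)                            # stand-in for the BNN output
W, p, feasible = recover_p2(H, q_hat, gamma, sigma2)
print(feasible, p.sum())
```

Because the powers are obtained from a linear system, the SINR targets are met with equality whenever the system is solvable; an inaccurate $\hat{{\bf q}}^{\ast}$ shows up either as extra total power or, in the worst case, as negative entries in $\hat{{\bf p}}^{\ast}$, which the `feasible` flag reports.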

VI BNN for Sum Rate Maximization Problem

Different from the SINR balancing problem P1 and the power minimization problem P2, no practically useful algorithm is available to find the optimal solution to the sum rate maximization problem P3, for which one cannot make use of uplink-downlink duality directly. However, we will exploit a connection between problems P2 and P3 to find some key features of the optimal solution to problem P3.

VI-A Solution Structure

It was noted in [57] that the optimal solution to problem P2, which uses the minimal amount of power to achieve the given SINR targets, must meet the power constraint in problem P3 to achieve the maximal sum rate. More specifically, given the optimal transmit power $P^{\star}$ of problem P2 and setting the total power constraint $P_{max}$ in problem P3 as $P^{\star}$, the SINR values of each user in problem P3 can be calculated. By setting the SINR targets in problem P2 to these calculated SINR values, the solutions to problems P2 and P3 will be the same. Based on this connection between problems P2 and P3, it has been pointed out in [2] that the optimal downlink beamforming vectors for problem P3 follow the structure

[TABLE]

where $\lambda_{k}$ is a positive parameter and $\sum_{k=1}^{K}\lambda_{k}=\sum_{k=1}^{K}p_{k}=P_{max}$ according to the strong duality of problem P2. This is because $P_{max}$ is the optimal cost function in problem P2 and $\sum_{k=1}^{K}\lambda_{k}$ is the dual function. Note that the parameter vector $\bm{\lambda}=[\lambda_{1},\ldots,\lambda_{K}]^{T}$ can be considered as a virtual power allocation vector. The solution structure in (24) provides the required expert knowledge for the beamforming design in problem P3 and $\bm{\lambda}$ and ${\bf p}$ are the key features. But to our best knowledge, there is no low-complexity algorithm in the literature that can find the optimal $p^{\ast}_{k}$ and $\lambda^{\ast}_{k}$ in (24). An improved and faster branch-and-bound algorithm was developed in [51, 37] to find the globally optimal solution, but it is mostly effective for power control problems. The WMMSE algorithm is a good choice to find the locally optimal solutions [9, 10], and such an iterative algorithm ensures the continuity of the mapping from the channel to the solution, and can be learned by a neural network [27, 30]. Therefore, we can obtain the power allocation vectors ${\bf p}$ and $\bm{\lambda}$ according to the WMMSE algorithm. The supervised learning with the loss function based on the MSE metric will be first used to achieve as close to the results of the WMMSE algorithm as possible, i.e.,

[TABLE]

where $\underline{{\bf p}}^{(l)}$ and $\underline{\bm{\lambda}}^{(l)}$ are the power vectors obtained from the WMMSE algorithm, and $\hat{{\bf p}}^{(l)}$ and $\hat{\bm{\lambda}}^{(l)}$ are the predicted results of the BNN. It is worth pointing out that the results in the training samples of problems P1 and P2 are optimal, thus the MSE-based loss function is equivalent to the objective function and the supervised learning method updates network parameters towards the direction of the optimal solution. However, the WMMSE algorithm for problem P3 is locally optimal and thus (25) is not equivalent to the real objective of problem P3 which aims to maximize the weighted sum rate. To further improve the sum rate performance, we continue to train the BNN in an unsupervised learning way, whose loss function takes the objective function directly as a metric, i.e.,

[TABLE]

VI-B Hybrid BNN Structure

The BNN for problem P3 is presented in Fig. 4. The major difference from the BNNs in Figs. 2 and 3 is that the BNN in Fig. 4 has two stages of training. The first stage is responsible for pre-training using the supervised learning method with the loss function based on the MSE metric (25), while the second stage is responsible for enhanced training using the unsupervised learning method with the loss function whose metric is the objective function (26). Such a hybrid of supervised and unsupervised learning can significantly improve the learning performance and also accelerate convergence [29]. More specifically, the pre-training, as the approximation of the WMMSE algorithm, starts with the random initialization of neural network parameters and the loss function (25). After the pre-training is finished, the neural network parameters are retained and the loss function is replaced by (26), so that the second-stage training can achieve better performance than the first-stage training.

Different from the BNNs in Figs. 2 and 3, the output layer in Fig. 4 generates $2K$ values including the power allocation vectors $\hat{{\bf p}}$ and $\hat{\bm{\lambda}}$. Then the scaling layer scales the results of the output layer $\hat{{\bf p}}$ and $\hat{\bm{\lambda}}$ to meet the power constraint by the following method:

[TABLE]

Finally, the construction layer constructs the downlink beamforming vectors according to (24):

[TABLE]

Thus, the time complexity of the beamforming recovery module for problem P3 is $\mathcal{O}(KN^{2}+N^{3})$ .
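A rough sketch of the scaling and construction layers for problem P3, together with the sum rate that the second training stage takes as its metric, is given below. Equations (24), (26), and the scaling rule are not reproduced in this excerpt, so the sketch assumes the widely used structure ${\bf w}_{k}=\sqrt{p_{k}}\,\frac{({\bf I}_{N}+\sum_{i}\frac{\lambda_{i}}{\sigma^{2}}{\bf h}_{i}{\bf h}_{i}^{H})^{-1}{\bf h}_{k}}{\|({\bf I}_{N}+\sum_{i}\frac{\lambda_{i}}{\sigma^{2}}{\bf h}_{i}{\bf h}_{i}^{H})^{-1}{\bf h}_{k}\|_{2}}$, rescaling of both vectors so that their entries sum to $P_{max}$, and a weighted sum rate in bits per channel use. The function names are hypothetical.

```python
import numpy as np

def construct_p3(H, p_hat, lam_hat, p_max, sigma2=1.0):
    """Scaling + construction layer sketch. H is N x K with column k equal to h_k."""
    N, K = H.shape
    # Scaling layer (assumed form): both vectors are rescaled to sum to p_max.
    p = p_max * np.asarray(p_hat, dtype=float) / np.sum(p_hat)
    lam = p_max * np.asarray(lam_hat, dtype=float) / np.sum(lam_hat)
    # Construction layer: directions from lambda, powers from p (assumed structure).
    A = np.eye(N) + (H * (lam / sigma2)) @ H.conj().T
    D = np.linalg.solve(A, H)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    return D * np.sqrt(p)

def sum_rate(H, W, sigma2=1.0, alpha=None):
    """Weighted sum rate; its negative could serve as an unsupervised loss."""
    K = H.shape[1]
    alpha = np.ones(K) if alpha is None else np.asarray(alpha, dtype=float)
    G = np.abs(H.conj().T @ W) ** 2          # G[i, k] = |h_i^H w_k|^2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(alpha * np.log2(1.0 + signal / (interference + sigma2))))

rng = np.random.default_rng(3)
N, K, p_max = 4, 4, 10.0
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
p_hat = rng.uniform(0.1, 1.0, K)             # stand-ins for the 2K BNN outputs
lam_hat = rng.uniform(0.1, 1.0, K)
W = construct_p3(H, p_hat, lam_hat, p_max)
print(np.sum(np.abs(W) ** 2), sum_rate(H, W))
```

Only one $N\times N$ linear solve and the formation of the regularized matrix are needed, which is consistent with the $\mathcal{O}(KN^{2}+N^{3})$ complexity stated above. In an actual second training stage the sum rate would have to be expressed in a differentiable framework; the NumPy version here only illustrates the forward computation.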

VII Simulation Results

To evaluate the performance of the proposed BNNs, we carry out numerical simulations to compare the BNNs with several benchmark solutions (when available), including the optimal beamforming, the ZF beamforming [58], the RZF beamforming [59], and the WMMSE algorithm. We consider a downlink transmission scenario where the BS is equipped with $N=6$ antennas and its coverage is a disc with a radius of 500 m. There are $K=4$ single-antenna users and these users are distributed uniformly within the coverage of the BS. Note that none of these users is closer to the BS than 100 m. The channel of user $k$ is modelled as ${\bf h}_{k}=\sqrt{d_{k}}\tilde{{\bf h}}_{k}\in\mathbb{C}^{N\times 1}$ where $\tilde{{\bf h}}_{k}\sim\mathcal{CN}(\bm{0},\mathbf{I}_{N})$ is the small-scale fading [60] and $d_{k}=128.1+37.6\log_{10}(\omega)$ [dB] denotes the pathloss between user $k$ and the BS [61] with $\omega$ representing the distance in km. Here, shadow fading is omitted for simplicity. The noise power spectral density is $-174$ dBm/Hz and the total system bandwidth is 20 MHz. For simplicity, we assume all the sub-streams have the same importance and all the users have the same priority, i.e., $\rho_{k}=1,\forall k,$ and $\alpha_{k}=1,\forall k$ . Besides, perfect CSI is assumed to be available at the BS.
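The simulation setup described above can be mimicked with the short sketch below. It assumes that the pathloss in dB is converted to a linear gain $10^{-d_{k}/10}$ that multiplies the small-scale fading in power, which is one common reading of the channel model; the function names are hypothetical.

```python
import numpy as np

def noise_power_dbm(psd_dbm_hz=-174.0, bandwidth_hz=20e6):
    # Noise power = power spectral density integrated over the bandwidth.
    return psd_dbm_hz + 10.0 * np.log10(bandwidth_hz)

def draw_channels(N=6, K=4, r_min=100.0, r_max=500.0, rng=None):
    """Draw one channel realization; returns (H, distances in metres)."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform placement over the annulus area: radius is the square root
    # of a uniform variable on [r_min^2, r_max^2].
    r = np.sqrt(rng.uniform(r_min**2, r_max**2, size=K))
    pl_db = 128.1 + 37.6 * np.log10(r / 1000.0)     # distance in km
    gain = 10.0 ** (-pl_db / 10.0)                  # assumption: dB loss -> linear gain
    h_small = (rng.standard_normal((N, K))
               + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    return np.sqrt(gain) * h_small, r

H, r = draw_channels(rng=np.random.default_rng(4))
print(H.shape, round(noise_power_dbm(), 2))   # (6, 4) and about -100.99 dBm
```

With a 20 MHz bandwidth, the noise power is about $-101$ dBm, which is the level by which the channel coefficients would be normalized before being fed into the BNNs.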

In our simulation, we prepare 20000 training samples and 5000 testing samples. The validation split is set to 0.2 and the training data is randomly shuffled at each epoch. All the BNNs have the same structure as shown in Table I. The fully-connected layer in the BNNs for problems P1 and P2 has $K$ neurons, whereas that in the BNN for problem P3 has $2K$ neurons. The Glorot normal initializer [62] is used for weight initialization and biases are initialized to 0. The Adam optimizer [63] is used with the MSE metric-based loss function. However, in the second stage of the BNN for problem P3, the metric of the loss function becomes the sum rate. The last activation layer is the sigmoid function, so the target output in the training and testing samples should be normalized into (0,1] by dividing by a factor. Also, the channel coefficients are normalized by the noise power before being fed into the BNNs to avoid entering the insensitive area of the sigmoid function. Note that unless explicitly mentioned otherwise, all the neural network modules adopt the default setting in Table I and a separate neural network model is trained for each different case.

VII-A BNN for the SINR Balancing Problem

We first consider the BNN for the SINR balancing problem P1, which updates network parameters in a supervised learning way. The iterative algorithm in [12, Table 1] is used to generate the training and testing samples. The ZF beamforming is achieved by allocating power to make all the users have the same SINR value under a total power constraint. Fig. 5 shows the SINR performance averaged over 5000 samples in two cases: one considering only the small-scale fading and the other considering both the small-scale fading and large-scale fading. In both cases, the SINR performance of the proposed BNN solution is very close to that of the optimal solution [12]. There is an obvious gap between the optimal solution and the ZF beamforming in the low normalized transmit-power ($\frac{P_{max}}{\sigma^{2}}$) regime of Fig. 5(a) as well as the low transmit-power regime of Fig. 5(b). However, the gap decreases as the (normalized) transmit power increases.

To further compare the SINR performance of the optimal solution, the ZF beamforming, the RZF beamforming whose regularization parameter is set as $\frac{P_{max}}{K}$, and the BNN solution, we evaluate the output SINR in Fig. 6 assuming that the number of users is the same as the number of BS antennas, i.e., $K=N$, and they increase together. It is shown that the BNN solution has some performance loss compared to the optimal solution due to the estimation error, but the BNN solution always achieves a better performance than the ZF beamforming and RZF beamforming. This indicates the application prospect of the BNN: the computational complexity and time of the BNN solution are similar to those of the ZF beamforming and RZF beamforming, but are much lower than those of the optimal solution because the optimal solution relies on an iterative process. Besides, we also find that the SINR performance of the four solutions decreases as the transmit antenna number (user number) increases, and among the four solutions the ZF beamforming suffers most from the performance loss.

Table II presents the comparison of two input formats, i.e., I/Q transformation and P/M transformation, in terms of the MSE performance and MAE performance of the predicted normalized power under the case with $K=N$ and $P_{max}=20$ dBm. As shown in Table II, I/Q transformation and P/M transformation have close performance.

In Fig. 7, we demonstrate the generality of the proposed BNN by fixing the user number as $K=4$ and the transmit power as $P_{max}=20$ dBm and showing the SINR performance versus different transmit antenna settings. We train only a single BNN with {$K=4$, $N=10$}, but allow the number of transmit antennas to vary from 4 to 10 when using the trained BNN. The redundant entries at the inputs and outputs are filled with 0’s. It can be seen that the predicted results are very close to those of the optimal solution. This suggests the generality of the BNN, i.e., we can train a large BNN with more antennas, which will also work for the cases with fewer antennas without re-training. This will be useful when some transmit antennas of the BS are malfunctioning or turned off.
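The zero-filling of redundant input entries can be illustrated as follows for the I/Q input format with a $2\times NK$ input. The ordering in which the user channels are stacked and the function name `iq_input` are assumptions of this sketch.

```python
import numpy as np

def iq_input(H, n_max):
    """Build a 2 x (n_max*K) real input from an N x K complex channel, N <= n_max.

    Entries belonging to inactive antennas are filled with zeros.
    """
    N, K = H.shape
    H_pad = np.zeros((n_max, K), dtype=complex)
    H_pad[:N, :] = H
    h = H_pad.reshape(-1, order="F")      # stack user channels one after another (assumed order)
    return np.stack([h.real, h.imag])     # row 0: in-phase, row 1: quadrature

rng = np.random.default_rng(5)
H6 = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
x = iq_input(H6, n_max=10)
print(x.shape)   # (2, 40): the same input size as for N = 10
```

Because the input shape is always that of the largest configuration, a network trained for $N=10$ can be evaluated for $N=6$ without any structural change.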

VII-B BNN for the Power Minimization Problem

In this subsection, we consider the BNN for the power minimization problem P2, which also updates network parameters in a supervised learning way. The iterative algorithm in [5] is used to generate the training and testing samples. The ZF beamforming for comparison is achieved by minimizing the power for each user with a QoS constraint since there is no inter-user interference. We first investigate the effect of the SINR constraints of users on the power consumption. For convenience of comparison, we assume the SINR constraints of all users are the same, i.e., $\Gamma_{k}=\Gamma,\forall k$. In Fig. 8, we compare the power performance of the optimal beamforming, the ZF beamforming, and the beamforming obtained by the BNN. Note that both Figs. 8(a) and 8(b) have two Y-axes, where the left Y-axis measures the (normalized) transmit power averaged over the feasible sample set of the BNN solution and the right Y-axis shows the feasibility of the BNN. As mentioned in Section V, the BNN may fail to find a feasible solution to problem P2 if the prediction error is unacceptable.

Figs. 8(a) and 8(b) present the (normalized) transmit power performance in the cases without and with consideration of the large-scale fading, respectively. In both cases, the (normalized) transmit power of the BNN solution is close to that of the optimal solution, and the BNN solution significantly outperforms the ZF beamforming in the low SINR-constraint regime, where the transmit power of the ZF beamforming is higher than that of the optimal solution. We also find that, according to Fig. 8(b), the BNN solution performs slightly worse than the ZF solution when the SINR constraint is large. This is because the ZF solution becomes closer to the optimal solution as the SINR constraints increase, although the performance of the BNN solution is still close to that of the optimal solution. This suggests that when the SINR constraints are high, the ZF solution is a good alternative to the BNN solution. Besides, we find that the feasibility of the BNN solution in both cases is more than 99.4%.

To further compare the BNN solution with the optimal solution and the ZF beamforming, we plot their power performance and execution time per sample in Figs. 9(a) and 9(b), respectively. Here, we consider two convergence strategies for the optimal iterative algorithm: the high convergence threshold ( $\varepsilon_{1}=10^{-2}$ ), which can be reached with fewer iterations, and the low convergence threshold ( $\varepsilon_{2}=10^{-4}$ ), which requires more iterations. Specifically, the iterative algorithm for problem P2 stops when $\frac{|\sum^{K}_{k=1}||\mathbf{w}^{(t-1)}_{k}||^{2}-\sum^{K}_{k=1}||\mathbf{w}^{(t)}_{k}||^{2}|}{\sum^{K}_{k=1}||\mathbf{w}^{(t-1)}_{k}||^{2}}\leq\varepsilon_{\kappa},\kappa\in\{1,2\}$ . In Fig. 9, the BS antenna number and the SINR target of users are fixed as $N=8$ and $\Gamma=5$ dB. It is observed from Fig. 9(a) that as the user number $K$ increases, the performance gap between the ZF beamforming and the optimal beamforming with the low convergence threshold becomes large because more users share the array gain. The BNN solution, with a feasibility of up to 99%, shows a better performance than the ZF beamforming and the optimal iterative algorithm with the high convergence threshold. Fig. 9(b) demonstrates that, compared to the optimal solution with the low convergence threshold, the BNN solution reduces the execution time per sample by about two orders of magnitude, while its execution time is only slightly longer than that of the ZF beamforming. This is because both the BNN solution and the ZF beamforming are obtained without an iterative process, but the BNN needs to execute the neural network operations as well as the conversion process. The number of iterations can be reduced by using the high convergence threshold, but this degrades the power performance. According to the results in Figs. 9(a) and 9(b), we can conclude that the BNN solution provides a good balance between the performance and computational complexity.
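The stopping rule above compares the relative change of the total transmit power between two consecutive iterations with the threshold $\varepsilon_{\kappa}$. A minimal sketch of the rule is given below. The function names and the generic update `step` are hypothetical placeholders; `step` stands in for one iteration of an algorithm such as the one in [5] and is not an implementation of it.

```python
import numpy as np

def total_power(W):
    """Sum of squared norms of all beamforming vectors (columns of W)."""
    return float(np.sum(np.abs(W) ** 2))

def has_converged(W_prev, W_curr, eps):
    """Relative change of the total transmit power is at most eps."""
    p_prev = total_power(W_prev)
    return abs(p_prev - total_power(W_curr)) / p_prev <= eps

def iterate(step, W0, eps, max_iter=1000):
    """Apply a user-supplied update until the stopping rule is met.

    Returns the last iterate and the number of iterations performed.
    """
    W_prev = W0
    for t in range(1, max_iter + 1):
        W_curr = step(W_prev)
        if has_converged(W_prev, W_curr, eps):
            return W_curr, t
        W_prev = W_curr
    return W_prev, max_iter

# Toy update that contracts towards an all-ones matrix (illustration only).
toy_step = lambda W: 0.5 * (W + np.ones_like(W))
W0 = 3.0 * np.ones((2, 2))
_, iters_high = iterate(toy_step, W0, eps=1e-2)  # high threshold
_, iters_low = iterate(toy_step, W0, eps=1e-4)   # low threshold
```

With the toy update, the low threshold needs more iterations than the high one, which mirrors the trade-off between execution time and power performance discussed for Fig. 9.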

VII-C BNN for the Sum Rate Maximization Problem

In this subsection, we evaluate the performance of the BNN for the sum rate maximization problem P3 based on the proposed hybrid learning under the assumption that $K=4$ and $N=4$ . The ZF beamforming with $p_{k}=\frac{P_{max}}{K},\forall k$ and the RZF beamforming with $p_{k}=\lambda_{k}=\frac{P_{max}}{K},\forall k$ are introduced as two baseline solutions. Since the performance of the WMMSE algorithm heavily relies on initialization [9, 10], two different initialization methods, the RZF initialization and the random initialization, are considered, and the WMMSE algorithm with the RZF initialization is used to generate samples for the supervised learning in the first stage. First, Fig. 10 shows the sum rate performance averaged over 5000 samples in two different cases: the former case in Fig. 10(a) only considers small-scale fading and the latter case in Fig. 10(b) considers both small-scale fading and large-scale fading. It is shown that the sum rate performance of all solutions increases as the (normalized) transmit power increases, and that the different initialization methods of the WMMSE algorithm lead to a large performance gap. We observe that in both cases the proposed BNN solution based on the hybrid learning always achieves a performance close to that of the WMMSE algorithm with the RZF initialization, while the performance of the supervised learning-based BNN solution is less satisfactory. This is because the second stage of the hybrid learning method aims to maximize the sum rate, so its performance is bounded by the global optimal solution to problem P3. In contrast, the aim of the BNN solution based on the supervised learning is to get as close to the WMMSE solution as possible, so its performance is restricted by the WMMSE solution, which is verified in Figs. 10(a) and 10(b).
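The two baselines and the sum rate metric can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical, `H` stores $\mathbf{h}_{k}^{H}$ in its $k$-th row, the noise power `sigma2` is equal for all users, and the RZF direction is assumed to take the regularized form $(\mathbf{I}_{N}+\sum_{k}\frac{\lambda_{k}}{\sigma^{2}}\mathbf{h}_{k}\mathbf{h}_{k}^{H})^{-1}\mathbf{h}_{k}$ with $\lambda_{k}=P_{max}/K$, following the solution structure discussed in [2].

```python
import numpy as np

def sum_rate(H, W, sigma2=1.0):
    """Sum of log2(1 + SINR_k). H is (K, N) with row k = h_k^H; W is (N, K)."""
    G = np.abs(H @ W) ** 2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + sigma2))))

def zf_equal_power(H, p_max):
    """ZF directions from the pseudo-inverse, with p_k = p_max / K."""
    K = H.shape[0]
    D = np.linalg.pinv(H)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    return np.sqrt(p_max / K) * D

def rzf_equal_power(H, p_max, sigma2=1.0):
    """RZF directions with p_k = lambda_k = p_max / K (assumed form)."""
    K, N = H.shape
    lam = p_max / K
    Hc = H.conj().T                              # columns are h_k
    A = np.eye(N) + (lam / sigma2) * (Hc @ H)    # I + sum_k lam/sigma2 h_k h_k^H
    D = np.linalg.solve(A, Hc)
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    return np.sqrt(p_max / K) * D

# Example with K = N = 4 and small-scale (Rayleigh) fading only.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
rate_zf = sum_rate(H, zf_equal_power(H, p_max=10.0))
rate_rzf = sum_rate(H, rzf_equal_power(H, p_max=10.0))
```

Averaging `rate_zf` and `rate_rzf` over many channel realizations would give baseline curves of the kind plotted in Fig. 10. The same `sum_rate` quantity is what the second, unsupervised stage of the hybrid learning aims to maximize.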

We further compare the sum rate performance and the computational complexity, in terms of the execution time per sample, of five beamforming solutions in Figs. 11(a) and 11(b), respectively. The iteration number of the WMMSE algorithm is limited to at most 10. We fix the transmit power budget as $P_{max}=30$ dBm and assume the transmit antenna number is the same as the user number, i.e., $N=K$ . As the number of transmit antennas increases, the sum rate performance of all five solutions increases. The performance of the proposed BNN solution based on the hybrid learning method is always close to that of the WMMSE algorithm with the RZF initialization and is superior to those of the other solutions, and the performance gap becomes larger as the number of transmit antennas increases. According to Fig. 11(b), the execution times per sample of the BNN solutions based on the supervised learning and hybrid learning methods are at the same level, which is slightly longer than those of the ZF beamforming and the RZF beamforming, for the same reason as in Fig. 9(b). As expected, the WMMSE algorithm consumes the most time because of its iterative process. As with the other proposed BNNs, these results show that the proposed BNN solution to the sum rate maximization problem P3 provides a good balance between the performance and computational complexity.

VIII Conclusions

In this paper, we proposed a DL-based framework for fast optimization of the beamforming vectors in the MISO downlink and then devised three BNNs under this framework for the SINR balancing problem under a total power constraint, the power minimization problem under individual QoS constraints, and the sum rate maximization problem under a total power constraint, respectively. The proposed BNNs are based on the CNN structure and expert knowledge. The supervised learning method was adopted for the SINR balancing problem and the power minimization problem because effective algorithms are available for generating training samples. However, there is no practically useful algorithm to find the optimal solution to the nonconvex sum rate maximization problem. Therefore, the corresponding BNN adopts a hybrid learning method, which first pre-trains the neural network with the supervised learning method and then updates the network parameters with the unsupervised learning method to further improve the learning performance. Furthermore, in order to reduce the complexity of prediction, the proposed BNNs take advantage of expert knowledge to extract key features instead of predicting the beamforming matrix directly. Simulation results demonstrated that the proposed BNN solutions provide a good balance between performance and complexity compared to the existing algorithms.

This work is an attempt to apply the DL technique to beamforming optimization, and several extensions are worth further study. For example, it is so far unclear which input format, the I/Q transformation or the P/M transformation, is better. In addition, the joint optimization of user selection and beamforming design for the power minimization problem deserves more investigation. Besides, user mobility, machine-type communications, imperfect CSI, feasibility detection, and multi-cell scenarios are also interesting extensions for future work.

Acknowledgment

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Bibliography

[1] W. Xia, G. Zheng, Y. Zhu, J. Zhang, J. Wang, and A. Petropulu, “Deep learning based beamforming neural networks in downlink MISO systems,” in Proc. IEEE Int. Conf. Commun. (ICC) Workshop, Shanghai, China, May 2019, pp. 1–5.
[2] E. Björnson, M. Bengtsson, and B. Ottersten, “Optimal multiuser transmit beamforming: A difficult problem with a simple solution structure,” IEEE Signal Process. Mag., vol. 31, no. 4, pp. 142–148, Jul. 2014.
[3] H. Boche and M. Schubert, “A general duality theory for uplink and downlink beamforming,” in Proc. IEEE Conf. Veh. Technol. Conf. (VTC), vol. 1, Vancouver, Canada, Sep. 2002, pp. 87–91.
[4] D. Gerlach and A. Paulraj, “Base station transmitting antenna arrays for multipath environments,” Signal Process., vol. 54, no. 1, pp. 59–73, Oct. 1996.
[5] Q. Shi, M. Razaviyayn, M. Hong, and Z. Luo, “SINR constrained beamforming for a MIMO multi-user downlink system: Algorithms and convergence analysis,” IEEE Trans. Signal Process., vol. 64, no. 11, pp. 2920–2933, Jun. 2016.
[6] F. Rashid-Farrokhi, K. R. Liu, and L. Tassiulas, “Transmit beamforming and power control for cellular wireless systems,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1437–1450, Oct. 1998.
[7] A. B. Gershman, N. D. Sidiropoulos, S. Shahbazpanahi, M. Bengtsson, and B. Ottersten, “Convex optimization-based beamforming,” IEEE Signal Process. Mag., vol. 27, no. 3, pp. 62–75, May 2010.
[8] A. Wiesel, Y. C. Eldar, and S. Shamai, “Linear precoding via conic optimization for fixed MIMO receivers,” IEEE Trans. Signal Process., vol. 54, no. 1, pp. 161–176, Jan. 2006.
