Optimal WDM Power Allocation via Deep Learning for Radio on Free Space Optics Systems

Zhan Gao; Mark Eisen; Alejandro Ribeiro

arXiv:1906.09981·eess.SP·June 25, 2019·GLOBECOM


TL;DR

This paper introduces a deep learning-based approach for optimal power allocation in WDM Radio on Free Space Optics systems, enhancing capacity while respecting power and safety constraints.

Contribution

It develops a model-free primal-dual deep learning algorithm for power allocation, outperforming traditional equal allocation methods.

Findings

01 Deep learning algorithm achieves higher capacity than equal power allocation.

02 Model-free approach does not require system knowledge.

03 Significant performance improvements demonstrated through simulations.

Abstract

Radio on Free Space Optics (RoFSO), as a universal platform for heterogeneous wireless services, is able to transmit multiple radio frequency signals at high rates in free space optical networks. This paper investigates the optimal design of power allocation for Wavelength Division Multiplexing (WDM) transmission in RoFSO systems. The proposed problem is a weighted total capacity maximization problem with two constraints of total power limitation and eye safety concern. The model-based Stochastic Dual Gradient algorithm is presented first, which solves the problem exactly by exploiting the null duality gap. The model-free Primal-Dual Deep Learning algorithm is then developed to learn and optimize the power allocation policy with Deep Neural Network (DNN) parametrization, which can be utilized without any knowledge of system models. Numerical simulations are performed to exhibit…

Equations

$$\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\leq P_{T}. \tag{1}$$

$$0\leq P_{i}({\mathbf{h}})\leq P_{S},\quad i=1,...,m. \tag{2}$$

$$\mathbb{P}:=\max_{{\mathbf{P}}({\mathbf{h}})}\ \sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[C_{i}({\mathbf{P}}({\mathbf{h}}),{\mathbf{h}})\big]\quad\text{s.t.}\quad\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\leq P_{T},\quad 0\leq P_{i}({\mathbf{h}})\leq P_{S},\ i=1,...,m. \tag{3}$$

$$h_{a}=A(d,\lambda)e^{-\alpha d},\quad A(d,\lambda)=\frac{A_{TX}A_{RX}}{(d\lambda)^{2}}, \tag{4}$$

$$y=h_{a}h_{t}x+n, \tag{5}$$

$$h=\frac{|h_{a}h_{t}|^{2}}{N_{0}}. \tag{6}$$

$$CNR=\frac{\frac{1}{2}(OMI\cdot m_{p}rP_{r})^{2}}{RIN\cdot(rP_{r})^{2}+2em_{p}^{2+F}rP_{r}+\frac{4KT}{R_{f}}}, \tag{7}$$

$$C_{i}({\mathbf{P}},{\mathbf{h}})=\log(1+CNR_{i}({\mathbf{P}},{\mathbf{h}}))=\log(1+CNR_{i}(P_{i},h_{i}))=\log\left(1+\frac{\frac{1}{2}(OMI\cdot m_{p}rP_{i}h_{i})^{2}}{RIN\cdot(rP_{i}h_{i})^{2}+2em_{p}^{2+F}rP_{i}h_{i}+\frac{4KT}{R_{f}}}\right). \tag{8}$$

$$\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda)=\sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))\big]+\lambda\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\Big). \tag{9}$$

$$\mathcal{D}(\lambda)=\max_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda). \tag{10}$$

$$\mathbb{D}=\min_{\lambda\geq 0}\mathcal{D}(\lambda)=\min_{\lambda\geq 0}\max_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda). \tag{11}$$

$$\mathbb{P}=\mathbb{D}. \tag{12}$$

$${\mathbf{P}}^{k+1}({\mathbf{h}})=\operatorname*{argmax}_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda^{k})=\operatorname*{argmax}_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))\big]+\lambda^{k}\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\Big). \tag{13}$$

$$P_{i}^{k+1}({\mathbf{h}})=\operatorname*{argmax}_{P_{i}({\mathbf{h}})\in[0,P_{S}]}\ \omega_{i}\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))-\lambda^{k}P_{i}({\mathbf{h}}). \tag{14}$$

$$\lambda^{k+1}=\big[\lambda^{k}-\eta^{k}\nabla_{\lambda}\mathcal{L}({\mathbf{P}}^{k+1}({\mathbf{h}}),\lambda^{k})\big]_{+}=\Big[\lambda^{k}-\eta^{k}\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}^{k+1}({\mathbf{h}})\Big]\Big)\Big]_{+}, \tag{15}$$

$$P_{i}^{*}({\mathbf{h}})=\operatorname*{argmax}_{P_{i}({\mathbf{h}})\in[0,P_{S}]}\ \omega_{i}\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))-\lambda^{*}P_{i}({\mathbf{h}}). \tag{16}$$

$${\mathbf{P}}({\mathbf{h}})=\bm{\Phi}({\mathbf{h}},\bm{\theta}). \tag{17}$$

$${\mathbf{x}}_{l}=\sigma_{l}(\bm{\Pi}_{l}{\mathbf{x}}_{l-1}). \tag{18}$$

$$\mathcal{L}(\bm{\theta},\lambda)=\sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})\big]+\lambda\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta})\Big]\Big). \tag{19}$$

$$\mathbb{D}_{\bm{\theta}}=\min_{\lambda\geq 0}\mathcal{D}_{\bm{\theta}}(\lambda)=\min_{\lambda\geq 0}\max_{\bm{\theta}\in\Theta}\mathcal{L}(\bm{\theta},\lambda). \tag{20}$$

$$\bm{\theta}^{k+1}=\bm{\theta}^{k}+\delta^{k}\nabla_{\bm{\theta}}\mathcal{L}(\bm{\theta}^{k},\lambda^{k})=\bm{\theta}^{k}+\delta^{k}\nabla_{\bm{\theta}}\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\omega_{i}C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}^{k}),{\mathbf{h}})+\lambda^{k}\Big(P_{T}-\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta}^{k})\Big)\Big], \tag{21}$$

$$\lambda^{k+1}=\Big[\lambda^{k}-\eta^{k}\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta}^{k+1})\Big]\Big)\Big]_{+}. \tag{22}$$

$$\nabla_{\bm{\theta}}\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]=\mathbb{E}_{{\mathbf{h}},{\mathbf{P}}}[f({\mathbf{P}},{\mathbf{h}})\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})], \tag{23}$$

$$\widetilde{\nabla_{\bm{\theta}}}\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]=\frac{1}{S}\sum_{j=1}^{S}f({\mathbf{P}}_{j},{\mathbf{h}}_{j})\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}}_{j},\bm{\theta}}({\mathbf{P}}_{j}), \tag{24}$$

$$\widetilde{\nabla_{\bm{\theta}}}\mathcal{L}(\bm{\theta},\lambda)=\widetilde{\nabla_{\bm{\theta}}}\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\omega_{i}C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})+\lambda\Big(P_{T}-\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta})\Big)\Big]=\frac{1}{S}\sum_{j=1}^{S}\Big\{\Big[\sum_{i=1}^{m}\omega_{i}C_{i}({\mathbf{P}}_{j},{\mathbf{h}}_{j})+\lambda\Big(P_{T}-\sum_{i=1}^{m}P_{j,i}\Big)\Big]\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}}_{j},\bm{\theta}}({\mathbf{P}}_{j})\Big\}. \tag{25}$$


Taxonomy

Topics: Optical Wireless Communication Technologies · Advanced Optical Network Technologies · Satellite Communication Systems

Full text

Optimal WDM Power Allocation via Deep Learning for Radio on Free Space Optics Systems

Supported by ARL DCIST CRA W911NF-17-2-0181 and Intel Science and Technology Center for Wireless Autonomous Systems.

Zhan Gao, Mark Eisen, Alejandro Ribeiro

Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA

Email: {gaozhan, maeisen, aribeiro}@seas.upenn.edu

Abstract

Radio on Free Space Optics (RoFSO), as a universal platform for heterogeneous wireless services, is able to transmit multiple radio frequency signals at high rates in free space optical networks. This paper investigates the optimal design of power allocation for Wavelength Division Multiplexing (WDM) transmission in RoFSO systems. The proposed problem is a weighted total capacity maximization problem with two constraints: a total power limitation and an eye safety limit. The model-based Stochastic Dual Gradient algorithm is presented first, which solves the problem exactly by exploiting the null duality gap. The model-free Primal-Dual Deep Learning algorithm is then developed to learn and optimize the power allocation policy with Deep Neural Network (DNN) parametrization, which can be utilized without any knowledge of system models. Numerical simulations are performed to show that both algorithms outperform the average equal power allocation.

Index Terms:

Radio on free space optics, deep learning, wavelength division multiplexing, power allocation

I Introduction

Modern society has witnessed the appearance of heterogeneous wireless services. These services demand different facilities and operate their own networks, which raises costs and slows the deployment of new wireless services. Radio over Fiber (RoF) has been put forward as a universal platform that connects multiple radio frequency (RF) signals from different wireless access networks. By placing RF signals on optical carriers, an RoF system transmits them through optical fibers without changing their radio formats. It offers high data rates, low loss and zero interference, but relies heavily on the deployment of fibers, which may not be available in places like rural areas [1].

Free Space Optical (FSO) communication is a promising alternative when fibers are not available. It shares the advantages of optical fiber communication and is also license free, easy and inexpensive to set up [2, 3]. Wireless FSO links remove the physical restriction of fiber deployment and are able to transmit RF signals through free space. The so-called Radio on Free Space Optics (RoFSO) system has therefore been developed recently [4]. However, RoFSO can be seriously affected by FSO channel characteristics such as weather and turbulence. Different models have been proposed for the FSO channel and various techniques have been developed to reduce its influence [3, 5, 6, 7].

To improve the performance, Dense Wavelength Division Multiplexing (DWDM) RoFSO system has been developed, as a means of transmitting multiple RF signals simultaneously. It makes it feasible to employ Wavelength Division Multiplexing (WDM) in RoFSO [8, 9]. At the same time, adaptive transmission based on the channel state information (CSI) is proposed to help mitigate channel effects for FSO [10, 11] and RoFSO systems [12, 13].

The problem considered in this paper is the optimal power allocation for adaptive WDM transmission in RoFSO systems. According to the CSI of all wavelength links, different powers are assigned to different wavelengths to maximize the objective function, subject to the power limitation constraints necessary for safe implementation of RoFSO systems. The problem is challenging not only because it is both non-convex and constrained, but also because the mathematical system model or the estimated CSI may not be accurate in practice. Some model-based algorithms have been developed to handle similar problems [12, 13]. These algorithms employ relaxations to find inexact solutions and are computationally expensive to implement [13]. This inherent difficulty makes machine learning appealing, due to both its low complexity and its potential for model-free implementation. Deep learning in particular has been applied to resource allocation problems in the wireless RF domain in both supervised [14] and unsupervised [15, 16] manners. Such approaches have not yet been explored in FSO or RoFSO systems.

This paper develops two algorithms to solve power allocation for WDM RoFSO. We first formulate the optimal design problem and introduce the RoFSO system model (Section II). We present the Stochastic Dual Gradient algorithm to solve the problem exactly using the idea of strong duality in [17] (Section III). This approach is limited in practice because it depends on system models and is computationally more expensive. As a model-free and low-complexity alternative, we leverage machine learning techniques in the Primal-Dual Deep Learning algorithm (Section IV). In particular, Deep Neural Networks (DNNs) are used to parameterize the power allocation policy, and they are trained with a primal-dual method to solve the resulting constrained learning problem. A model-free implementation is obtained with the policy gradient method for cases in which system models are inaccurate or unknown. The strong performance of both algorithms is shown by numerical simulations (Section V).

II Problem Formulation

Radio on Free Space Optics (RoFSO), as a universal platform for heterogeneous wireless services, can transmit RF signals through FSO links in optical networks. The Dense Wavelength Division Multiplexing (DWDM) RoFSO system enables the simultaneous transmission of multiple RF signals with the WDM technique to increase transmission capacity. Specifically, multimedia RF signals are fed into the RoFSO system, placed on multiple optical wavelength carriers by optoelectronic devices, and then transmitted into free space. At the receiver, optical signals are received through FSO channels and converted back to RF signals for users.

The adaptive transmission is considered when allocating powers to wavelength channels in RoFSO. Based on the channel state information (CSI), different powers are assigned to different wavelengths to maximize the objective function. The exact objective function can be adjusted according to specific situations.

Assume there are $m$ optical wavelengths carrying different transmissions, and that they are non-overlapping with enough spacing. The CSI is represented by the vector ${\mathbf{h}}=[h_{1},...,h_{m}]$ , where each $h_{i}(i=1,...,m)$ denotes the CSI of the $i$ -th wavelength channel. The power allocated to the signal transmitted on the $i$ -th wavelength is based upon the observed CSI ${\mathbf{h}}$ via a power allocation policy $P_{i}({\mathbf{h}})$ . Given the collection of power allocations ${\mathbf{P}}({\mathbf{h}})=[P_{1}({\mathbf{h}}),...,P_{m}({\mathbf{h}})]$ and current CSI ${\mathbf{h}}$ , a channel capacity of $C_{i}({\mathbf{P}}({\mathbf{h}}),{\mathbf{h}})$ is achieved on the $i$ -th wavelength. Note that the FSO channel is considered as a fading process with channel coherence time on the order of milliseconds, so we can assume an ergodic and i.i.d. block fading process. Since the instantaneous channel capacity tends to vary fast, the long term average $\mathbb{E}_{\mathbf{h}}[C_{i}({\mathbf{P}}({\mathbf{h}}),{\mathbf{h}})]$ is the more meaningful metric to consider. Additionally, because different wireless services fed into RoFSO may have different priorities, we consider the weight vector $\bm{\omega}=[\omega_{1},...,\omega_{m}]\geq 0$ to represent such priorities.

There are two natural constraints to be considered in RoFSO power allocation. The first is the expected total power limitation $P_{T}$ for the FSO base station:

$$\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\leq P_{T}. \tag{1}$$

The second is motivated by the eye safety concern in optical transmissions. Specifically, we set a peak power $P_{S}$ that can be allocated on any single wavelength so that the beam is not dangerous for human eyes in its propagation:

$$0\leq P_{i}({\mathbf{h}})\leq P_{S},\quad i=1,...,m. \tag{2}$$

Together, we formulate the optimal power allocation for adaptive WDM transmission in RoFSO systems as the following statistical optimization problem:

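$$\mathbb{P}:=\max_{{\mathbf{P}}({\mathbf{h}})}\ \sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[C_{i}({\mathbf{P}}({\mathbf{h}}),{\mathbf{h}})\big]\quad\text{s.t.}\quad\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\leq P_{T},\quad 0\leq P_{i}({\mathbf{h}})\leq P_{S},\ i=1,...,m. \tag{3}$$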
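As a small illustration of the two constraints, the following sketch checks a batch of sampled power allocations against the expected total power limit and the per-wavelength peak limit. The function name `check_constraints` and the tolerance are hypothetical; the numbers $m=8$, $P_{T}=1.2W$ and $P_{S}=0.3W$ are the default values quoted in Section V, and the policy tested is the equal power baseline.

```python
import numpy as np

def check_constraints(P, P_T, P_S, tol=1e-9):
    """P has shape (S, m): one power vector per sampled CSI realization."""
    P = np.asarray(P, dtype=float)
    avg_total = P.sum(axis=1).mean()        # sample estimate of E_h[sum_i P_i(h)]
    total_ok = bool(avg_total <= P_T + tol)  # expected total power constraint
    peak_ok = bool(np.all((P >= -tol) & (P <= P_S + tol)))  # eye safety constraint
    return avg_total, total_ok, peak_ok

# Equal power baseline: every wavelength gets P_T / m regardless of the CSI.
m, P_T, P_S = 8, 1.2, 0.3
equal = np.full((1000, m), P_T / m)
avg_total, total_ok, peak_ok = check_constraints(equal, P_T, P_S)
```

With these values the equal power policy assigns $0.15W$ to each wavelength, which meets the total power limit with equality and stays below the peak limit.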

Note that the above problem is formulated without any specific system model. In the following subsection, we discuss channel and capacity models commonly used in the study of RoFSO systems. We then present an exact algorithm to solve (3) in Section III that relies on such model information, as well as a deep-learning based alternative algorithm in Section IV that does not.

II-A System Model

To study the FSO channel and the RoFSO system mathematically, several theoretical models have been put forward in previous research. The effects of the FSO channel mainly consist of two parts: the attenuation $h_{a}$ and the turbulence $h_{t}$ [18].

The attenuation fading term $h_{a}$ can be expressed by

$$h_{a}=A(d,\lambda)e^{-\alpha d},\quad A(d,\lambda)=\frac{A_{TX}A_{RX}}{(d\lambda)^{2}}, \tag{4}$$

where $\alpha$ is the attenuation coefficient; $d$ is the transmission distance; $\lambda$ is the wavelength; $A_{TX}$ is the aperture area of transmitter, and $A_{RX}$ is the aperture area of receiver.

As for the turbulence, we use the well-known Log-normal distribution to model the fading term $h_{t}$ , which is considered to be accurate under weak-to-moderate turbulence. Without loss of generality, we can also use other distributions like Gamma-gamma distribution according to different turbulence conditions [5, 19].

The FSO channel can then be modelled as

$$y=h_{a}h_{t}x+n, \tag{5}$$

in which $y$ is the received signal; $x$ is the transmitted signal, and $n$ represents the additive Gaussian noise. Therefore, the channel gain (referred to as CSI) under this model is expressed by

$$h=\frac{|h_{a}h_{t}|^{2}}{N_{0}}. \tag{6}$$
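The channel model above can be sampled numerically. The sketch below is only one possible reading: the log-normal turbulence is parametrized as $h_{t}=e^{2X}$ with $X$ Gaussian and normalized so that $\mathbb{E}[h_{t}]=1$, which is a common convention but an assumption here. The aperture diameters and distance are the Section V values; the wavelength grid (5nm spacing from 1520nm to 1595nm) and the values of `alpha`, `sigma_x` and `N0` are illustrative guesses, and the function names are hypothetical.

```python
import numpy as np

def attenuation(d, wavelength, alpha, A_tx, A_rx):
    """Attenuation term: h_a = A_TX * A_RX / (d * lambda)^2 * exp(-alpha * d)."""
    return A_tx * A_rx / (d * wavelength) ** 2 * np.exp(-alpha * d)

def sample_csi(rng, S, wavelengths, d, alpha, A_tx, A_rx, sigma_x, N0):
    """Draw S i.i.d. CSI vectors h = |h_a h_t|^2 / N_0, one entry per wavelength."""
    wl = np.asarray(wavelengths, dtype=float)
    h_a = attenuation(d, wl, alpha, A_tx, A_rx)            # shape (m,)
    # Assumed log-normal model: h_t = exp(2X), X ~ N(-sigma_x^2, sigma_x^2), so E[h_t] = 1.
    X = rng.normal(-sigma_x**2, sigma_x, size=(S, wl.size))
    h_t = np.exp(2.0 * X)
    return (h_a * h_t) ** 2 / N0                           # shape (S, m)

# Geometry from Section V; alpha, sigma_x and N0 are illustrative guesses.
wavelengths = np.arange(1520, 1600, 5) * 1e-9              # 1520 nm ... 1595 nm
A_tx, A_rx = np.pi * (0.05 / 2) ** 2, np.pi * (0.1 / 2) ** 2
h = sample_csi(np.random.default_rng(0), 1000, wavelengths, d=1000.0,
               alpha=1e-3, A_tx=A_tx, A_rx=A_rx, sigma_x=0.1, N0=1.0)
```

Other turbulence distributions, such as Gamma-gamma, could be swapped in by replacing the two lines that generate `h_t`.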

For a RoFSO system with an APD photodetector, performance is commonly evaluated by the Carrier to Noise Ratio (CNR), which is modelled as [20]

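$$CNR=\frac{\frac{1}{2}(OMI\cdot m_{p}rP_{r})^{2}}{RIN\cdot(rP_{r})^{2}+2em_{p}^{2+F}rP_{r}+\frac{4KT}{R_{f}}}, \tag{7}$$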

where $OMI$ denotes the optical modulation index; $RIN$ denotes the relative intensity noise; $m_{p}$ is the photodiode gain; $r$ is the photodiode responsivity; $e$ is the electric charge; $F$ is the excess noise factor; $K$ is the Boltzmann constant; $T$ is the temperature; $R_{f}$ is the photodiode resistance, and $P_{r}=Ph$ is the received power at the detector.

With this specific RoFSO system model, the capacity of $i$ -th wavelength channel with allocated power ${\mathbf{P}}$ and CSI ${\mathbf{h}}$ can be expressed by

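$$C_{i}({\mathbf{P}},{\mathbf{h}})=\log(1+CNR_{i}({\mathbf{P}},{\mathbf{h}}))=\log(1+CNR_{i}(P_{i},h_{i}))=\log\left(1+\frac{\frac{1}{2}(OMI\cdot m_{p}rP_{i}h_{i})^{2}}{RIN\cdot(rP_{i}h_{i})^{2}+2em_{p}^{2+F}rP_{i}h_{i}+\frac{4KT}{R_{f}}}\right). \tag{8}$$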

Note that system parameters are assumed to be the same for all wavelength channels.
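For concreteness, the CNR and capacity expressions can be evaluated as in the sketch below. The defaults for `OMI`, `m_p`, `r`, `RIN_dB` and `T` are the values listed in Section V; the excess noise factor `F` and resistance `R_f` are not given in the text, so the values used here are placeholders, and the natural logarithm is assumed for $\log$. Function names are hypothetical.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge e (C)
K_BOLTZ = 1.380649e-23       # Boltzmann constant K (J/K)

def cnr(P, h, OMI=0.15, m_p=5.0, r=0.8, RIN_dB=-140.0, T=300.0, F=0.5, R_f=1e3):
    """Carrier to noise ratio with received power P_r = P * h.
    F and R_f are placeholder values, not taken from the paper."""
    rin = 10.0 ** (RIN_dB / 10.0)                                    # dB/Hz to linear
    x = r * np.asarray(P, dtype=float) * np.asarray(h, dtype=float)  # r * P_r
    signal = 0.5 * (OMI * m_p * x) ** 2
    noise = rin * x**2 + 2.0 * E_CHARGE * m_p ** (2.0 + F) * x + 4.0 * K_BOLTZ * T / R_f
    return signal / noise

def capacity(P, h, **params):
    """Per-wavelength capacity C_i = log(1 + CNR_i(P_i, h_i)), natural log assumed."""
    return np.log1p(cnr(P, h, **params))
```

Under this model the CNR is zero at zero power, increases with $P_{i}h_{i}$, and saturates at $\frac{1}{2}(OMI\cdot m_{p})^{2}/RIN$ once the relative intensity noise term dominates the denominator.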

III Stochastic Dual Gradient Algorithm

Solving the above optimization problem (3) is challenging due to its non-concave capacity function, the complexity of functional optimization, and the existence of constraints. We address these challenges by first establishing a null duality gap property of (3) and then presenting the Stochastic Dual Gradient (SDG) algorithm to solve it. For the development of the SDG algorithm, we assume that the models given in Section II-A are accurate, i.e. the capacity function $C_{i}({\mathbf{P}}({\mathbf{h}}),{\mathbf{h}})$ in (3) can be computed as in (8).

With two constraints in (3), it is natural to think about working in the dual domain. Let $\mathcal{P}=[0,P_{S}]^{m}$ represent the space satisfying the eye safety concern, and introduce the dual variable $\lambda\geq 0$ . The Lagrangian of the problem is given by

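$$\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda)=\sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))\big]+\lambda\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\Big). \tag{9}$$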

The dual function is then defined as

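$$\mathcal{D}(\lambda)=\max_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda). \tag{10}$$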

Its corresponding dual problem is to find $\lambda^{*}$ that minimizes the dual function

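$$\mathbb{D}=\min_{\lambda\geq 0}\mathcal{D}(\lambda)=\min_{\lambda\geq 0}\max_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda). \tag{11}$$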

However, the objective function is non-concave and complicated due to the term $CNR_{i}(P_{i}({\mathbf{h}}),h_{i})$ , which makes (3) a non-convex optimization problem. Solving it in the dual domain then seems impossible in principle. Note, however, that what makes the dual method impractical is not non-convexity itself but the existence of a duality gap, which implies a loss of optimality when the dual method is used. In other words, as long as we can show that this problem has null duality gap, it can be solved in the dual domain.

Observe that the non-concave objective function is actually inside the expectation expression. We then give the following Theorem 1 according to [17] to show its null duality gap:

Theorem 1

Let $\mathbb{P}$ and $\mathbb{D}$ denote the optimal values of the primal problem (3) and its corresponding dual problem (11). If there exists a feasible point ${\mathbf{P}}_{0}$ satisfying all constraints with strict inequality, and the probability distribution of CSI ${\mathbf{h}}$ contains no point of positive probability, then the duality gap is null:

$$\mathbb{P}=\mathbb{D}. \tag{12}$$

The problem in our case satisfies all conditions of Theorem 1 such that the duality gap is null even if it is a non-convex optimization problem. Then we can directly develop the dual methodology to solve (3) by solving (11) without any relaxation.

The SDG algorithm is put forward based on the above analysis. It iteratively searches for the optimal dual variable $\lambda^{*}$ from an initial $\lambda^{0}$ and uses $\lambda^{*}$ to compute the optimal power allocation ${\mathbf{P}}^{*}({\mathbf{h}})$ . Specifically, at each iteration $k$ , SDG consists of two steps:

(1) Primal variable update. For the given $\lambda^{k}$ from iteration $k-1$ and CSI ${\mathbf{h}}$ , we update the primal variable by maximizing the Lagrangian:

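$${\mathbf{P}}^{k+1}({\mathbf{h}})=\operatorname*{argmax}_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\mathcal{L}({\mathbf{P}}({\mathbf{h}}),\lambda^{k})=\operatorname*{argmax}_{{\mathbf{P}}({\mathbf{h}})\in\mathcal{P}}\sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))\big]+\lambda^{k}\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}({\mathbf{h}})\Big]\Big). \tag{13}$$

Furthermore, both the objective function and the constraints are separable in $P_{1}({\mathbf{h}}),...,P_{m}({\mathbf{h}})$ and $h_{1},...,h_{m}$ , with no coupling between wavelengths, and the term $\lambda^{k}P_{T}$ does not depend on the powers. Hence (13) can be solved independently for each wavelength and each CSI realization, and simplifies to

$$P_{i}^{k+1}({\mathbf{h}})=\operatorname*{argmax}_{P_{i}({\mathbf{h}})\in[0,P_{S}]}\ \omega_{i}\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))-\lambda^{k}P_{i}({\mathbf{h}}). \tag{14}$$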

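(2) Dual variable update. With ${\mathbf{P}}^{k+1}({\mathbf{h}})$ obtained from step (1), we then perform a dual descent step to get $\lambda^{k+1}$ :

$$\lambda^{k+1}=\big[\lambda^{k}-\eta^{k}\nabla_{\lambda}\mathcal{L}({\mathbf{P}}^{k+1}({\mathbf{h}}),\lambda^{k})\big]_{+}=\Big[\lambda^{k}-\eta^{k}\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}P_{i}^{k+1}({\mathbf{h}})\Big]\Big)\Big]_{+}, \tag{15}$$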

where $\eta^{k}$ is the stepsize of $\lambda$ at iteration $k$ , and $[\cdot]_{+}$ is due to the non-negativity of $\lambda$ . The expectation $\mathbb{E}_{{\mathbf{h}}}[\cdot]$ is computed by the stochastic method with $S$ samples of ${\mathbf{h}}$ .

By repeating the above two steps recursively, as $k$ increases, $\lambda^{k}$ converges to the optimal value $\lambda^{*}$ , and the optimal allocated power of $i$ -th wavelength channel $P^{*}_{i}({\mathbf{h}})$ is given by

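$$P_{i}^{*}({\mathbf{h}})=\operatorname*{argmax}_{P_{i}({\mathbf{h}})\in[0,P_{S}]}\ \omega_{i}\log(1+CNR_{i}(P_{i}({\mathbf{h}}),h_{i}))-\lambda^{*}P_{i}({\mathbf{h}}). \tag{16}$$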
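The two SDG steps can be sketched as follows. This is a minimal illustration and not the authors' implementation: the per-wavelength maximization is solved by a plain grid search over $[0,P_{S}]$ (the text only says it is solved numerically), the stepsize is held constant, and the demo at the bottom uses a stand-in capacity $\log(1+Ph)$ with uniformly distributed CSI instead of the RoFSO model. All function names and demo values are hypothetical.

```python
import numpy as np

def allocate(lam, h, weights, capacity, P_S, grid=201):
    """Primal step: per-channel maximizer of w_i * C(P, h_i) - lam * P on a grid over [0, P_S]."""
    p_grid = np.linspace(0.0, P_S, grid)
    w = np.asarray(weights, dtype=float)
    # score[s, i, g] = w_i * C(p_g, h[s, i]) - lam * p_g
    score = w[None, :, None] * capacity(p_grid[None, None, :], h[:, :, None]) - lam * p_grid
    return p_grid[np.argmax(score, axis=2)]                 # shape (S, m)

def sdg(capacity, sample_h, weights, P_T, P_S, iters=300, S=64, eta=0.05, seed=0):
    """Stochastic dual gradient: alternate the primal step and a projected dual step."""
    rng = np.random.default_rng(seed)
    lam = 0.0
    for _ in range(iters):
        h = sample_h(rng, S)                                # S sampled CSI vectors, shape (S, m)
        P = allocate(lam, h, weights, capacity, P_S)
        # Dual step: the expectation is replaced by the average over the S samples.
        lam = max(0.0, lam - eta * (P_T - P.sum(axis=1).mean()))
    return lam

# Stand-in demo (not the RoFSO model): C = log(1 + P h), h ~ Uniform(0.5, 2), m = 4.
toy_capacity = lambda P, h: np.log1p(P * h)
toy_sampler = lambda rng, S: rng.uniform(0.5, 2.0, size=(S, 4))
lam_star = sdg(toy_capacity, toy_sampler, np.ones(4), P_T=2.0, P_S=1.0)
P_star = allocate(lam_star, toy_sampler(np.random.default_rng(1), 4000),
                  np.ones(4), toy_capacity, 1.0)
```

In the demo the total power constraint is active, so the average total power under `lam_star` should settle close to `P_T`. The constant stepsize has to be small relative to the sensitivity of the allocated power to $\lambda$, otherwise the dual iterates oscillate.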

With accurate knowledge of the system model (8), the SDG algorithm solves problem (3) exactly. In practice, however, several issues arise:

The SDG algorithm depends heavily on the exact system model, which means that accurate knowledge of model (8) is needed to run it. However, due to the complexity of RoFSO systems, such models may not be accurate in practice.
The CSI ${\mathbf{h}}$ needs to be estimated at the receiver and fed back to the transmitter. The estimate $\widehat{{\mathbf{h}}}$ used in SDG differs from the real ${\mathbf{h}}$ that determines the capacity $C_{i}({\mathbf{P}},{\mathbf{h}})$ , which degrades the performance of SDG.
In step (1) of SDG, there is no closed-form solution to the maximization problem that gives the optimal $P_{i}^{k+1}({\mathbf{h}})$ , so it must be solved numerically at every iteration, which takes time.

These three issues motivate the use of model-free and low-complexity learning algorithms to solve the power allocation problem.

IV Primal-Dual Deep Learning Algorithm

To handle the above limitations of the SDG algorithm, we develop the model-free Primal-Dual Deep Learning (PDDL) algorithm, which does not directly use system models but only observed capacity and CSI values. Note that our optimization problem (3) shares the same structure as a statistical learning problem. This inspires us to introduce a parametrization $\bm{\theta}\in\mathbb{R}^{q}$ to represent the power allocation policy ${\mathbf{P}}({\mathbf{h}})$ by

$${\mathbf{P}}({\mathbf{h}})=\bm{\Phi}({\mathbf{h}},\bm{\theta}). \tag{17}$$

Substituting (17) into problem (3), our goal becomes to learn an optimal function $\bm{\Phi}^{*}({\mathbf{h}},\bm{\theta}^{*})$ with optimal parametrization $\bm{\theta}^{*}$ , which outputs allocated powers ${\mathbf{P}}^{*}$ that maximize the objective function.

In terms of the parametrization, a good choice of $\bm{\Phi}({\mathbf{h}},\bm{\theta})$ should provide an accurate approximation for almost any function by changing its parameters $\bm{\theta}$ , which can greatly improve the learning performance. Deep Neural Networks (DNNs), widely used in modern machine learning problems, are known to exhibit such strong function approximation ability [21]. Thus, a DNN is a good candidate to be used here. We briefly introduce the architecture of a DNN. Assume there are $L$ layers in the DNN with $n_{1},...n_{L}$ denoting the number of units in each layer. Each layer is comprised of two parts: a linear transform matrix $\bm{\Pi}_{l}\in\mathbb{R}^{n_{l}\times n_{l-1}}$ and a non-linear operator $\sigma_{l}$ . The output of the $l$ -th layer ${\mathbf{x}}_{l}\in\mathbb{R}^{n_{l}}$ is then obtained from its input ${\mathbf{x}}_{l-1}\in\mathbb{R}^{n_{l-1}}$ :

$${\mathbf{x}}_{l}=\sigma_{l}(\bm{\Pi}_{l}{\mathbf{x}}_{l-1}). \tag{18}$$

Note that the input of the DNN ${\mathbf{x}}_{0}$ is the CSI ${\mathbf{h}}$ , and the parametrization $\bm{\theta}$ is the set of matrices $\{\bm{\Pi}_{l}\}_{l=1,...,L}$ . As for the non-linear operator $\sigma$ , various functions can be used, such as ReLU or sigmoid. Besides, note that $\bm{\theta}$ should belong to the set $\Theta=\{\bm{\theta}|\bm{\Phi}({\mathbf{h}},\bm{\theta})\in\mathcal{P}\}$ to satisfy the eye safety concern.
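A minimal sketch of such a parametrization is given below. It follows the layer recursion described above (weight matrices only, no bias terms) with the hidden layer sizes 20, 10 and 5 quoted in Section V. The scaled sigmoid on the output layer is one assumed way to keep the output inside $\mathcal{P}=[0,P_{S}]^{m}$; the text does not say how membership in $\Theta$ is enforced. The names `init_dnn` and `forward` and the initialization scale are hypothetical.

```python
import numpy as np

def init_dnn(rng, sizes):
    """One matrix Pi_l of shape (n_l, n_{l-1}) per layer; sizes = [n_0, ..., n_L]."""
    return [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(theta, h, P_S):
    """Evaluate Phi(h, theta) for a single CSI vector h of length m."""
    x = np.asarray(h, dtype=float)           # x_0 = h
    for Pi in theta[:-1]:
        x = np.maximum(Pi @ x, 0.0)          # x_l = ReLU(Pi_l x_{l-1})
    z = theta[-1] @ x
    return P_S / (1.0 + np.exp(-z))          # assumed output map into (0, P_S)

m, P_S = 8, 0.3
rng = np.random.default_rng(0)
theta = init_dnn(rng, [m, 20, 10, 5, m])
P = forward(theta, rng.uniform(0.5, 2.0, size=m), P_S)
```

Here a single network maps the full CSI vector to all $m$ powers; Section V instead uses $m$ independent networks, one per wavelength, which corresponds to calling the same code with input and output size one.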

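Similar to (9), the Lagrangian here can be expressed by

$$\mathcal{L}(\bm{\theta},\lambda)=\sum_{i=1}^{m}\omega_{i}\mathbb{E}_{\mathbf{h}}\big[C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})\big]+\lambda\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta})\Big]\Big). \tag{19}$$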

And its corresponding dual problem becomes

$$\mathbb{D}_{\bm{\theta}}=\min_{\lambda\geq 0}\mathcal{D}_{\bm{\theta}}(\lambda)=\min_{\lambda\geq 0}\max_{\bm{\theta}\in\Theta}\mathcal{L}(\bm{\theta},\lambda). \tag{20}$$

For the above min-max problem with a sufficiently dense DNN parametrization $\bm{\theta}$ , the duality gap between $\mathbb{P}$ and $\mathbb{D}_{\bm{\theta}}$ is proportional to the function approximation error of the DNN [16]. Therefore, due to the strong approximation ability of DNNs, the duality gap is nearly null. We then develop the PDDL learning algorithm based on (20), which updates the primal variable $\bm{\theta}$ and the dual variable $\lambda$ at every iteration using first order gradients. The ultimate purpose is to search for a local stationary point $(\bm{\theta}^{*},\lambda^{*})$ which satisfies the KKT conditions. Specifically, at each iteration $k$ , we follow two steps:

(1) Primal variable update. For a given $\lambda^{k}$ from iteration $k-1$ and CSI ${\mathbf{h}}$ , we update the primal variable $\bm{\theta}$ by

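$$\bm{\theta}^{k+1}=\bm{\theta}^{k}+\delta^{k}\nabla_{\bm{\theta}}\mathcal{L}(\bm{\theta}^{k},\lambda^{k})=\bm{\theta}^{k}+\delta^{k}\nabla_{\bm{\theta}}\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\omega_{i}C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}^{k}),{\mathbf{h}})+\lambda^{k}\Big(P_{T}-\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta}^{k})\Big)\Big], \tag{21}$$

where $\delta^{k}$ is the stepsize of $\bm{\theta}$ at iteration $k$ , and the last equality follows from the linearity of the expectation.

(2) Dual variable update. Once we get $\bm{\theta}^{k+1}$ , the dual variable $\lambda$ is updated in a similar way:

$$\lambda^{k+1}=\Big[\lambda^{k}-\eta^{k}\Big(P_{T}-\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta}^{k+1})\Big]\Big)\Big]_{+}. \tag{22}$$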

As seen from (21), the update of the primal variable requires not only computing the gradient of the capacity function $C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}^{k}),{\mathbf{h}})$ , but also taking the expectation $\mathbb{E}_{\mathbf{h}}[\cdot]$ of this gradient w.r.t. the distribution of ${\mathbf{h}}$ . Either of them may be unknown in practice, in which case the above algorithm cannot be applied as stated. However, the policy gradient method used in reinforcement learning provides a good solution for these problems. It can be used to calculate the gradient of functions of the form $\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]$ , where $f$ is an unknown function. More precisely, it calculates a stochastic and model-free approximation of $\nabla_{\bm{\theta}}\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]$ [22].

In the policy gradient method, the power allocation policy $\bm{\Phi}({\mathbf{h}},\bm{\theta})$ is considered to be drawn from a distribution with a delta density function $\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})=\delta({\mathbf{P}}-\bm{\Phi}({\mathbf{h}},\bm{\theta}))$ , and then we can rewrite

$$\nabla_{\bm{\theta}}\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]=\mathbb{E}_{{\mathbf{h}},{\mathbf{P}}}[f({\mathbf{P}},{\mathbf{h}})\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})], \tag{23}$$

in which ${\mathbf{P}}$ is a random realization drawn from the distribution $\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})$ . However, calculating $\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})$ of a delta density function still requires the knowledge of $f$ . To handle this problem, the delta function can be approximated by a Gaussian distribution centered around $\bm{\Phi}({\mathbf{h}},\bm{\theta})$ , whose mean and variance are given by the output features of the DNN. Then we can estimate $\nabla_{\bm{\theta}}\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]$ by using (23) without knowing $f$ . In addition, we take $S$ samples and average them when computing $\mathbb{E}_{{\mathbf{h}},{\mathbf{P}}}[\cdot]$ to reduce the stochastic error:

$$\widetilde{\nabla_{\bm{\theta}}}\mathbb{E}_{\mathbf{h}}[f(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})]=\frac{1}{S}\sum_{j=1}^{S}f({\mathbf{P}}_{j},{\mathbf{h}}_{j})\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}}_{j},\bm{\theta}}({\mathbf{P}}_{j}), \tag{24}$$

where ${\mathbf{h}}_{j}$ is a sampled CSI and ${\mathbf{P}}_{j}=[P_{j,1},...,P_{j,m}]$ is a corresponding realization drawn from the distribution $\pi_{{\mathbf{h}}_{j},\bm{\theta}}({\mathbf{P}})$ . So with (24), we can compute the gradient in step (1) by

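$$\widetilde{\nabla_{\bm{\theta}}}\mathcal{L}(\bm{\theta},\lambda)=\widetilde{\nabla_{\bm{\theta}}}\mathbb{E}_{\mathbf{h}}\Big[\sum_{i=1}^{m}\omega_{i}C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})+\lambda\Big(P_{T}-\sum_{i=1}^{m}\Phi_{i}({\mathbf{h}},\bm{\theta})\Big)\Big]=\frac{1}{S}\sum_{j=1}^{S}\Big\{\Big[\sum_{i=1}^{m}\omega_{i}C_{i}({\mathbf{P}}_{j},{\mathbf{h}}_{j})+\lambda\Big(P_{T}-\sum_{i=1}^{m}P_{j,i}\Big)\Big]\nabla_{\bm{\theta}}\log\pi_{{\mathbf{h}}_{j},\bm{\theta}}({\mathbf{P}}_{j})\Big\}. \tag{25}$$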
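The sample-average estimator can be illustrated on a toy problem where the true gradient is known. In the sketch below the policy is simplified to a Gaussian whose mean is the parameter itself, that is $\bm{\Phi}({\mathbf{h}},\bm{\theta})=\bm{\theta}$ with the CSI ignored and a fixed standard deviation; this simplification, the function name and the test function $f$ are assumptions made only to keep the check analytic. For this policy $\nabla_{\bm{\theta}}\log\pi({\mathbf{P}})=({\mathbf{P}}-\bm{\theta})/\sigma^{2}$, and $f$ is only ever evaluated, never differentiated.

```python
import numpy as np

def policy_gradient(f, mu, sigma, S, rng):
    """Score-function estimate of d/dmu E_{P ~ N(mu, sigma^2 I)}[f(P)] from S samples."""
    mu = np.asarray(mu, dtype=float)
    P = rng.normal(mu, sigma, size=(S, mu.size))     # realizations P_j
    score = (P - mu) / sigma**2                      # grad_mu log pi(P_j)
    return (f(P)[:, None] * score).mean(axis=0)      # average of f(P_j) * score_j

# Toy check: f(P) = -||P - 1||^2 has E[f] = -||mu - 1||^2 - m sigma^2,
# so the exact gradient with respect to mu is -2 (mu - 1).
f = lambda P: -((P - 1.0) ** 2).sum(axis=1)
g = policy_gradient(f, mu=np.zeros(2), sigma=0.5, S=200_000, rng=np.random.default_rng(0))
```

At $\bm{\theta}=\mathbf{0}$ the exact gradient is $[2,2]$, and the estimate approaches it as $S$ grows. The variance of this estimator is large for small $\sigma$, which is why many samples are used in the check.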

Therefore, the primal variable update of PDDL algorithm can be completed by using (25) without any knowledge of system model $C_{i}({\mathbf{P}},{\mathbf{h}})$ or CSI distribution but only their observations, which makes PDDL model-free. By replacing $\nabla_{\bm{\theta}}\mathcal{L}(\bm{\theta}^{k},\lambda^{k})$ with $\widetilde{\nabla_{\bm{\theta}}}\mathcal{L}(\bm{\theta}^{k},\lambda^{k})$ in (21), PDDL is summarized in the following Algorithm 1.
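Since the algorithm listing itself is not reproduced here, the following sketch shows how the primal and dual updates could be combined into a loop. It is not the authors' Algorithm 1. To stay short it replaces the DNN with a two-parameter sigmoid per wavelength, uses a fixed exploration standard deviation with clipping instead of a learned truncated Gaussian, and subtracts a batch-mean baseline to reduce variance; all of these, along with the function name, stepsizes and the stand-in capacity in the usage lines, are assumptions.

```python
import numpy as np

def pddl(capacity, sample_h, weights, P_T, P_S, iters=500, S=64,
         delta=0.05, eta=0.05, sigma=0.05, seed=0):
    """Primal-dual learning with a score-function gradient; capacity is only observed."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    theta = np.zeros((w.size, 2))        # per-channel [bias, slope]; stand-in for a DNN
    lam = 0.0
    for _ in range(iters):
        h = sample_h(rng, S)                                   # observed CSI, shape (S, m)
        s = 1.0 / (1.0 + np.exp(-(theta[:, 0] + theta[:, 1] * h)))
        mu = P_S * s                                           # policy mean Phi(h, theta)
        P = rng.normal(mu, sigma)                              # Gaussian exploration around mu
        P_tx = np.clip(P, 0.0, P_S)                            # transmitted power obeys eye safety
        # Observed Lagrangian sample for each CSI draw.
        f = (w * capacity(P_tx, h)).sum(axis=1) + lam * (P_T - P_tx.sum(axis=1))
        f = f - f.mean()                                       # baseline (added assumption)
        dlogpi_dmu = (P - mu) / sigma**2                       # Gaussian score w.r.t. the mean
        dmu = P_S * s * (1.0 - s)                              # d mu / d (bias + slope * h)
        g_bias = (f[:, None] * dlogpi_dmu * dmu).mean(axis=0)
        g_slope = (f[:, None] * dlogpi_dmu * dmu * h).mean(axis=0)
        theta = theta + delta * np.stack([g_bias, g_slope], axis=1)   # primal ascent
        # Dual descent using the policy mean under the updated parameters.
        s_new = 1.0 / (1.0 + np.exp(-(theta[:, 0] + theta[:, 1] * h)))
        lam = max(0.0, lam - eta * (P_T - (P_S * s_new).sum(axis=1).mean()))
    return theta, lam

# Usage with a stand-in capacity and CSI sampler (not the RoFSO model).
toy_capacity = lambda P, h: np.log1p(P * h)
toy_sampler = lambda rng, S: rng.uniform(0.5, 2.0, size=(S, 4))
theta, lam = pddl(toy_capacity, toy_sampler, np.ones(4), P_T=2.0, P_S=1.0, iters=200)
```

The capacity function is called only on realized powers, mirroring the model-free setting in which capacities are measured rather than computed. Evaluating the reward on the clipped power while taking the score of the unclipped sample keeps the estimator a valid score-function gradient of the clipped-policy objective.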

V Simulation Results

In this section, we perform numerical simulations to exhibit the performance of SDG and PDDL algorithms, and show their validity by comparing with the average equal power allocation policy. Wavelengths in 1520nm-1595nm are used in simulations with 5nm guard band between adjacent wavelengths.

Note that although our PDDL learning algorithm is model-free, we perform numerical simulations rather than physical experiments, so CSI samples $\{{\mathbf{h}}\}$ and their corresponding channel capacities $\{C_{i}(\bm{\Phi}({\mathbf{h}},\bm{\theta}),{\mathbf{h}})\}$ cannot be observed here. We therefore still use the system model to compute them, but in reality they can be obtained directly from the real system without the need for any theoretical model. In addition, due to the separable use of $P_{i}$ and $h_{i}$ in both the objective function and the constraints, we construct $m$ independent DNNs for the $m$ wavelength channels, and each DNN has three hidden layers with 20, 10 and 5 units respectively. The ReLU function is used as the non-linear operator $\sigma$ . Furthermore, the truncated Gaussian distribution is used as the power policy distribution $\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})$ in the policy gradient method, which constrains generated powers ${\mathbf{P}}$ inside $\mathcal{P}$ to satisfy the eye safety concern. The outputs of the DNNs are used as means and standard deviations of $\pi_{{\mathbf{h}},\bm{\theta}}({\mathbf{P}})$ .

Fig. 1 shows the performance of three policies for $8$ wavelength multiplexing. Weights $\bm{\omega}$ are drawn randomly from [math] to $1$ , and other default parameters are set as: $P_{T}=1.2W$ ; $P_{S}=0.3W$ ; $m_{p}=5$ ; $OMI=15\%$ ; $r=0.8$ ; $RIN=-140dB/Hz$ ; $T=300K$ ; transmitter aperture diameter $D_{tx}=0.05m$ ; receiver aperture diameter $D_{rx}=0.1m$ ; $d=1000m$ . These parameter values are taken as an example to show our algorithms’ performance and can be adjusted for specific systems and experiments. The left figure shows that the objective function values achieved by the SDG and PDDL algorithms converge as the iteration count increases, and that both outperform the equal power policy. Similarly, the right figure plots the constraint function values against the iteration count. The values eventually converge to [math] for both of our algorithms, which indicates the feasibility of their solutions. The model-based SDG, which solves the problem exactly, exhibits the best performance, in agreement with our analysis. The objective value achieved by the model-free PDDL converges close to that of SDG, which validates the near-optimal performance of PDDL. Moreover, PDDL can be used without any knowledge of system models, which is particularly useful when FSO system models are unknown, inaccurate, or too complicated to deal with, whereas SDG cannot handle such situations. Additionally, SDG has to numerically solve a local maximization problem (14) for every $\lambda^{k}$ and ${\mathbf{h}}$ . Although each such problem is one-dimensional and involves only the interval constraint $[0,P_{S}]$ , SDG is still computationally more expensive than PDDL.

In the left figure of Fig. 2, we depict the performance of three policies for $16$ wavelength multiplexing with $P_{T}=2.4W,P_{S}=0.3W$ . Results show that both the SDG and PDDL algorithms perform well for larger WDM systems, and that the advantage of PDDL over the equal power policy becomes bigger. The right figure plots their performance for $16$ wavelength multiplexing with larger power settings $P_{T}=4W,P_{S}=0.5W$ , which leave more room for the algorithms to adjust powers. We can see that PDDL performs better in this case and converges to roughly the same value as the exact solution found by SDG.

VI Conclusion

This paper investigates the challenging problem of optimal power allocation for WDM transmission in RoFSO systems. Two algorithms are developed to adaptively assign powers to different wavelength channels based on CSI. By showing the null duality gap, we first present the model-based Stochastic Dual Gradient algorithm, which is able to solve the problem exactly but relies heavily on the system model and on CSI estimation accuracy. The model-free Primal-Dual Deep Learning algorithm is then developed to overcome the shortcomings of SDG. Specifically, it parameterizes the power allocation policy with Deep Neural Networks and learns optimal parameter values by updating primal and dual variables at every iteration. The policy gradient method is applied to compute the update gradients without using knowledge of system or channel models. Numerical simulations show that both of our algorithms outperform the equal power policy. The model-free PDDL learning algorithm presented in this paper is widely applicable to problems in FSO networks and communications, where FSO systems are difficult to model and turbulent channels are complicated to estimate.

Bibliography


[1] D. Wake, A. Nkansah, and N. J. Gomes, “Radio over fiber link design for next generation wireless systems,” Journal of Lightwave Technology, vol. 28, no. 16, pp. 2456–2464, 2010.
[2] W. S. C Chang, “Free-space optical communications,” Journal of Lightwave Technology, vol. 24, no. 12, pp. 4750–4762, 2006.
[3] L. C. Andrews and R. L. Phillips, Laser beam propagation through random media, 2nd ed., Bellingham: SPIE Press, 2005.
[4] K. Kazaura, K. Wakamori, M. Matsumoto, T. Higashino, K. Tsukamoto, and S. Komaki, “RoFSO: A universal platform for convergence of fiber and free-space optical communication networks,” IEEE Communications Magazine, vol. 48, no. 2, pp. 130–137, 2010.
[5] H. E. Nistazakis, T. A. Tsiftsis, and G. S. Tombras, “Performance analysis of free-space optical communication systems over atmospheric turbulence channels,” IET Communications, vol. 3, no. 8, pp. 1402–1409, 2009.
[6] Z. Gao, J. Zhang, and A. Dang, “Beam spread and wander of Gaussian beam through anisotropic non-Kolmogorov atmospheric turbulence for optical wireless communication,” in IEEE International Conference on Communications (ICC) Workshops, 2017.
[7] Z. Gao, Z. Li, and A. Dang, “Beam wander effects on scintillation theory of Gaussian beam through anisotropic non-Kolmogorov atmospheric turbulence for optical wireless communication,” in IEEE International Conference on Communications (ICC) Workshops, 2018.
[8] A. Bekkali, P. T. Dat, K. Kazaura, K. Wakamori, M. Matsumoto, T. Higashino, K. Tsukamoto, and S. Komaki, “Performance evaluation of an advanced DWDM RoFSO system for transmitting multiple RF signals,” IEICE Transactions on Fundamentals of Electronics, vol. E 92.A, no. 11, pp. 2697–2705, 2009.