Robust Secrecy via Aerial Reflection and Jamming: Joint Optimization of Deployment and Transmission

Xiao Tang; Hongliang He; Limeng Dong; Lixin Li; Qinghe Du; Zhu Han

arXiv:2302.14764·eess.SP·March 1, 2023


TL;DR

This paper proposes a robust method to enhance wireless security using aerial RIS and jamming, optimizing deployment and transmission to maximize secrecy under imperfect channel information.

Contribution

It introduces a joint optimization framework for aerial RIS deployment and transmission strategies to improve worst-case secrecy in wireless networks.

Findings

01

Aerial RIS and jamming significantly improve security performance.

02

Optimized deployment enhances worst-case secrecy rates.

03

The proposed method outperforms baseline approaches in simulations.


Equations (186)

$$y_{D}=\left(\bm{h}_{AD}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}+\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}\right)x+\left(\bm{h}_{JD}^{\dagger}+\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}\right)\bm{z}+n_{D},$$

$$y_{k}=\left(\bm{h}_{Ak}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}+\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}\right)x+\left(\bm{h}_{Jk}^{\dagger}+\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}\right)\bm{z}+n_{k},$$

$$\begin{aligned}
\bm{h}_{AD}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}&=\bm{\vartheta}_{A}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{AD}^{\dagger}\right)\bm{h}_{SA}}_{\overset{\Delta}{=}\bm{h}_{SAD}}=\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAD},\\
\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{RD}^{\dagger}\right)\bm{h}_{SR}}_{\overset{\Delta}{=}\bm{h}_{SRD}}=\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRD},\\
\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{RD}^{\dagger}\right)\bm{H}_{JR}}_{\overset{\Delta}{=}\bm{h}_{JRD}}=\bm{\vartheta}_{R}^{\dagger}\bm{h}_{JRD},
\end{aligned}$$

$$\begin{aligned}
\bm{h}_{Ak}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}&=\bm{\vartheta}_{A}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{Ak}^{\dagger}\right)\bm{h}_{SA}}_{\overset{\Delta}{=}\bm{h}_{SAk}}=\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk},\\
\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{Rk}^{\dagger}\right)\bm{h}_{SR}}_{\overset{\Delta}{=}\bm{h}_{SRk}}=\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk},\\
\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{Rk}^{\dagger}\right)\bm{H}_{JR}}_{\overset{\Delta}{=}\bm{H}_{JRk}}=\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}.
\end{aligned}$$

$$\gamma_{D}=\frac{P_{S}\left|\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAD}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRD}\right|^{2}}{\left(\bm{h}_{JD}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{JRD}\right)\bm{Z}\left(\bm{h}_{JD}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{JRD}\right)^{\dagger}+\sigma_{0}^{2}},$$

$$\gamma_{k}=\frac{P_{S}\left|\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}\right|^{2}}{\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)\bm{Z}\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)^{\dagger}+\sigma_{0}^{2}},$$

$$R_{S}=\left[\log\left(1+\gamma_{D}\right)-\max_{k\in\mathcal{K}}\log\left(1+\gamma_{k}\right)\right]^{+},$$

$$\begin{aligned}
\bm{h}_{SAk}&=\hat{\bm{h}}_{SAk}+\Delta\bm{h}_{SAk},\quad\text{with }\left\|\Delta\bm{h}_{SAk}\right\|^{2}\leq\epsilon_{SAk},\\
\bm{h}_{SRk}&=\hat{\bm{h}}_{SRk}+\Delta\bm{h}_{SRk},\quad\text{with }\left\|\Delta\bm{h}_{SRk}\right\|^{2}\leq\epsilon_{SRk},\\
\bm{h}_{Jk}&=\hat{\bm{h}}_{Jk}+\Delta\bm{h}_{Jk},\quad\text{with }\left\|\Delta\bm{h}_{Jk}\right\|^{2}\leq\epsilon_{Jk},\\
\bm{H}_{JRk}&=\hat{\bm{H}}_{JRk}+\Delta\bm{H}_{JRk},\quad\text{with }\left\|\Delta\bm{H}_{JRk}\right\|_{F}^{2}\leq\epsilon_{JRk},
\end{aligned}$$

$$\begin{aligned}
\max_{\left[w_{A}^{\text{(x)}},w_{A}^{\text{(y)}}\right],\bm{\vartheta}_{A},\bm{\vartheta}_{R},\bm{Z}}\quad&\min_{\Delta\bm{h}_{SAk},\,\Delta\bm{h}_{SRk},\,\Delta\bm{h}_{Jk},\,\Delta\bm{H}_{JRk}}R_{S}\\
\text{s.t.}\quad&\left|\vartheta_{A,n}\right|=1,\ \forall n\in\mathcal{N}_{A},\\
&\left|\vartheta_{R,n}\right|=1,\ \forall n\in\mathcal{N}_{R},\\
&\mathsf{Tr}\left(\bm{Z}\right)\leq P_{J},\ \bm{Z}\succeq\bm{0}.
\end{aligned}$$

$$\begin{aligned}
\max_{\bm{\vartheta}_{A},\bm{\vartheta}_{R},\bm{Z}}\quad&\min_{\Delta\bm{h}_{SAk},\,\Delta\bm{h}_{SRk},\,\Delta\bm{h}_{Jk},\,\Delta\bm{H}_{JRk}}R_{S}\\
\text{s.t.}\quad&\left|\vartheta_{A,n}\right|=1,\ \forall n\in\mathcal{N}_{A},\\
&\left|\vartheta_{R,n}\right|=1,\ \forall n\in\mathcal{N}_{R},\\
&\mathsf{Tr}\left(\bm{Z}\right)\leq P_{J},\ \bm{Z}\succeq\bm{0}.
\end{aligned}$$

$$\left|\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}\right|^{2}\leq\psi_{Sk},\ \forall k\in\mathcal{K},$$

$$\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)\bm{Z}\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)^{\dagger}\geq\psi_{Jk},\ \forall k\in\mathcal{K},$$

$$\gamma_{k}\leq\frac{P_{S}\psi_{Sk}}{\psi_{Jk}+\sigma_{0}^{2}},\ \forall k\in\mathcal{K}.$$

$$\begin{bmatrix}\psi_{Sk}&\left(\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}\right)^{\dagger}\\\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}&1\end{bmatrix}\succeq\bm{0}.$$

$$\begin{aligned}
&\begin{bmatrix}\psi_{Sk}&\left(\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}\right)^{\dagger}\\\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}&1\end{bmatrix}\\
&\quad\succeq-\left(\begin{bmatrix}0&\left(\bm{\vartheta}_{A}^{\dagger}\Delta\bm{h}_{SAk}\right)^{\dagger}\\\bm{\vartheta}_{A}^{\dagger}\Delta\bm{h}_{SAk}&0\end{bmatrix}+\begin{bmatrix}0&\left(\bm{\vartheta}_{R}^{\dagger}\Delta\bm{h}_{SRk}\right)^{\dagger}\\\bm{\vartheta}_{R}^{\dagger}\Delta\bm{h}_{SRk}&0\end{bmatrix}\right)\\
&\quad=-\begin{bmatrix}1\\0\end{bmatrix}\Delta\bm{h}_{SAk}^{\dagger}\begin{bmatrix}\bm{0}_{N_{A}\times 1}&\bm{\vartheta}_{A}\end{bmatrix}-\begin{bmatrix}\bm{0}_{1\times N_{A}}\\\bm{\vartheta}_{A}^{\dagger}\end{bmatrix}\Delta\bm{h}_{SAk}\begin{bmatrix}1&0\end{bmatrix}\\
&\qquad-\begin{bmatrix}1\\0\end{bmatrix}\Delta\bm{h}_{SRk}^{\dagger}\begin{bmatrix}\bm{0}_{N_{R}\times 1}&\bm{\vartheta}_{R}\end{bmatrix}-\begin{bmatrix}\bm{0}_{1\times N_{R}}\\\bm{\vartheta}_{R}^{\dagger}\end{bmatrix}\Delta\bm{h}_{SRk}\begin{bmatrix}1&0\end{bmatrix},
\end{aligned}$$

$$\begin{bmatrix}
\psi_{Sk}-\rho_{1,k}-\rho_{2,k}&\left(\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}\right)^{\dagger}&\bm{0}_{1\times N_{A}}&\bm{0}_{1\times N_{R}}\\
\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}&1&\epsilon_{SAk}\bm{\vartheta}_{A}^{\dagger}&\epsilon_{SRk}\bm{\vartheta}_{R}^{\dagger}\\
\bm{0}_{N_{A}\times 1}&\epsilon_{SAk}\bm{\vartheta}_{A}&\rho_{1,k}\bm{I}_{N_{A}\times N_{A}}&\bm{0}_{N_{A}\times N_{R}}\\
\bm{0}_{N_{R}\times 1}&\epsilon_{SRk}\bm{\vartheta}_{R}&\bm{0}_{N_{R}\times N_{A}}&\rho_{2,k}\bm{I}_{N_{R}\times N_{R}}
\end{bmatrix}\succeq\bm{0},$$

$$\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}=\underbrace{\begin{bmatrix}1&\bm{\vartheta}_{R}^{\dagger}\end{bmatrix}}_{\overset{\Delta}{=}\tilde{\bm{\vartheta}}_{R}^{\dagger}}\underbrace{\begin{bmatrix}\bm{h}_{Jk}^{\dagger}\\\bm{H}_{JRk}\end{bmatrix}}_{\overset{\Delta}{=}\tilde{\bm{H}}_{k}^{\dagger}},$$

$$\mathsf{vec}^{\dagger}\left(\tilde{\bm{H}}_{k}\right)\left(\left(\tilde{\bm{\vartheta}}_{R}\tilde{\bm{\vartheta}}_{R}^{\dagger}\right)^{T}\otimes\bm{Z}\right)\mathsf{vec}\left(\tilde{\bm{H}}_{k}\right)-\psi_{Jk}\geq 0.$$

$$\tilde{\bm{H}}_{k}=\hat{\tilde{\bm{H}}}_{k}+\Delta\tilde{\bm{H}}_{k},$$

$$\begin{aligned}
&\mathsf{vec}^{\dagger}\left(\Delta\tilde{\bm{H}}_{k}\right)\bm{\Omega}\,\mathsf{vec}\left(\Delta\tilde{\bm{H}}_{k}\right)+\mathsf{vec}^{\dagger}\left(\Delta\tilde{\bm{H}}_{k}\right)\bm{\Omega}\,\mathsf{vec}\left(\hat{\tilde{\bm{H}}}_{k}\right)\\
&\quad+\mathsf{vec}^{\dagger}\left(\hat{\tilde{\bm{H}}}_{k}\right)\bm{\Omega}\,\mathsf{vec}\left(\Delta\tilde{\bm{H}}_{k}\right)+\mathsf{vec}^{\dagger}\left(\hat{\tilde{\bm{H}}}_{k}\right)\bm{\Omega}\,\mathsf{vec}\left(\hat{\tilde{\bm{H}}}_{k}\right)-\psi_{Jk}\geq 0,
\end{aligned}$$

$$\mathsf{vec}^{\dagger}\left(\Delta\bm{h}_{Jk}\right)\mathsf{vec}\left(\Delta\bm{h}_{Jk}\right)\leq\epsilon_{Jk},$$


Taxonomy

Topics: Advanced Wireless Communication Technologies · Wireless Communication Security Techniques · UAV Applications and Optimization

Full text

Robust Secrecy via Aerial Reflection and Jamming: Joint Optimization of Deployment and Transmission

Xiao Tang, Hongliang He, Limeng Dong, Lixin Li, Qinghe Du, and Zhu Han

X. Tang, L. Dong, and L. Li are with the School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China (Email: [email protected]). H. He is with the School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan 430074, China. Q. Du is with the Department of Communication Engineering, Xi’an Jiaotong University, Xi’an 710049, China. Z. Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004, USA.

Abstract

Reconfigurable intelligent surfaces (RISs) are recognized as having great potential to strengthen wireless security, yet the performance gain largely depends on where the RISs are deployed in the network topology. In this paper, we consider anti-eavesdropping communication established through a RIS at a fixed location, together with an aerial platform carrying another RIS and a friendly jammer to further improve the secrecy. The aerial RIS enhances the legitimate signal, and the aerial cooperative jamming is strengthened through the fixed RIS. The security gain from aerial reflection and jamming is further improved by optimizing the deployment of the aerial platform. We specifically consider imperfect channel state information and address the worst-case secrecy for robust performance. The formulated robust secrecy rate maximization problem is decomposed into two layers, where the inner layer solves for reflection and jamming with robust optimization, and the outer layer tackles the aerial deployment through deep reinforcement learning. Simulation results show the deployment under different network topologies and demonstrate the superiority of our proposal over the baselines in terms of worst-case security provisioning.

Index Terms:

Physical layer security, reconfigurable intelligent surface, aerial deployment, deep reinforcement learning

I Introduction

The reconfigurable intelligent surface (RIS) is envisioned as a paradigm-shifting technology to empower next-generation wireless communications. With a massive number of low-cost and low-power reflecting elements that alter the electromagnetic properties of the incident signal, a RIS enables controllable and programmable wireless propagation, rather than merely adapting to the environment as in conventional communications [1]. In this regard, RIS-assisted communication can intentionally combine the received signals constructively or destructively to strengthen the desired reception while weakening unintended ones. Owing to the cost-effective and flexible operation of RISs, a rich literature has emerged investigating various aspects of RIS-enabled communications, e.g., RIS channel modeling, RIS-assisted transmissions, and RIS-enhanced information security [2].

Security is one of the primary concerns for wireless communications, for which physical layer security, featuring keyless operation while providing information-theoretic secrecy, has arisen as an attractive solution [3]. Physical layer security defends against malicious adversaries by exploiting the randomness of the wireless medium. In this respect, the ability of RISs to actively intervene in the wireless environment provides an additional degree of freedom to further enhance information security [4]. With RIS-enabled intelligent radio, we can intentionally improve the legitimate reception while reducing the signal leakage to unintended third parties, thus improving the security performance significantly. Therefore, RIS-enhanced security has attracted wide attention recently, including RIS-assisted friendly relaying, jamming mitigation, and artificial noise design [5, 6].

Despite the potential of RISs to enhance wireless communications, the performance gain is heavily affected by the quality of the RIS channel. In particular, the RIS-related channel fundamentally depends on the product of the incident signal channel, the reflection channel, and the phase shifts, and thus the deployment of RISs has a great impact on the overall performance [7]. For example, although line-of-sight links are more likely to be established via RISs deployed on high-rise buildings, the overall transmission distance via the RISs is usually much longer than that of the direct links, and thus may not provide the desired performance enhancement. To address this issue, a direct remedy is to increase the number of reflecting elements or deploy more RISs, yet this can be severely restricted by the physical conditions [8, 9]. Alternatively, one may resort to new types of RISs with more desirable properties, such as active RISs with amplified reflection, but this essentially relies on advances in material or circuit design that largely go beyond the scope of conventional communications [10].

Recently, rapid progress has been witnessed in various aerial platforms, such as unmanned aerial vehicles (UAVs), high altitude platforms (HAPs), and airships [11, 12]. We can then exploit the aerial platforms as the base for RISs, leading to aerial RISs (ARISs) that can be deployed flexibly to achieve optimized reflection in various network topologies [13, 14]. Regarding RIS-assisted physical layer security, ARISs can significantly strengthen the legitimate signal and degrade the eavesdropping. In particular, through reflection at the optimal location, we can effectively weaken the signal leakage or enhance the artificial jamming at unintended receivers, thus improving the secrecy rate [15, 16]. Therefore, ARISs with flexible and on-demand deployment have great advantages over fixed RISs, and have tremendous potential to advance conventional security approaches toward better-protected information security.

Motivated by the benefits of RISs for wireless security, we propose in this paper to deploy a fixed RIS as well as an ARIS for anti-eavesdropping communications. The aerial platform carrying the ARIS is also equipped with a friendly jammer, enhancing the secrecy through cooperative reflection and jamming. We particularly consider the case in which the channels related to the eavesdroppers are subject to uncertainties. By exploiting robust optimization and learning techniques, we propose a joint design of aerial deployment, reflection at the RISs, and jamming to maximize the robust secrecy. The main contributions are summarized as follows:

• We propose to deploy an aerial platform carrying an ARIS and a cooperative jammer, along with a fixed RIS, to enhance wireless secrecy. The aerial reflection and jamming are flexibly deployed so as to coordinate with the fixed RIS, improving the legitimate transmissions while degrading the eavesdropping.

• We employ the cascaded reflection channel model and consider imperfect channel state information at the eavesdroppers. Targeting the worst case for robustness, we formulate a problem that maximizes the robust secrecy by jointly optimizing the aerial deployment, the reflection at the RISs, and the jamming.

• We decompose the problem into two layers, where the inner layer optimizes the secure transmission and the outer layer optimizes the deployment. In the inner layer, by deriving the worst-case secrecy rate, the reflection and jamming strategies are obtained within a block coordinate descent (BCD) framework. Then, the outer-layer deployment is obtained with a deep reinforcement learning technique.

The rest of this paper is organized as follows. In Sec. II, we review the related work. In Sec. III, we present the system model with the robust secrecy optimization problem formulation. In Sec. IV, the inner transmission problem is solved to optimize reflection and jamming. In Sec. V, the deployment of the aerial platform is tackled with deep reinforcement learning. Sec. VI provides the simulation results, and finally Sec. VII concludes this paper.

II Related Work

Due to their ability to actively intervene in signal propagation, RISs have the potential to enhance wireless communications in various aspects [1, 2]. In particular, the interplay between RISs and physical layer security has shown significant advantages in safeguarding secrecy while defending against eavesdropping [4]. In [6], the authors consider artificial noise-aided secure communications, where a RIS is invoked to enhance the secrecy rate by jointly optimizing the reflection and jamming. In [17], the authors consider the non-orthogonal multiple access scenario with a RIS to assist the secure transmissions. In [18], the authors investigate secure edge computing, where the reflection-enhanced transmission is jointly optimized with the computing strategy to secure the computation offloading. In [19], the authors exploit a RIS as a backscatter device that modulates the received confidential signal into a jamming signal to deteriorate the eavesdropping. In [20], the authors consider both eavesdropping and jamming attacks, where a RIS with reflection optimization is exploited to enhance security. In [21], the authors propose a RIS-assisted key generation scheme by intervening in the propagation in harsh environments. In [22], the authors use a non-cooperative game to model the interaction between a RIS-assisted legitimate user and a smart attacker with learning-based security solutions.

Recently, the deployment of RISs has attracted increasing interest in joint consideration with reflection-based transmissions. In [14], the authors jointly consider the RIS deployment and access strategies for maximized system rate. In [23], the authors investigate the full-duplex system while jointly optimizing the passive beamforming and deployment of the RIS. In this aspect, the various aerial platforms, particularly UAVs, have enabled ARISs as a more flexible solution [15]. In [24], the authors exploit an ARIS to maximize the worst-case signal-to-noise ratio by jointly considering the transmission and ARIS placement. In [25], the authors propose to deploy multiple ARISs forming a massive multiple-input multiple-output network to extend the network coverage. Moreover, ARIS-assisted security has also emerged as an attractive solution to defend against various attacks. In [16], the authors investigate various use cases that integrate UAVs and RISs to enhance physical layer security. In [26], the authors propose to use a UAV-carried RIS to defend against eavesdropping with trajectory optimization. In [27], the authors address anti-jamming communications by leveraging the ARIS reflection and deployment. In the above works, the RISs are either fixed or deployed with UAVs, where the former potentially lacks flexibility while the latter is of high cost. As such, we may resort to the on-demand and adaptive use of both types to tackle unfavorable transmission scenarios with reasonable expenditure.

As reflection-based transmission through RISs makes channel measurement more challenging, many research efforts have been devoted to the imperfect channel information issue. In [28], the authors adopt the cascaded reflection channel model with imperfection and minimize the energy consumption. In [29], the authors propose a robust design regarding the instantaneous beamforming and quasi-static phase shifts with channel imperfection, adapting to the rapid channel variation. In [30], the authors consider randomly distributed channel errors and investigate wireless information and power transfer under probabilistically robust constraints. In [31], the authors address secure full-duplex communications with channel uncertainties and optimize the worst-case achievable secrecy rate. In [32], the authors achieve robust secrecy for RIS-aided UAV communications, with a joint design of transmission and UAV trajectory. In [33], the authors consider secure multicast beamforming with RISs, where distributional robustness is achieved against different distributions of channel errors. While these studies address different uncertainty models, the considered cases only incorporate one fixed RIS. In this regard, it is worth investigating the uncertainty issue in more complicated scenarios incorporating multiple RISs of different types, providing insights for more general use cases of RIS-assisted wireless security.

III System Model

We consider an area, denoted by $\mathcal{A}$, in which a legitimate source node, denoted by $S$, sends confidential information to a legitimate destination node, denoted by $D$. In the same area, a set of eavesdroppers, denoted by $E_{k}$ with $k\in\mathcal{K}=\left\{1,2,\cdots,K\right\}$, intend to wiretap the legitimate transmissions, as shown in Fig. 1. The legitimate source and destination, as well as the eavesdroppers, are each assumed to have a single antenna and are located on the ground. We consider the scenario in which the legitimate source is blocked by high-rise obstacles, so there is no direct link from the source to the legitimate destination or the eavesdroppers. Meanwhile, a RIS is deployed at a fixed location, referred to as the fixed RIS and denoted by $R$, to establish reflection links that assist the transmission; the reflected signals can also be overheard by the eavesdroppers. Suppose there are $N_{R}$ reflection elements at $R$, indexed by $\mathcal{N}_{R}=\left\{1,2,\cdots,N_{R}\right\}$, with phase shifts $\bm{\theta}_{R}=\left[\theta_{R,n}\right]_{n\in\mathcal{N}_{R}}$. Then the reflection-coefficient matrix is given as $\bm{\Theta}_{R}=\mathsf{diag}\left(\bm{\vartheta}_{R}\right)$ with $\bm{\vartheta}_{R}=\left[\vartheta_{R,n}\right]_{n\in\mathcal{N}_{R}}$ and $\vartheta_{R,n}=e^{j\theta_{R,n}}$. Also, the channels from $S$ to $R$, from $R$ to $D$, and from $R$ to $E_{k}$ are denoted by $\bm{h}_{SR}$, $\bm{h}_{RD}$, and $\bm{h}_{Rk}$, respectively. Here, we assume a constant amplitude response to facilitate the analysis. A more practical and general model with phase-dependent amplitude has been proposed recently [34]. Our considered scenario can also incorporate such a reflection model, and our proposed framework is expected to extend to such cases with proper treatment of the amplitude.

Despite the transmissions established through the fixed RIS, it may still be challenging to guarantee the secrecy of legitimate communications. To address this issue, we propose to deploy an aerial platform carrying a RIS and a jammer to enhance wireless secrecy. A typical scenario is that the fixed RIS is deployed on a high-rise building but is not always effective given the mobility or potentially unfavorable locations of the legitimate nodes. In this regard, an ARIS carried by a UAV is dispatched to assist the secure communications. The ARIS, denoted by $A$, has geographic coordinates $\bm{w}_{A}=\left[w_{A}^{\text{(x)}},w_{A}^{\text{(y)}},H_{A}\right]$ with $H_{A}$ being the fixed altitude, and enables communications through reflection in the air. Suppose there are $N_{A}$ reflection elements at $A$, indexed by $\mathcal{N}_{A}=\left\{1,2,\cdots,N_{A}\right\}$, with corresponding phase shifts $\bm{\theta}_{A}=\left[\theta_{A,n}\right]_{n\in\mathcal{N}_{A}}$. Then the reflection-coefficient matrix is given as $\bm{\Theta}_{A}=\mathsf{diag}\left(\bm{\vartheta}_{A}\right)$ with $\bm{\vartheta}_{A}=\left[\vartheta_{A,n}\right]_{n\in\mathcal{N}_{A}}$ and $\vartheta_{A,n}=e^{j\theta_{A,n}}$. Similar to the fixed RIS, the channels from $S$ to $A$, from $A$ to $D$, and from $A$ to $E_{k}$ are denoted by $\bm{h}_{SA}$, $\bm{h}_{AD}$, and $\bm{h}_{Ak}$, respectively. In this paper, we only consider single reflections at the RISs; signals reflected multiple times or between the RISs are ignored due to their more severe attenuation. Moreover, a cooperative jammer, denoted by $J$ and equipped with $M$ antennas, is also mounted on the aerial platform as a flying helper that emits artificial noise to intentionally deteriorate the eavesdropping. Since the jammer belongs to the legitimate system, we assume that the legitimate signal and jamming signals are well coordinated. As the jammer is in the air, it has direct links to the legitimate destination and eavesdroppers. The channels from $J$ to $D$ and from $J$ to $E_{k}$ are denoted by $\bm{h}_{JD}$ and $\bm{h}_{Jk}$, respectively. The jamming signal also reaches the ground through reflection. Given the physical space limitation of the aerial platform carrying the ARIS and the jamming device, e.g., with the jammer on top of the platform and the ARIS beneath, we consider that the jamming signal is only reflected through the fixed RIS and is not affected by the ARIS. The channel from $J$ to $R$ is denoted by $\bm{H}_{JR}$, and the reflection links from $R$ to the ground nodes are $\bm{h}_{RD}$ and $\bm{h}_{Rk}$, as defined above.

For the considered system, the signals from the legitimate transmitter and jammer are denoted by $x$ and $\bm{z}$ , respectively, where $x\sim\mathcal{CN}\left(0,P_{S}\right)$ with $P_{S}$ being the transmit power of $S$ , $\bm{z}\sim\mathcal{CN}\left(\bm{0}_{M\times 1},\bm{Z}\right)$ with $\bm{Z}$ being the covariance of the jamming signal subject to the maximum jamming power specified by $P_{J}$ . Then the received signals at the legitimate destination $D$ , and eavesdropper $E_{k}$ , are

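$$y_{D}=\left(\bm{h}_{AD}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}+\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}\right)x+\left(\bm{h}_{JD}^{\dagger}+\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}\right)\bm{z}+n_{D},\tag{1}$$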

and

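$$y_{k}=\left(\bm{h}_{Ak}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}+\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}\right)x+\left(\bm{h}_{Jk}^{\dagger}+\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}\right)\bm{z}+n_{k},\tag{2}$$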

respectively, where $n_{D}$ and $n_{k}$ are the Gaussian noise at $D$ and $E_{k}$ , respectively. In this paper, we adopt the cascaded channel model and rewrite the links established through reflection as

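$$\begin{aligned}
\bm{h}_{AD}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}&=\bm{\vartheta}_{A}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{AD}^{\dagger}\right)\bm{h}_{SA}}_{\overset{\Delta}{=}\bm{h}_{SAD}}=\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAD},\\
\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{RD}^{\dagger}\right)\bm{h}_{SR}}_{\overset{\Delta}{=}\bm{h}_{SRD}}=\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRD},\\
\bm{h}_{RD}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{RD}^{\dagger}\right)\bm{H}_{JR}}_{\overset{\Delta}{=}\bm{h}_{JRD}}=\bm{\vartheta}_{R}^{\dagger}\bm{h}_{JRD},
\end{aligned}\tag{3}$$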

at legitimate receiver $D$ . Similarly, at eavesdropper $E_{k}$ , we have

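$$\begin{aligned}
\bm{h}_{Ak}^{\dagger}\bm{\Theta}_{A}\bm{h}_{SA}&=\bm{\vartheta}_{A}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{Ak}^{\dagger}\right)\bm{h}_{SA}}_{\overset{\Delta}{=}\bm{h}_{SAk}}=\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk},\\
\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{h}_{SR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{Rk}^{\dagger}\right)\bm{h}_{SR}}_{\overset{\Delta}{=}\bm{h}_{SRk}}=\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk},\\
\bm{h}_{Rk}^{\dagger}\bm{\Theta}_{R}\bm{H}_{JR}&=\bm{\vartheta}_{R}^{\dagger}\underbrace{\mathsf{diag}\left(\bm{h}_{Rk}^{\dagger}\right)\bm{H}_{JR}}_{\overset{\Delta}{=}\bm{H}_{JRk}}=\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}.
\end{aligned}\tag{4}$$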

Based on the transmission model, the signal-to-interference-plus-noise ratio (SINR) at legitimate receiver $D$ , and eavesdropper $E_{k}$ , are

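$$\gamma_{D}=\frac{P_{S}\left|\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAD}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRD}\right|^{2}}{\left(\bm{h}_{JD}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{JRD}\right)\bm{Z}\left(\bm{h}_{JD}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{JRD}\right)^{\dagger}+\sigma_{0}^{2}},\tag{5}$$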

and

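$$\gamma_{k}=\frac{P_{S}\left|\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}\right|^{2}}{\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)\bm{Z}\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)^{\dagger}+\sigma_{0}^{2}},\tag{6}$$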

respectively, where $\sigma_{0}^{2}$ is the background noise power assumed identical at all receivers. Then, the secrecy rate is obtained as

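$$R_{S}=\left[\log\left(1+\gamma_{D}\right)-\max_{k\in\mathcal{K}}\log\left(1+\gamma_{k}\right)\right]^{+},\tag{7}$$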

where $(\>\cdot\>)^{+}=\max\{\>\cdot\>,0\}$. This operator is omitted in the following discussions, since transmission ceases whenever the secrecy rate would be negative.
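As a minimal sketch of how the SINR and secrecy-rate expressions above can be evaluated numerically, the following Python helpers work on plain lists of complex numbers. The function names, the choice of base-2 logarithms, and the convention of passing the received jamming power as a precomputed scalar are assumptions made for illustration; they are not taken from the paper.

```python
import math


def inner(a, b):
    """Return a^dagger b for two equal-length complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def sinr(p_s, theta_a, theta_r, h_sa, h_sr, jam_power, noise):
    """SINR at a receiver reached through both cascaded reflection links.

    h_sa and h_sr are the cascaded channels via the ARIS and the fixed RIS;
    jam_power is the received jamming power (assumed precomputed here).
    """
    signal = p_s * abs(inner(theta_a, h_sa) + inner(theta_r, h_sr)) ** 2
    return signal / (jam_power + noise)


def secrecy_rate(gamma_d, gamma_eves):
    """[log2(1 + gamma_D) - max_k log2(1 + gamma_k)]^+ (base 2 is an assumption)."""
    leak = max(math.log2(1.0 + g) for g in gamma_eves)
    return max(math.log2(1.0 + gamma_d) - leak, 0.0)
```

For instance, `secrecy_rate(15.0, [3.0, 1.0])` compares the destination against the strongest of two eavesdroppers, and a call such as `secrecy_rate(1.0, [3.0])` is clipped to zero by the $(\cdot)^{+}$ operator.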

Moreover, we consider that perfect channel state information can be obtained between the legitimate transmission pair, while the channel state information regarding the eavesdroppers is associated with errors. This is due to the fact that it is difficult to obtain precise channel information at the passive eavesdroppers, especially when we consider multiple eavesdroppers with reflection links. Correspondingly, the channels related to $E_{k}$ , $\forall k\in\mathcal{K}$ are modeled as

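$$\bm{h}_{SAk}=\hat{\bm{h}}_{SAk}+\Delta\bm{h}_{SAk},\quad\text{with }\left\|\Delta\bm{h}_{SAk}\right\|^{2}\leq\epsilon_{SAk},\tag{8a}$$

$$\bm{h}_{SRk}=\hat{\bm{h}}_{SRk}+\Delta\bm{h}_{SRk},\quad\text{with }\left\|\Delta\bm{h}_{SRk}\right\|^{2}\leq\epsilon_{SRk},\tag{8b}$$

$$\bm{h}_{Jk}=\hat{\bm{h}}_{Jk}+\Delta\bm{h}_{Jk},\quad\text{with }\left\|\Delta\bm{h}_{Jk}\right\|^{2}\leq\epsilon_{Jk},\tag{8c}$$

$$\bm{H}_{JRk}=\hat{\bm{H}}_{JRk}+\Delta\bm{H}_{JRk},\quad\text{with }\left\|\Delta\bm{H}_{JRk}\right\|_{F}^{2}\leq\epsilon_{JRk},\tag{8d}$$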

where $\hat{\bm{h}}_{SAk}$, $\hat{\bm{h}}_{SRk}$, $\hat{\bm{h}}_{Jk}$, and $\hat{\bm{H}}_{JRk}$ are the estimates of the cascaded channels, $\Delta{\bm{h}}_{SAk}$, $\Delta{\bm{h}}_{SRk}$, $\Delta{\bm{h}}_{Jk}$, and $\Delta{\bm{H}}_{JRk}$ are the corresponding errors, and $\epsilon_{SAk}$, $\epsilon_{SRk}$, $\epsilon_{Jk}$, and $\epsilon_{JRk}$ are the error bounds.
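To give some intuition for the norm-bounded error model, the sketch below treats a single reflection term in isolation. By the Cauchy-Schwarz inequality, the largest value of $|\bm{\vartheta}^{\dagger}(\hat{\bm{h}}+\Delta\bm{h})|^{2}$ over $\|\Delta\bm{h}\|^{2}\leq\epsilon$ is $(|\bm{\vartheta}^{\dagger}\hat{\bm{h}}|+\sqrt{\epsilon}\,\|\bm{\vartheta}\|)^{2}$, attained when the error is aligned with $\bm{\vartheta}$. This closed form is an illustrative simplification and not the treatment used in the paper, which handles the coupled uncertainties through matrix inequalities; all function names are hypothetical.

```python
import cmath
import math


def inner(a, b):
    """Return a^dagger b for two equal-length complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def worst_case_gain(theta, h_hat, eps):
    """Closed-form max of |theta^H (h_hat + d)|^2 over ||d||^2 <= eps."""
    return (abs(inner(theta, h_hat)) + math.sqrt(eps) * norm(theta)) ** 2


def worst_case_error(theta, h_hat, eps):
    """One error vector on the boundary of the ball that attains the maximum."""
    # Align the phase of theta^H d with the phase of theta^H h_hat.
    phase = cmath.exp(1j * cmath.phase(inner(theta, h_hat)))
    scale = math.sqrt(eps) / norm(theta)
    return [scale * phase * t for t in theta]
```

A quick sanity check is to add `worst_case_error(...)` to the estimate and confirm that the resulting gain matches `worst_case_gain(...)`, while randomly drawn errors inside the ball never exceed it.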

Based on the discussions above, we intend to maximize the secrecy rate of legitimate transmissions by jointly optimizing the artificial noise, reflection, and UAV deployment, in the presence of channel uncertainties at the eavesdroppers. Correspondingly, the secrecy optimization problem is formulated as

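$$\begin{aligned}
\max_{\left[w_{A}^{\text{(x)}},w_{A}^{\text{(y)}}\right],\bm{\vartheta}_{A},\bm{\vartheta}_{R},\bm{Z}}\quad&\min_{\Delta\bm{h}_{SAk},\,\Delta\bm{h}_{SRk},\,\Delta\bm{h}_{Jk},\,\Delta\bm{H}_{JRk}}R_{S}\\
\text{s.t.}\quad&\left|\vartheta_{A,n}\right|=1,\ \forall n\in\mathcal{N}_{A},\\
&\left|\vartheta_{R,n}\right|=1,\ \forall n\in\mathcal{N}_{R},\\
&\mathsf{Tr}\left(\bm{Z}\right)\leq P_{J},\ \bm{Z}\succeq\bm{0}.
\end{aligned}\tag{9}$$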

The formulated problem is rather complicated, with three main difficulties. First, the reflection optimization needs to consider the legitimate signal and the artificial noise simultaneously, and their reflections are asymmetric: the legitimate signal is reflected by both RISs, while the jamming signal is reflected only by the fixed RIS. Second, the mobility of the aerial platform affects the aerial reflection as well as the friendly jamming, and further interacts with the reflection at the fixed RIS. Third, the channels at the eavesdroppers are subject to uncertainties, which influence the reflection and jamming and need to be tackled to achieve robust secrecy.

To solve the problem effectively, we decompose it into two layers: the inner layer optimizes the friendly jamming and reflection, while the outer layer tackles the aerial deployment. The decomposition is based on the fact that the inner-layer problem is solved for given channel conditions (though with uncertainties), while the outer-layer deployment affects the channels and thereby the transmissions. Specifically, the deployment determines the system topology, so its influence on system performance is dominated by the changes in large-scale channel conditions. Meanwhile, when the deployment is given, which fixes the large-scale fading in the system, the transmission problem mainly addresses the small-scale fading along with the information uncertainties. The inner and outer subproblems are solved through robust optimization and learning techniques, respectively, as elaborated in the following sections.
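The two-layer structure can be sketched as an outer search over candidate deployments that calls an inner solver for each fixed deployment. In the hedged Python sketch below, the outer layer is a plain exhaustive search over a grid, which is swapped in purely for illustration in place of the deep reinforcement learning used in the paper, and `toy_inner_value` is a hypothetical stand-in for the robust inner-layer solution.

```python
def outer_search(candidates, inner_value):
    """Return the candidate deployment with the largest inner-layer value."""
    best_pos, best_val = None, float("-inf")
    for pos in candidates:
        # Inner layer: reflection and jamming are optimized for this fixed
        # deployment, and the resulting worst-case secrecy rate is returned.
        val = inner_value(pos)
        if val > best_val:
            best_pos, best_val = pos, val
    return best_pos, best_val


def toy_inner_value(pos):
    """Hypothetical inner-layer value: a concave bowl peaking at (2, 1)."""
    x, y = pos
    return -((x - 2.0) ** 2 + (y - 1.0) ** 2)


# Candidate horizontal positions on a coarse grid (illustrative only).
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
best_position, best_value = outer_search(grid, toy_inner_value)
```

Exhaustive search scales poorly with the grid resolution, which is one reason a learning-based outer layer can be attractive when each inner solve is expensive.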

IV Jamming and Reflection Optimization

In this section, we consider the inner problem to optimize the jamming and reflection with fixed UAV deployment, in the presence of channel uncertainties, specified as

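$$\begin{aligned}
\max_{\bm{\vartheta}_{A},\bm{\vartheta}_{R},\bm{Z}}\quad&\min_{\Delta\bm{h}_{SAk},\,\Delta\bm{h}_{SRk},\,\Delta\bm{h}_{Jk},\,\Delta\bm{H}_{JRk}}R_{S}\\
\text{s.t.}\quad&\left|\vartheta_{A,n}\right|=1,\ \forall n\in\mathcal{N}_{A},\\
&\left|\vartheta_{R,n}\right|=1,\ \forall n\in\mathcal{N}_{R},\\
&\mathsf{Tr}\left(\bm{Z}\right)\leq P_{J},\ \bm{Z}\succeq\bm{0}.
\end{aligned}\tag{10}$$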

In (10a), the reflection and jamming are intricately coupled with each other and jointly affected by the uncertainties. We therefore first recast the uncertainties inside the minimization as explicit constraints. Then, we treat the cooperative jamming, the ARIS reflection, and the fixed-RIS reflection separately according to their physical functionalities. We adopt the block coordinate descent (BCD) framework, solving one subproblem at a time with the other blocks fixed, each with its own optimization technique as detailed below.
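The BCD idea can be illustrated with a generic loop that updates one block of variables at a time while the others are held fixed. The sketch below is a toy under stated assumptions: it minimizes a two-variable quadratic with closed-form block updates, whereas the paper's blocks are the jamming covariance and the two reflection vectors, each requiring its own solver. The function name and its arguments are hypothetical.

```python
def block_coordinate_descent(blocks, updates, objective, tol=1e-12, max_iter=200):
    """Cycle through per-block updates until the objective stops changing."""
    value = objective(blocks)
    for _ in range(max_iter):
        for name, update in updates.items():
            # Re-optimize one block with all other blocks held fixed.
            blocks[name] = update(blocks)
        new_value = objective(blocks)
        converged = abs(value - new_value) < tol
        value = new_value
        if converged:
            break
    return blocks, value


def objective(b):
    """Toy objective f(x, y) = x^2 + y^2 + x*y - 3x - 3y, minimized at (1, 1)."""
    x, y = b["x"], b["y"]
    return x * x + y * y + x * y - 3.0 * x - 3.0 * y


updates = {
    "x": lambda b: (3.0 - b["y"]) / 2.0,  # exact minimizer over x for fixed y
    "y": lambda b: (3.0 - b["x"]) / 2.0,  # exact minimizer over y for fixed x
}
solution, value = block_coordinate_descent({"x": 0.0, "y": 0.0}, updates, objective)
```

Because every block update is an exact minimization, the objective is non-increasing across sweeps, which is the property that BCD-type schemes rely on for convergence.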

IV-A Reformulation Against Uncertainties

For the considered problem of secrecy enhancement, the uncertainties are associated with the channels at the eavesdroppers and further affect the jamming and reflection strategies. Mathematically, the uncertainties are incorporated in the SINRs at the eavesdropper as part of the objective function in the formulated problem, which is tackled first to facilitate the analyses. In this respect, by introducing new variables $\left\{\psi_{Sk}\right\}_{k\in\mathcal{K}}$ and $\left\{\psi_{Jk}\right\}_{k\in\mathcal{K}}$ with

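$$\left|\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}\right|^{2}\leq\psi_{Sk},\ \forall k\in\mathcal{K},\tag{11}$$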

and

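$$\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)\bm{Z}\left(\bm{h}_{Jk}^{\dagger}+\bm{\vartheta}_{R}^{\dagger}\bm{H}_{JRk}\right)^{\dagger}\geq\psi_{Jk},\ \forall k\in\mathcal{K},\tag{12}$$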

we reach the inequalities regarding SINRs at the eavesdroppers as

$$\gamma_{k}\leq\frac{P_{S}\psi_{Sk}}{\psi_{Jk}+\sigma_{0}^{2}},\ \forall k\in\mathcal{K}.\tag{13}$$

Specifically, the inequality in (11) can be reinterpreted through Schur complement as

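$$\begin{bmatrix}\psi_{Sk}&\left(\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}\right)^{\dagger}\\\bm{\vartheta}_{A}^{\dagger}\bm{h}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\bm{h}_{SRk}&1\end{bmatrix}\succeq\bm{0}.\tag{14}$$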
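The Schur complement step can be checked numerically for the scalar case used here: for a complex scalar $a$ and a real $\psi$, the condition $|a|^{2}\leq\psi$ holds exactly when the $2\times 2$ Hermitian matrix with diagonal $(\psi,1)$ and off-diagonal entries $a$ and its conjugate is positive semidefinite. The sketch below compares the two tests using the closed-form smallest eigenvalue of a $2\times 2$ Hermitian matrix; the function names and tolerance are illustrative assumptions.

```python
import math


def min_eig(psi, a):
    """Smallest eigenvalue of the Hermitian matrix [[psi, conj(a)], [a, 1]]."""
    mean = (psi + 1.0) / 2.0
    radius = math.hypot((psi - 1.0) / 2.0, abs(a))
    return mean - radius


def lmi_holds(psi, a, tol=1e-12):
    """Matrix form of the constraint: the 2x2 block matrix is PSD."""
    return min_eig(psi, a) >= -tol


def scalar_holds(psi, a, tol=1e-12):
    """Scalar form of the constraint: |a|^2 <= psi."""
    return abs(a) ** 2 <= psi + tol
```

Both tests agree on either side of the boundary $\psi=|a|^{2}$, which is what allows the quadratic constraint to be rewritten as a matrix inequality that is linear in $\psi$ and $a$.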

Then, by substituting the channels with uncertainties in (8a) and (8b) into the Schur complement condition in (14), we have the following inequality along with a further derivation as

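$$\begin{aligned}
&\begin{bmatrix}\psi_{Sk}&\left(\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}\right)^{\dagger}\\\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}&1\end{bmatrix}\\
&\quad\succeq-\left(\begin{bmatrix}0&\left(\bm{\vartheta}_{A}^{\dagger}\Delta\bm{h}_{SAk}\right)^{\dagger}\\\bm{\vartheta}_{A}^{\dagger}\Delta\bm{h}_{SAk}&0\end{bmatrix}+\begin{bmatrix}0&\left(\bm{\vartheta}_{R}^{\dagger}\Delta\bm{h}_{SRk}\right)^{\dagger}\\\bm{\vartheta}_{R}^{\dagger}\Delta\bm{h}_{SRk}&0\end{bmatrix}\right)\\
&\quad=-\begin{bmatrix}1\\0\end{bmatrix}\Delta\bm{h}_{SAk}^{\dagger}\begin{bmatrix}\bm{0}_{N_{A}\times 1}&\bm{\vartheta}_{A}\end{bmatrix}-\begin{bmatrix}\bm{0}_{1\times N_{A}}\\\bm{\vartheta}_{A}^{\dagger}\end{bmatrix}\Delta\bm{h}_{SAk}\begin{bmatrix}1&0\end{bmatrix}\\
&\qquad-\begin{bmatrix}1\\0\end{bmatrix}\Delta\bm{h}_{SRk}^{\dagger}\begin{bmatrix}\bm{0}_{N_{R}\times 1}&\bm{\vartheta}_{R}\end{bmatrix}-\begin{bmatrix}\bm{0}_{1\times N_{R}}\\\bm{\vartheta}_{R}^{\dagger}\end{bmatrix}\Delta\bm{h}_{SRk}\begin{bmatrix}1&0\end{bmatrix},
\end{aligned}\tag{15}$$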

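which separates the channel estimates and the uncertainties onto the left-hand side and right-hand side, respectively. The inequality in (15) facilitates the application of the general sign-definiteness principle [35], leading to the equivalent inequality

$$\begin{bmatrix}
\psi_{Sk}-\rho_{1,k}-\rho_{2,k}&\left(\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}\right)^{\dagger}&\bm{0}_{1\times N_{A}}&\bm{0}_{1\times N_{R}}\\
\bm{\vartheta}_{A}^{\dagger}\hat{\bm{h}}_{SAk}+\bm{\vartheta}_{R}^{\dagger}\hat{\bm{h}}_{SRk}&1&\epsilon_{SAk}\bm{\vartheta}_{A}^{\dagger}&\epsilon_{SRk}\bm{\vartheta}_{R}^{\dagger}\\
\bm{0}_{N_{A}\times 1}&\epsilon_{SAk}\bm{\vartheta}_{A}&\rho_{1,k}\bm{I}_{N_{A}\times N_{A}}&\bm{0}_{N_{A}\times N_{R}}\\
\bm{0}_{N_{R}\times 1}&\epsilon_{SRk}\bm{\vartheta}_{R}&\bm{0}_{N_{R}\times N_{A}}&\rho_{2,k}\bm{I}_{N_{R}\times N_{R}}
\end{bmatrix}\succeq\bm{0},\tag{16}$$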

where $\left\{\rho_{1,k}\right\}_{k\in\mathcal{K}}$ and $\left\{\rho_{2,k}\right\}_{k\in\mathcal{K}}$ are the non-negative variables newly introduced along with the general sign-definiteness principle. In (16), the uncertainty parts are replaced with the corresponding error bound, indicating that the inequality in (16) acts as the robust counterpart for the inequality in (11). Moreover, the inequality in (16) incorporates reflection coefficients and introduced auxiliaries in the form of linear matrix inequalities, which are convex and can be conveniently tackled in existing solvers.

To deal with the constraints in (12), we first introduce the following reformulation

[TABLE]

leading to the equivalence to the inequality in (12) as $\tilde{\bm{\vartheta}}_{R}^{{\dagger}}\tilde{\bm{H}}_{k}^{{\dagger}}\bm{Z}\tilde{\bm{H}}_{k}\tilde{\bm{\vartheta}}_{R}\geq\psi_{Jk}$ . Then, by introducing the trace operation and exploiting the properties of trace, we have $\mathsf{Tr}\left(\tilde{\bm{H}}_{k}^{{\dagger}}\bm{Z}\tilde{\bm{H}}_{k}\tilde{\bm{\vartheta}}_{R}\tilde{\bm{\vartheta}}_{R}^{{\dagger}}\right)-\psi_{Jk}\geq 0$ . Further, by invoking the equality $\mathsf{Tr}\left(\bm{A}^{{\dagger}}\bm{B}\bm{C}\bm{D}\right)=\mathsf{vec}^{{\dagger}}\left(\bm{A}\right)\left(\bm{D}^{T}\otimes\bm{B}\right)\mathsf{vec}\left(\bm{C}\right)$ , we arrive at

[TABLE]
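The trace identity used in this step can be checked numerically. The sketch below, written with plain Python lists of complex numbers, evaluates both sides of $\mathsf{Tr}(\bm{A}^{\dagger}\bm{B}\bm{C}\bm{D})=\mathsf{vec}^{\dagger}(\bm{A})(\bm{D}^{T}\otimes\bm{B})\mathsf{vec}(\bm{C})$ for random matrices, assuming column-stacking vectorization. The helper names and the matrix sizes are illustrative assumptions.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def hermitian(X):
    """Conjugate transpose."""
    return [[X[i][j].conjugate() for i in range(len(X))] for j in range(len(X[0]))]


def transpose(X):
    return [[X[i][j] for i in range(len(X))] for j in range(len(X[0]))]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def vec(X):
    """Column-stacking vectorization."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def kron(X, Y):
    """Kronecker product X (x) Y."""
    return [[X[i][j] * Y[k][l] for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]


def rand_matrix(rows, cols, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def identity_sides(seed=0, m=2, n=3, p=2, q=3):
    """Return (Tr(A^H B C D), vec(A)^H (D^T kron B) vec(C)) for random matrices."""
    rng = random.Random(seed)
    A, B = rand_matrix(m, q, rng), rand_matrix(m, n, rng)
    C, D = rand_matrix(n, p, rng), rand_matrix(p, q, rng)
    lhs = trace(matmul(matmul(matmul(hermitian(A), B), C), D))
    K, vC = kron(transpose(D), B), vec(C)
    KvC = [sum(K[r][s] * vC[s] for s in range(len(vC))) for r in range(len(K))]
    rhs = sum(a.conjugate() * v for a, v in zip(vec(A), KvC))
    return lhs, rhs
```

In the derivation above the identity is applied with $\bm{A}=\bm{C}=\tilde{\bm{H}}_{k}$, $\bm{B}=\bm{Z}$, and $\bm{D}=\tilde{\bm{\vartheta}}_{R}\tilde{\bm{\vartheta}}_{R}^{\dagger}$, which turns the trace into a quadratic form in $\mathsf{vec}(\tilde{\bm{H}}_{k})$.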

Recalling the channel uncertainties in (8c) and (8d) together with the definition in (17), we have

[TABLE]

where $\hat{\tilde{\bm{H}}}_{k}=\left[\hat{\bm{h}}_{Jk}\>\>\hat{\bm{H}}_{JRk}^{{\dagger}}\right]$ and $\Delta\tilde{\bm{H}}_{k}=\left[\Delta\bm{h}_{Jk}\>\>\Delta\bm{H}_{JRk}^{{\dagger}}\right]$ . Then, the inequality in (18) is extended as

[TABLE]

where $\bm{\Omega}\overset{\Delta}{=}\left(\left(\tilde{\bm{\vartheta}}_{R}\tilde{\bm{\vartheta}}_{R}^{{\dagger}}\right)^{T}\otimes\bm{Z}\right)$ is defined for notational simplicity. Meanwhile, the uncertainty bound in (8c) can be rewritten as $\Delta\bm{h}_{Jk}^{{\dagger}}\Delta\bm{h}_{Jk}\leq\epsilon_{Jk}$ , or in the equivalent form

[TABLE]

by applying the matrix equality $\mathsf{Tr}\left(\bm{A}^{{\dagger}}\bm{B}\right)=\mathsf{vec}^{{\dagger}}\left(\bm{A}\right)\mathsf{vec}\left(\bm{B}\right)$ . Comparing the structures of $\Delta\bm{h}_{Jk}$ with $\Delta\tilde{\bm{H}}_{k}$ , we can rewrite the error bound for $\Delta\bm{h}_{Jk}$ with respect to $\Delta\tilde{\bm{H}}_{k}$ as

[TABLE]

where $\bm{\Upsilon}_{Jk}$ is introduced for notation simplicity. Similarly, for the error bounds in (8d), we can derive the equivalence as

[TABLE]

Then, the inequality above can be rewritten in terms of $\Delta\tilde{\bm{H}}_{k}$ as

[TABLE]

by comparing the elements in $\Delta\bm{H}_{JRk}$ and $\Delta\tilde{\bm{H}}_{k}$ , with $\bm{\Upsilon}_{JRk}$ similarly introduced. For the inequalities in (20), (22), and (24) with quadratic forms on the left-hand side, we can adopt general S-procedure [36] to derive the following inequality in (25),

where $\left\{\eta_{1,k}\right\}_{k\in\mathcal{K}}$ and $\left\{\eta_{2,k}\right\}_{k\in\mathcal{K}}$ are the introduced non-negative variables associated with the condition in (22) and (24) in the general S-procedure, respectively.
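For reference, the general S-procedure invoked here is commonly stated as follows. The statement uses generic symbols $\bm{A}_{i}$, $\bm{b}_{i}$, $c_{i}$ that are not the paper's notation, and it is the standard textbook form rather than a reproduction of (25). In the present setting there are $P=2$ conditions, with multipliers playing the role of $\eta_{1,k}$ and $\eta_{2,k}$.

```latex
f_i(\bm{x}) = \bm{x}^{\dagger}\bm{A}_i\bm{x} + 2\,\mathrm{Re}\{\bm{b}_i^{\dagger}\bm{x}\} + c_i,
\quad i = 0, 1, \ldots, P,
\\[4pt]
\{ f_i(\bm{x}) \geq 0 \}_{i=1}^{P} \;\Rightarrow\; f_0(\bm{x}) \geq 0
\quad \text{holds if} \quad
\exists\, \eta_i \geq 0 :\;
\begin{bmatrix} \bm{A}_0 & \bm{b}_0 \\ \bm{b}_0^{\dagger} & c_0 \end{bmatrix}
- \sum_{i=1}^{P} \eta_i
\begin{bmatrix} \bm{A}_i & \bm{b}_i \\ \bm{b}_i^{\dagger} & c_i \end{bmatrix}
\succeq \bm{0}.
```

The condition is sufficient in general; for a single constraint ($P=1$) with a strictly feasible point it is also necessary.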

With previous operations of introducing the auxiliary variables in (11) and (12), along with the reformulation against the uncertainties resulting in (16) and (25), we reach a deterministic problem eliminating the uncertainties as a lower bound for the original inner optimization in (10a), specified as

[TABLE]

For the problems in (10a) and (26a), the new constraints in (26c) and (26d) combat the uncertainties with the corresponding error bounds, and thus the solution to (26a) achieves robustness as compared with the original counterpart in (10a). Furthermore, we introduce a new variable $\varphi$ , to tackle the non-continuous operation in the objective function, and induce the optimization problem as

[TABLE]

which facilitates further discussions to solve for jamming and reflection strategies as detailed below.

IV-B Jamming Optimization

We first address the jamming subproblem while considering fixed reflection coefficients at the RISs, i.e., to solve for $\bm{Z}$ with fixed $\bm{\vartheta}_{A}$ and $\bm{\vartheta}_{R}$ in (27a). Correspondingly, we ignore the constraints related to the reflection coefficients and simplify the problem as

[TABLE]

As the jamming optimization variable is incorporated in $\gamma_{D}$ , we can rewrite $\gamma_{D}$ as

[TABLE]

where

[TABLE]

are defined for notational simplicity. By substituting the equality $\tilde{\bm{h}}_{JD}^{{\dagger}}\bm{Z}\tilde{\bm{h}}_{JD}=\mathsf{Tr}\left(\bm{Z}\tilde{\bm{H}}_{JD}\right)$ with $\tilde{\bm{H}}_{JD}=\tilde{\bm{h}}_{JD}\tilde{\bm{h}}_{JD}^{{\dagger}}$ into the objective function, we can see that the problem in (28a) is a semidefinite programming (SDP) problem with respect to the jamming optimization. The non-convexity in (28a) lies in the objective function and the constraint in (28e). In this regard, by applying [37, Lemma 1], we introduce an auxiliary variable $t_{JD}$ to linearize the non-concave term and approximate the objective function as

[TABLE]

which amounts to the original objective function on condition that

[TABLE]

For the non-convex constraint in (28e), we can employ the same procedure above to approximate it as

[TABLE]

which equals the original when

[TABLE]

Through the operations above, we reformulate the problem in (28a) as

[TABLE]

The problem in (35a) can be easily verified as a convex optimization with respect to the optimization variables, and thus can be conveniently solved with off-the-shelf solvers. Then, the optimum through (35a) needs to be substituted into (32) and (34) to update the auxiliary variables. Finally, the problem solving in (35a) and updates in (32) and (34) are conducted in an iterative manner, where the convergence brings the optimal jamming beamforming for secure transmissions.
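The alternation between solving a convex surrogate and updating an auxiliary variable in closed form can be illustrated on a scalar toy problem. The sketch below assumes the commonly used identity $-\ln x=\max_{t>0}\left(-tx+\ln t+1\right)$, attained at $t=1/x$; whether this matches [37, Lemma 1] verbatim is an assumption, and the objective $\ln(1+g_{1}p)-\ln(1+g_{2}p)$, the gains `g1`, `g2`, and the power budget `p_max` are hypothetical stand-ins for the quantities in (28a).

```python
import math


def objective(p: float, g1: float, g2: float) -> float:
    """Toy rate-difference objective ln(1 + g1*p) - ln(1 + g2*p)."""
    return math.log(1.0 + g1 * p) - math.log(1.0 + g2 * p)


def surrogate_step(t: float, g1: float, g2: float, p_max: float) -> float:
    """Maximize ln(1 + g1*p) - t*(1 + g2*p) + ln(t) + 1 over 0 <= p <= p_max.

    The surrogate is concave in p, so the stationary point is clipped
    to the feasible interval.
    """
    p = 1.0 / (t * g2) - 1.0 / g1
    return min(max(p, 0.0), p_max)


def alternate(g1: float = 2.0, g2: float = 0.5, p_max: float = 10.0, iters: int = 20):
    """Alternate between the auxiliary update t = 1/x and the surrogate maximization."""
    p = 0.0
    history = [objective(p, g1, g2)]
    for _ in range(iters):
        t = 1.0 / (1.0 + g2 * p)  # closed-form update that makes the surrogate tight at p
        p = surrogate_step(t, g1, g2, p_max)
        history.append(objective(p, g1, g2))
    return p, history
```

Because the surrogate is tight at the current point and never exceeds the true objective, each round cannot decrease the objective, which is the property the iterative procedure relies on.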

IV-C Reflection Optimization at the ARIS

Then, we consider the reflection optimization at the ARIS, while treating jamming and reflection at the fixed RIS as constants, i.e., to solve for $\bm{\vartheta}_{A}$ with fixed $\bm{Z}$ and $\bm{\vartheta}_{R}$ in (27a). In this regard, the problem is simplified as

[TABLE]

To facilitate the problem solving, we rewrite $\gamma_{D}$ as

[TABLE]

where

[TABLE]

are defined for notational simplicity. As seen in (37), $\gamma_{D}$ is a quadratic and thus convex function of $\bm{\vartheta}_{A}$ , so its first-order approximation at $\bm{\vartheta}_{A}^{\circ}$ serves as a lower bound, which we use to linearize it as

[TABLE]

where

[TABLE]
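The lower-bound property of the first-order approximation can be verified on a generic convex quadratic. The sketch below checks that $\bm{x}^{\dagger}\bm{A}\bm{x}\geq 2\mathrm{Re}\{\bm{x}_{0}^{\dagger}\bm{A}\bm{x}\}-\bm{x}_{0}^{\dagger}\bm{A}\bm{x}_{0}$ for a positive semidefinite $\bm{A}$, with equality at $\bm{x}=\bm{x}_{0}$. The matrix here is random and is not the one defined in (38); all names are illustrative.

```python
import random


def quad(A, x):
    """x^H A x for a Hermitian matrix A, returned as a real number."""
    n = len(x)
    return sum(x[i].conjugate() * A[i][j] * x[j]
               for i in range(n) for j in range(n)).real


def linear_lower_bound(A, x, x0):
    """First-order expansion of x^H A x at x0: 2 Re{x0^H A x} - x0^H A x0."""
    n = len(x)
    cross = sum(x0[i].conjugate() * A[i][j] * x[j]
                for i in range(n) for j in range(n))
    return 2.0 * cross.real - quad(A, x0)


def random_vector(n, rng):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def random_psd(n, rng):
    """A = B^H B is Hermitian positive semidefinite by construction."""
    B = [random_vector(n, rng) for _ in range(n)]
    return [[sum(B[k][i].conjugate() * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]
```

The gap between the two sides equals $(\bm{x}-\bm{x}_{0})^{\dagger}\bm{A}(\bm{x}-\bm{x}_{0})$, which is non-negative and vanishes at the expansion point.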

Then, for the unit-modulus constraint regarding the elements of $\bm{\vartheta}_{A}$ , we can convert it into the joint constraints as

[TABLE]

where the first inequality with non-convexity can be linearly approximated at $\bm{\vartheta}_{A}^{\circ}$ as

[TABLE]

Also, the non-convex constraint in (36e) can be similarly treated as that in (33), with the auxiliary variable $t_{k}$ that equalizes the original constraint in (36e) when the condition in (34) is satisfied.

Based on the discussions above, we arrive at a convex counterpart of the ARIS reflection problem in (36a), given as

[TABLE]

which is an approximation at $\bm{\vartheta}_{A}^{\circ}$ , and $\left\{\iota_{A,n}\right\}_{n=1,2,\cdots,2N_{A}}$ are additionally introduced to improve the convergence, with $\lambda_{A}$ as the penalty coefficient. Then, solving (43a) for the ARIS reflection and updating the auxiliary variables via (34) are conducted in an iterative manner to obtain the current optimum, denoted by $\bm{\vartheta}_{A}^{\star}$ , conditioned on the approximation at $\bm{\vartheta}_{A}^{\circ}$ . Finally, a successive convex approximation (SCA) procedure is conducted, in which the approximation point is updated with the current optimum, i.e., $\bm{\vartheta}_{A}^{\circ}\leftarrow\bm{\vartheta}_{A}^{\star}$ , to reach the next-round optimal reflection, and the convergence of the SCA procedure provides the optimal ARIS reflection coefficients.
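The SCA loop can be sketched on a much simpler problem than (43a). The toy example below maximizes the reflected gain $|\bm{h}^{\dagger}\bm{\vartheta}|^{2}$, a convex quadratic, by repeatedly maximizing its first-order lower bound over the relaxed set $|\vartheta_{n}|\leq 1$ and moving the approximation point to the result. The objective, the vector `h`, and the absence of penalty and slack variables are simplifying assumptions; this is not the paper's constraint set.

```python
import cmath


def gain(h, theta):
    """|h^H theta|^2."""
    return abs(sum(hn.conjugate() * tn for hn, tn in zip(h, theta))) ** 2


def sca_update(h, theta0):
    """Maximize 2 Re{(theta0^H h)(h^H theta)} subject to |theta_n| <= 1.

    The bound is linear in theta, so each element is set to unit modulus
    with the phase that makes its own term real and non-negative.
    """
    c = sum(t0.conjugate() * hn for t0, hn in zip(theta0, h))  # theta0^H h
    return [cmath.exp(-1j * cmath.phase(c * hn.conjugate())) for hn in h]


def run_sca(h, theta0, tol=1e-9, max_iter=50):
    """Iterate the approximation point until the gain stops improving."""
    theta = list(theta0)
    values = [gain(h, theta)]
    for _ in range(max_iter):
        theta = sca_update(h, theta)  # next approximation point
        values.append(gain(h, theta))
        if values[-1] - values[-2] < tol:
            break
    return theta, values
```

In this toy case the relaxed maximizer already lies on the unit circle, so the unit-modulus constraint is met without any penalty term; the general problem needs the slack variables and penalty described above.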

IV-D Reflection Optimization at the Fixed RIS

Considering constant jamming and reflection at the ARIS, the reflection optimization at the fixed RIS is given as

[TABLE]

Given the complicated relationship between the SINR at the legitimate receiver and considered reflection coefficient, we introduce a new variable $\psi_{JD}$ and reformulate $\gamma_{D}$ as

[TABLE]

where

[TABLE]

For the inequality regarding $\psi_{JD}$ in (46), we can adopt the Schur complement to recast it in the form of linear matrix inequality as

[TABLE]

Further, for the quadratic term with respect to reflection coefficient in (45), we can employ the first-order Taylor expansion similarly as (39) to reach that

[TABLE]

as an approximation at $\bm{\vartheta}_{R}^{\circ}$ with

[TABLE]

By replacing $\gamma_{D}$ with the lower bound given in (48), the objective function in (44a) is now concave with respect to the reflection coefficients. When jointly considering the newly introduced variable $\psi_{JD}$ , we can adopt the same technique as (31) to reformulate the objective function as

[TABLE]

with an introduced variable $t_{RD}$ . Also similar as before, the reformulation is equivalent on condition that

[TABLE]

Then, the unit-modulus constraint regarding the elements of $\bm{\vartheta}_{R}$ can be similarly treated as (41) and (42), leading to

[TABLE]

approximated at $\bm{\vartheta}_{R}^{\circ}$ . Meanwhile, the constraint in (25) is no longer a linear matrix inequality with respect to $\bm{\vartheta}_{R}$ . The non-linear part traces back to the inequality in (18), which contains a quadratic term in the reflection coefficients. Recalling that $\tilde{\bm{\vartheta}}_{R}^{{\dagger}}=\left[1\>\>\bm{\vartheta}_{R}^{{\dagger}}\right]$ , we can use the first-order approximation at $\bm{\vartheta}_{R}^{\circ}$ given as

[TABLE]

Correspondingly, by defining $\bm{\Xi}\overset{\Delta}{=}{\bm{\Psi}}^{T}_{R}\left(\bm{\vartheta}_{R};\bm{\vartheta}_{R}^{\circ}\right)\otimes\bm{Z}$ , we have $\bm{\Omega}\succeq\bm{\Xi}$ , and the inequality in (18) is approximated as

[TABLE]

Then, with the general S-Procedure conducted similarly as that in Sec. IV-A, we reach an approximated version of the inequality in (25) at $\bm{\vartheta}_{R}^{\circ}$ , given as (55),

which is a linear matrix inequality with respect to $\bm{\vartheta}_{R}$ . Finally, the non-convex constraint in (44e) can be similarly tackled as (33) and (34).

With the reformulations above, we arrive at a convex problem given as

[TABLE]

which is approximated at $\bm{\vartheta}_{R}^{\circ}$ , and similarly as in (43a), the variables $\left\{\iota_{R,n}\right\}_{n=1,2,\cdots,2N_{R}}$ are introduced to improve the convergence. As we solve the problem in (56a) to obtain the reflection coefficients, the auxiliary parameters are updated according to (34) and (51), and this process is continued until convergence yields the current optimum, denoted by $\bm{\vartheta}_{R}^{\star}$ . Then, we employ the SCA technique and use the current optimum as the next-round approximation point, i.e., $\bm{\vartheta}_{R}^{\circ}\leftarrow\bm{\vartheta}_{R}^{\star}$ , to further update the reflection coefficients. The convergence of the SCA procedure brings the optimal reflection at the fixed RIS.

IV-E Algorithm Design

In the preceding discussions, we have tackled the uncertainties to formulate the robust secrecy optimization problem, where the jamming and reflection optimizations are analyzed separately. We then employ the BCD framework to update the jamming beamforming, the ARIS reflection, and the reflection at the fixed RIS in an iterative manner, and the convergence yields a suboptimal solution to the robust secrecy optimization. The algorithm is summarized in Alg. 1. The outer loop implements the BCD framework, where $\tau=0,1,2,\cdots$ indexes the iterations and the constant $\varepsilon$ is the convergence threshold. Meanwhile, obtaining the reflection at the RISs requires inner loops in the form of SCA procedures, for which the constants $\varepsilon_{A}$ and $\varepsilon_{R}$ are the convergence thresholds.
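The outer-loop structure can be sketched with a toy three-block problem. In the example below, the three scalars `x`, `y`, `z` stand in for the jamming covariance, the ARIS reflection, and the fixed-RIS reflection, and a concave quadratic stands in for the worst-case secrecy rate; each block update is an exact maximization with the other two blocks fixed. The objective and all names are hypothetical, and the inner SCA loops of Alg. 1 are replaced by closed-form updates.

```python
def f(x: float, y: float, z: float) -> float:
    """Toy concave objective standing in for the worst-case secrecy rate."""
    return -(x * x + y * y + z * z) + 0.5 * (x * y + y * z + x * z) + x + y + z


def best_block(other1: float, other2: float) -> float:
    """Maximizer of f over one variable with the other two fixed.

    Setting the partial derivative -2v + 0.5*(other1 + other2) + 1 to zero
    gives the update; f is symmetric, so the same rule serves every block.
    """
    return (0.5 * (other1 + other2) + 1.0) / 2.0


def bcd(eps: float = 1e-10, max_iter: int = 1000):
    """Cycle through the blocks until the objective gain falls below eps."""
    x = y = z = 0.0
    history = [f(x, y, z)]
    for _ in range(max_iter):
        x = best_block(y, z)  # block 1 (jamming in Alg. 1)
        y = best_block(x, z)  # block 2 (ARIS reflection)
        z = best_block(x, y)  # block 3 (fixed-RIS reflection)
        history.append(f(x, y, z))
        if history[-1] - history[-2] < eps:
            break
    return (x, y, z), history
```

Since every block update can only increase the objective, the recorded values form a non-decreasing sequence, which is why the stopping rule compares consecutive objective values against $\varepsilon$.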

V Learning for Deployment

In this section, we consider the deployment of the aerial platform as the outer subproblem of the original optimization in (9a). The deployment affects the wireless channels related to the aerial platform, and thereby the aerial reflection and cooperative jamming. Given the double-layer structure, each candidate deployment in the outer layer is evaluated using the intermediate results provided by the inner problem. In this regard, we adopt the deep reinforcement learning technique to determine the deployment in the outer problem. Reinforcement learning enables effective decision-making due to its goal-driven nature while adapting to the environment, and has been widely used in existing research [38]. For our considered problem, as the inner problem can be efficiently solved through optimization as detailed before, the resultant secrecy performance significantly facilitates the learning process towards the optimal deployment.

V-A MDP Formulation

The learning-based solution to determine the deployment can be formulated as a Markov decision process (MDP). An MDP is considered over a time series denoted by $\mathcal{T}=\left\{0,1,\cdots,t,\cdots,T\right\}$ , along with the state space, action space, and reward. For our considered problem of robust secrecy optimization, these components are elaborated as follows.

•

State space: For the considered problem, the robust secrecy rate inherently depends on the network topology and the bound of channel imperfection. As such, given fixed locations of source, destination, and fixed RIS, the state is defined as the set consisting of the location of the aerial platform, channel condition in the network, and the associated uncertainties, given as

[TABLE]

Strictly speaking, the elements of the state depend on the time instant $t$ , which is omitted as an argument for notational simplicity. Then, all possible states constitute the state space as $\bm{s}\left(t\right)\in\mathcal{S}$ , $\forall t\>\in\mathcal{T}$ .

•

Action space: An action is defined as the deployment update of the aerial platform while learning, i.e., $\bm{a}\left(t\right)=\left[w_{A}^{\text{(x)}}\left(t\right),w_{A}^{\text{(y)}}\left(t\right)\right]-\left[w_{A}^{\text{(x)}}\left(t-1\right),w_{A}^{\text{(y)}}\left(t-1\right)\right]$ at time instant $t\in\mathcal{T}$ .

•

Reward function: The reward function is defined as the change of the robust secrecy rate relative to the previous time instant, conditioned on the current state and action, denoted by $r\left(t\right)=R_{S}\left(t\right)-R_{S}\left(t-1\right)$ . This reward function is defined consistently with the definition of the action, and the transmission strategy is obtained through Alg. 1 to evaluate the secrecy rate.

Given the MDP model above, the aerial platform, as the agent in the MDP, learns to find the desired deployment. During learning, the agent determines the action in the current state according to the policy $\bm{\mu}:\mathcal{S}\mapsto\mathcal{A}$ . Then, the system state evolves to a new state as $\mathcal{S}\times\mathcal{A}\mapsto\mathcal{S}$ . Meanwhile, the agent obtains an instantaneous reward as $R_{S}:\mathcal{S}\times\mathcal{A}\mapsto\mathbb{R}_{+}$ . Through the learning process, the agent intends to maximize the long-term expected reward defined as $\Gamma=\sum\nolimits_{t=0}^{T}\nu^{t}r\left(t\right)$ , where $\nu\in(0,1)$ is the discount factor.
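The reward and return definitions can be made concrete with a short sketch. It takes a sequence of robust secrecy rates, as would be returned by successive runs of Alg. 1, forms the difference rewards, and accumulates them with the discount factor. Indexing the first recorded reward with exponent 0 is an assumption made here, since the rate preceding $t=0$ is not specified, and the function names are hypothetical.

```python
from typing import List, Sequence


def rewards_from_rates(rates: Sequence[float]) -> List[float]:
    """r = R_S(current) - R_S(previous) for consecutive secrecy-rate evaluations."""
    return [curr - prev for prev, curr in zip(rates, rates[1:])]


def discounted_return(rewards: Sequence[float], nu: float) -> float:
    """Sum of nu^k * r_k, with k = 0 for the first recorded reward."""
    return sum((nu ** k) * r for k, r in enumerate(rewards))
```

For example, the rates 1.0, 2.0, 2.5 give rewards 1.0 and 0.5. In the undiscounted limit the difference rewards telescope to the total improvement in secrecy rate, which shows why this reward steers the agent towards deployments with a higher final rate.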

V-B DDPG-Based Algorithm

As the aerial deployment issue is investigated within a continuous area, we adopt the deep deterministic policy gradient (DDPG) approach that tackles problems in continuous action space [39]. The DDPG framework has an actor-critic network structure, where the actor network observes the current state and produces an action based on the strategy and the critic network provides an evaluation regarding the action. Besides the operations in the evaluation networks noted before, the DDPG framework also incorporates the target networks integrating the experience replay. The replay buffer helps reduce the correlation of data samples and the delayed strategy updates in the target network improve the stability of the algorithm implementation.
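A minimal sketch of the experience replay component follows, assuming a fixed-capacity first-in-first-out buffer with uniform random mini-batch sampling. The class and method names are hypothetical and not taken from the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity: int, seed: int = 0):
        self._data = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state) -> None:
        self._data.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Uniformly sample a mini-batch without replacement."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)
```

Sampling uniformly from a large buffer breaks the temporal correlation between consecutive transitions, which is the purpose of the replay buffer described above.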

The DDPG-based deployment algorithm is specified in Alg. 2, where the main operations are elaborated as follows. We first construct the evaluation critic and actor networks with parameters $\bm{\omega}^{Q}$ and $\bm{\omega}^{\bm{\mu}}$ , respectively, which are then copied as the initial target networks. Also, the environment is specified based on the communication system state. Then, at the training stage, at each epoch $t$ with state $\bm{s}\left(t\right)$ , the agent selects and takes an action $\bm{a}\left(t\right)$ based on the current policy $\bm{\mu}$ along with a random noise. Meanwhile, the taken action produces an instantaneous reward $r\left(t\right)$ and an updated state $\bm{s}^{\prime}\left(t\right)$ . The transition tuple $\left(\bm{s}\left(t\right),\bm{a}\left(t\right),r\left(t\right),\bm{s}^{\prime}\left(t\right)\right)$ is stored in the replay buffer. After sufficient rounds of training to fill the replay buffer, we randomly pick $D$ transitions as a mini-batch for learning. Specifically, the evaluation critic network is trained by minimizing the loss function defined as

[TABLE]

where

[TABLE]

with $Q$ and $Q^{\prime}$ being the action-value function. Meanwhile, the evaluation actor network parameters are updated according to gradients as

[TABLE]

Then, the target network parameters are updated according to the soft update rule to improve the stability of the learning process.
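The critic target, the critic loss, and the soft update can be sketched in their standard DDPG forms [39]. Since the displayed equations are not reproduced in this text, the exact expressions below are assumptions based on the standard algorithm, and the symbol `kappa` for the soft update coefficient is hypothetical. Network parameters are represented as flat lists of floats for illustration.

```python
from typing import List, Sequence


def td_target(reward: float, next_q: float, nu: float) -> float:
    """y = r + nu * Q'(s', mu'(s')), with next_q supplied by the target networks."""
    return reward + nu * next_q


def critic_loss(q_values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic outputs and targets over a mini-batch."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params: Sequence[float],
                eval_params: Sequence[float],
                kappa: float) -> List[float]:
    """omega' <- kappa * omega + (1 - kappa) * omega'."""
    return [kappa * w + (1.0 - kappa) * w_t
            for w, w_t in zip(eval_params, target_params)]
```

With a small coefficient, the target parameters track the evaluation parameters slowly, so the regression target of the critic changes gradually between updates.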

When the neural networks are well-trained, the agent selects the action based on the network output at each stage, given certain initial communication network status. Then, the final convergence provides a learnt deployment for the aerial platform to assist the robust secure transmissions.

V-C Implementation Issue

As the proposed DDPG-learning algorithm (Alg. 2) incorporates the BCD-based transmission algorithm (Alg. 1) providing intermediate results, we discuss the overall algorithm implementation here. First, regarding the initialization, the aerial platform can be placed at any random spot within the region, and the artificial noise covariance and reflection coefficients can also be randomly initialized as long as the constraints in (10a) on jamming power and modulus are satisfied. Then, the channel information is required to conduct the algorithm. In this regard, we can use properly designed training sequences to efficiently estimate the cascaded channels between the legitimate pair [40]. For passive eavesdroppers, we can simply use their location information to estimate the channels, owing to the line-of-sight-dominated air-ground transmissions in our considered scenarios. Also, the uncertainty-associated error bounds can be obtained based on historical data. Afterwards, the proposed algorithms are conducted based on the collected information. Technically, the computation can be carried out at the aerial platform, which usually has processors and an energy source. Then, the aerial deployment, ARIS reflection, and jamming can be readily applied at the aerial platform, while the results are also fed back to the fixed RIS to update the reflection coefficients.

VI Simulation Results

In this section, we present simulation results to show the performance of the proposed robust secure transmission scheme. Specifically, we consider a ground area of 400 $\times$ 400 (distances are in meters here and afterward), where the transmitter is located at the origin and the legitimate receiver is located at (350, 0). Three eavesdroppers are randomly located in a circular area centered at (300, 300) with a radius of 50, referred to as the eavesdropping area. The fixed RIS is located at (100, 150) with a height of 50, while the height of the ARIS is set to 150. As the signal propagation in the system involves either the fixed RIS or the aerial platform, we adopt the Rician channel model. In particular, the channels associated with the fixed RIS have a Rician factor of 3 dB, while the channels related to the aerial platform have a Rician factor of 10 dB, since the aerial platform is located higher than the fixed RIS. Meanwhile, the path loss exponents for the air-ground channels concerning the fixed RIS and ARIS are 2.6 and 2.2, respectively. The air-to-air links between the aerial jammer and the fixed RIS have a path loss exponent of 2. The path loss at the reference distance is 20 dB for all channels. The transmit power at the source node is 30 dBm, and the aerial jammer has 4 antennas with a maximum jamming power of 25 dBm. The background noise power is -110 dBm. The fixed RIS and ARIS both have 50 reflecting elements. Regarding the channel uncertainties, we define the uncertainty coefficient $\delta$ through $\epsilon_{X}=\delta\left\|\hat{\bm{h}}_{X}\right\|$ , where $X\in\left\{SAk,SRk,Jk,JRk\right\}$ with $k\in\mathcal{K}$ . The uncertainty coefficient is set to 0.01. The parameters for reinforcement learning are as follows. The number of episodes is 2,000. The replay buffer size and batch size are 20,000 and 256, respectively. The learning rates for the actor network and critic network are 0.0001, and the soft update coefficient is 0.005. The discount factor is 0.95.
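A few of these settings can be turned into small helper functions. The sketch below assumes a log-distance path loss model with a 1 m reference distance, which is an assumption since the reference distance is not stated, and computes the error bound $\epsilon_{X}=\delta\|\hat{\bm{h}}_{X}\|$ from a channel estimate given as a list of complex entries. All function names are hypothetical.

```python
import math
from typing import Sequence


def path_loss_db(distance_m: float, exponent: float, ref_loss_db: float = 20.0) -> float:
    """Log-distance path loss in dB, assuming a 1 m reference distance."""
    return ref_loss_db + 10.0 * exponent * math.log10(distance_m)


def db_to_linear(x_db: float) -> float:
    """Convert a power ratio in dB (e.g., a Rician factor) to linear scale."""
    return 10.0 ** (x_db / 10.0)


def error_bound(h_hat: Sequence[complex], delta: float = 0.01) -> float:
    """epsilon_X = delta * ||h_hat_X||, using the Euclidean norm of the estimate."""
    return delta * math.sqrt(sum(abs(v) ** 2 for v in h_hat))
```

Under this model, a 100 m link with exponent 2 has a path loss of 60 dB, and a Rician factor of 10 dB corresponds to a line-of-sight component ten times stronger in power than the scattered component.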

Fig. 2 demonstrates the ARIS deployment along with the achieved robust secrecy rate. In this figure, the bar location indicates the ARIS deployment and the bar height corresponds to the achieved robust secrecy rate. Three cases are illustrated with the fixed RIS deployed at (100, 150), (150, 250), and (250, 50), referred to as Cases 1, 2, and 3, which lead to the aerial deployment at (161, 89), (218, 63), and (136, 169), respectively. The deployment results imply that usually one RIS is located closer to the eavesdroppers while the other is closer to the legitimate receiver. This can be explained as follows: the RIS nearer to the legitimate receiver helps enhance the reception, while the other, nearer to the eavesdroppers, strengthens the active jamming, by either direct jamming (through aerial jamming) or reflective jamming (through the fixed RIS). Also, the robust secrecy rates achieved in the different cases are rather close, indicating that the flexible deployment of the ARIS can effectively compensate for the performance under different fixed-RIS deployments. Moreover, under the Case-1 deployment, we show the results under perfect CSI, without jamming, and without the fixed RIS. Without jamming, the ARIS is deployed farther from the fixed RIS than in the case with jamming, since the aerial platform no longer needs the fixed RIS to enhance the active jamming. When there is no fixed RIS, the aerial platform is located nearer to the eavesdroppers, as active jamming can be more direct and effective in defending against eavesdropping. Further, for Case 1, we find the optimal deployment through global search instead of reinforcement learning, which gives the aerial deployment at (157, 85) with a robust secrecy rate of 6.93 bps/Hz. In contrast, the aerial deployment through the proposed learning is at (161, 89) with a robust secrecy rate of 6.83 bps/Hz. Note that the results under global search are not explicitly shown in the figure, as they are rather close to those already shown. The results indicate that our proposal can effectively solve the problem while approaching the optimum.

Fig. 3 shows the performance with different locations of the eavesdropping areas. In Fig. 3(a), which shows the robustness performance, the worst-case secrecy rate is higher with fewer eavesdroppers or when the eavesdroppers are located farther away. Also as expected, the robust scheme performs better than the non-robust one under the worst case. Moreover, when the eavesdroppers are located farther away, the superiority of the robust scheme over the non-robust one becomes more evident. In Fig. 3(b), we show the performance comparison under different schemes. Besides a trend similar to that in Fig. 3(a), we can see that our proposal outperforms the baselines. Moreover, from the cases without the fixed RIS or without the ARIS, we can see that the ARIS is more effective in defending against eavesdropping attacks, due to its flexible deployment. Further, when there is neither ARIS nor jamming, i.e., when the aerial platform is removed, the performance with a single RIS under fixed deployment can be significantly undermined.

Fig. 4 shows the performance concerning the transmit power. In Fig. 4(a) showing the robustness, the secrecy rate increases with higher transmit power and smaller uncertainties. Particularly, we can see that when the uncertainty is larger, the transmission behavior becomes more conservative to tackle the worst case, and thus the speed of secrecy rate increase is slower as compared with that with smaller uncertainties. Accordingly, the performance gap between the robust scheme and the non-robust one is larger when the uncertainty is smaller. Moreover, compared with the case with perfect information, the performance can be evidently degraded when considering the channel uncertainties. Fig. 4(b) compares the performance under different proposals, showing similar trends as those in Fig. 3(b). We can also observe that when the transmit power is higher, the advantage of our proposal becomes more significant as compared with the baselines.

Fig. 5 shows the performance considering the number of reflecting elements at the RISs. Fig. 5(a) for robustness and Fig. 5(b) exhibit trends similar to those in Figs. 4(a) and 4(b), respectively. Comparing Figs. 4 and 5, we can see that the worst-case secrecy rate increases almost linearly with an exponential increase of transmit power, but with only a linear increase of the number of reflecting elements at the RISs. In this regard, the application of RISs in wireless networks can effectively compensate for the security performance if the transmit power is the bottleneck. Moreover, in Fig. 4, the differences in achieved robust secrecy rates under different proposals are enlarged when the transmit power increases, while in Fig. 5, the differences among different proposals remain almost constant. This indicates that the reflection can magnify the effect of security enhancement through increased power at the source.

In Fig. 6, the total number of reflecting elements at the RISs is fixed at 100, and we evaluate the performance with different splits of the elements between the fixed RIS and the ARIS. Similarly, we can see that our proposal outperforms the non-robust scheme and the case without jamming, yet is inferior to the case without uncertainties. More importantly, there exists a tradeoff in distributing the reflection capability between the two RISs, and the collaboration between the two RISs generally improves the security as compared with using a single RIS. Also, from the leftmost and rightmost cases in the figure, corresponding to solely using the fixed RIS and the ARIS, respectively, we can see that the ARIS-only case outperforms the fixed-RIS-only case, indicating that the flexible deployment brings the ARIS evident advantages. Moreover, the configuration achieving the highest robust secrecy rate is the one in which the ARIS is equipped with 80% of the total reflecting elements, indicating that the RIS with optimized deployment deserves higher reflecting capability to achieve the best performance.

In Fig. 7, we show the performance considering different uncertainty levels. Generally, the security performance degrades with higher uncertainties for all schemes, while our proposal outperforms the baselines. Also, the robust secrecy rate decreases faster under our proposal than in the cases without one of the RISs or without jamming. This is because, without a RIS or jamming, the corresponding reflection or jamming channels related to the eavesdroppers no longer exist, and neither do the associated uncertainties. In this regard, the overall channel information uncertainty in the system is reduced and thus the security performance loss due to uncertainties is alleviated.

Overall, the numerical results have shown the effectiveness of our proposed scheme for security provisioning while tackling uncertainties. Particularly, we can see that the ARIS with flexible deployment generally outperforms the conventional fixed RIS deployment, where the security performance can be further enhanced with cooperative aerial jamming. Moreover, the joint use of fixed RIS and ARIS can dynamically adapt to different network topologies, as the flexible deployment of ARIS can effectively compensate for the potentially unfavorable location of the fixed RIS.

VII Conclusion

In this paper, we propose to exploit aerial reflection and jamming to enhance wireless security, where robust security is proposed to address the channel uncertainties. Specifically, we employ robust optimization approaches to tackle the reflection and jamming, and aerial deployment is obtained through deep reinforcement learning. Results show that the proposed scheme can effectively combat the uncertainties to achieve robust security under the worst case. Also, the ARIS with flexible deployment is more effective compared with fixed RIS in terms of security provisioning, and the collaborative operation between the fixed RIS and ARIS can significantly improve the security performance.

Bibliography

[1] M. Di Renzo, A. Zappone, M. Debbah, M.-S. Alouini, C. Yuen, J. de Rosny, and S. Tretyakov, “Smart radio environments empowered by reconfigurable intelligent surfaces: How it works, state of research, and the road ahead,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2450–2525, Nov. 2020.
[2] Q. Wu, S. Zhang, B. Zheng, C. You, and R. Zhang, “Intelligent reflecting surface-aided wireless communications: A tutorial,” IEEE Trans. Commun., vol. 69, no. 5, pp. 3313–3351, May 2021.
[3] R. Khan, P. Kumar, D. N. K. Jayakody, and M. Liyanage, “A survey on security and privacy of 5G technologies: Potential solutions, recent advancements, and future directions,” IEEE Commun. Surveys Tuts., vol. 22, no. 1, pp. 196–248, 1st Quart. 2020.
[4] J. Zhang, H. Du, Q. Sun, B. Ai, and D. W. K. Ng, “Physical layer security enhancement with reconfigurable intelligent surface-aided networks,” IEEE Trans. Inf. Forensics Sec., vol. 16, pp. 3480–3495, 2021.
[5] J. Luo, F. Wang, S. Wang, H. Wang, and D. Wang, “Reconfigurable intelligent surface: Reflection design against passive eavesdropping,” IEEE Trans. Wireless Commun., vol. 20, no. 5, pp. 3350–3364, May 2021.
[6] S. Hong, C. Pan, H. Ren, K. Wang, and A. Nallanathan, “Artificial-noise-aided secure MIMO wireless communications via intelligent reflecting surface,” IEEE Trans. Commun., vol. 68, no. 12, pp. 7851–7866, Dec. 2020.
[7] M. A. Kishk and M.-S. Alouini, “Exploiting randomly located blockages for large-scale deployment of intelligent surfaces,” IEEE J. Sel. Areas Commun., vol. 39, no. 4, pp. 1043–1056, Apr. 2021.
[8] L. Dong, H.-M. Wang, J. Bai, and H. Xiao, “Double intelligent reflecting surface for secure transmission with inter-surface signal reflection,” IEEE Trans. Veh. Technol., vol. 70, no. 3, pp. 2912–2916, Mar. 2021.