# Reinforcement Learning-Based Cloud-Aware HAPS Trajectory Optimization in Soft-Switching Hybrid FSO/RF Cooperative Transmission System

**Authors:** Beibei Cui, Shanyong Cai, Liqian Wang, Zhiguo Zhang, Feng Wang

PMC · DOI: 10.3390/s26030948 · Sensors (Basel, Switzerland) · 2026-02-02

## TL;DR

This paper proposes a deep reinforcement learning framework that optimizes HAPS trajectories in a soft-switching hybrid FSO/RF system, steering the platform away from cloud cover to keep links available and raise throughput.

## Contribution

The paper introduces a joint framework that combines rateless-code-based soft-switching between FSO and RF links with DRL-driven, cloud-aware HAPS trajectory optimization.

## Key findings

- RC-PPO achieves higher throughput than the HS-PPO baseline.
- RC-PPO produces smoother cloud-aware trajectories than the HS-PPO baseline.
- A reward-shaped PPO agent lets training proceed despite the sparse feedback of rateless codes.
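
The abstract attributes the handling of sparse rateless-code feedback to a reward-shaped PPO agent, but this summary does not give the reward itself. The sketch below is therefore only an illustration of the general idea, not the authors' formulation. It swaps in potential-based reward shaping as a stand-in technique; the function names, the attenuation-based potential, and all weights (`w_throughput`, `w_smooth`, `gamma`) are assumptions.

```python
def cloud_potential(fso_attenuation_db: float) -> float:
    """Hypothetical potential: less cloud attenuation on the FSO path is better."""
    return -fso_attenuation_db


def shaped_reward(ack_received: bool, block_bits: int,
                  phi_prev: float, phi_next: float,
                  heading_change_rad: float,
                  gamma: float = 0.99,
                  w_throughput: float = 1e-6,
                  w_smooth: float = 0.1) -> float:
    """Per-step reward = sparse throughput term + dense shaping - smoothness penalty.

    All weights are illustrative assumptions, not values from the paper.
    """
    # Sparse term: non-zero only in the step where the decoding ACK arrives.
    sparse = w_throughput * block_bits if ack_received else 0.0
    # Dense term: potential-based shaping, gamma * phi(s') - phi(s).
    shaping = gamma * phi_next - phi_prev
    # Penalize sharp heading changes to encourage smooth trajectories.
    smooth_penalty = w_smooth * abs(heading_change_rad)
    return sparse + shaping - smooth_penalty
```

With this form, a step that moves the platform from a heavily clouded path to a clearer one earns a positive reward even when no ACK arrives, which gives the agent a learning signal between decoding events.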

## Abstract

Space–air–ground systems employing free-space optical (FSO) communication leverage high-altitude platform stations (HAPS) to deliver seamless and ubiquitous connectivity. Although FSO links offer high capacity, they are highly susceptible to cloud extinction, which severely degrades link availability. Hybrid FSO/radio-frequency (RF) transmission and cloud-aware HAPS trajectory optimization can enhance resilience. However, the conventional cloud-aware hybrid FSO/RF transmission system based on hard-switching (HS) between the FSO and RF links leads to frequent link transitions and unstable throughput. To address these challenges, we propose a joint optimization framework that integrates soft-switching between FSO and RF links with deep reinforcement learning (DRL) for HAPS trajectory optimization. Soft-switching based on rateless codes (RCs) enables simultaneous transmission over both links, where the receiver accumulates packets until successful decoding and then sends a single feedback message. RC feedback is therefore sparse, which avoids feedback storms but also poses challenges to HAPS trajectory optimization. The DRL agent proactively optimizes HAPS trajectories to avoid cloud cover and maintain link availability. To address the sparse feedback of RCs for DRL training, a reward-shaped proximal policy optimization (PPO)-based agent is developed to jointly optimize throughput and trajectory smoothness. Simulations using realistic ERA5 data show that RC-PPO achieves higher throughput and smoother trajectories compared to the HS-PPO baseline.
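
The soft-switching mechanism described in the abstract can be sketched as a small simulation: both links transmit rateless-coded packets in every slot, the receiver counts whatever arrives, and it sends one feedback message once it has enough packets to decode. This is a minimal sketch under stated assumptions, not the paper's system model. It assumes an idealized rateless code that decodes after any `k + overhead_pkts` received packets, independent per-packet erasures on each link, and hypothetical parameter names throughout.

```python
import random


def soft_switch_slots(k: int,
                      fso_pkts_per_slot: int, rf_pkts_per_slot: int,
                      fso_erasure: float, rf_erasure: float,
                      overhead_pkts: int = 5, seed: int = 0,
                      max_slots: int = 100_000) -> tuple[int, int]:
    """Return (slots_used, feedback_messages) for one rateless-coded block.

    Assumes an idealized rateless code: any k + overhead_pkts received
    packets are enough to decode, regardless of which link carried them.
    """
    rng = random.Random(seed)
    needed = k + overhead_pkts
    received = 0
    for slot in range(1, max_slots + 1):
        # Both links transmit in every slot; no switching decision is made.
        for _ in range(fso_pkts_per_slot):
            if rng.random() >= fso_erasure:  # packet survives the FSO link
                received += 1
        for _ in range(rf_pkts_per_slot):
            if rng.random() >= rf_erasure:   # packet survives the RF link
                received += 1
        if received >= needed:
            return slot, 1  # a single ACK ends the block
    return max_slots, 0     # block never decoded within max_slots


# Clear sky versus an FSO link fully blocked by cloud (erasure = 1.0).
clear = soft_switch_slots(100, 10, 2, 0.0, 0.0)
cloudy = soft_switch_slots(100, 10, 2, 1.0, 0.0)
```

In the cloudy case the block still decodes over RF alone, only more slowly, and the receiver still sends one feedback message per block. That single message per block is the sparse signal the trajectory agent has to learn from.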

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899961/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899961/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899961/full.md

---
Source: https://tomesphere.com/paper/PMC12899961