Federated Learning Semantic Communication in UAV Systems: PPO-Based Joint Trajectory and Resource Allocation Optimization

Shuang Du; Yue Zhang; Zhen Tao; Han Li; Haibo Mei

PMC · DOI: 10.3390/s26020675 · January 20, 2026

Open Access

TL;DR

This paper proposes a new method for UAV communication using semantic information and federated learning to reduce computational load and improve efficiency.

Contribution

The novel contribution is a PPO-based framework for joint trajectory and resource allocation in UAV-assisted semantic communication using federated learning.

Findings

1. Federated learning reduces computational burden on UAVs by offloading tasks to edge devices.
2. The PPO-based algorithm minimizes energy consumption and task completion time while ensuring service fairness.
3. Experimental results show improved quality-of-service and reduced resource consumption in UAV systems.

Abstract

Semantic Communication (SC), driven by a deep learning (DL)-based “understand-before-transmit” paradigm, transmits lightweight semantic information (SI) instead of raw data. This approach significantly reduces data volume and communication overhead while maintaining performance, making it particularly suitable for UAV communications where the platform is constrained by size, weight, and power (SWAP) limitations. To alleviate the computational burden of semantic extraction (SE) on the UAV, this paper introduces federated learning (FL) as a distributed training framework. By establishing a collaborative architecture with edge users, computationally intensive tasks are offloaded to the edge devices, while the UAV serves as a central coordinator. We first demonstrate the feasibility of integrating FL into SC systems and then propose a novel solution based on Proximal Policy Optimization…

Figures: 4

Funding: Mianyang Science Project

Keywords: semantic communication; federated learning; deep learning; resource allocation; trajectory optimization; Unmanned Aerial Vehicle (UAV); Proximal Policy Optimization (PPO)

Taxonomy

Topics: UAV Applications and Optimization · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data

1. Introduction

Nowadays, semantic communication (SC) is attracting growing research interest and could become a key technology of future 6G mobile networks [1]. SC was first proposed in the 1950s by R. Carnap et al. [2], and it has recently become a hot topic again owing to the emergence of sophisticated artificial intelligence (AI) technologies. Using deep learning (DL), the transmitter can readily perform semantic extraction (SE) from raw data such as text [3], images [4], and speech [5,6], and the receiver can carry out data inference/regeneration from the extracted and transmitted semantic information (SI) [7,8], taking into account the semantic effectiveness of the transmitted symbols. SC therefore represents a paradigm shift from Shannon’s Classical Information Theory (CIT), which adopts a “transmit-before-understanding” approach. SC instead leverages an “understand-before-transmit” strategy, thereby alleviating bandwidth pressure by reducing the amount of data to be exchanged and lowering communication cost. Further, thanks to the emergence of large language models (LLMs) [9], SC can be widely deployed with the help of artificial intelligence-generated content (AIGC) [10,11].

Owing to these advantages, SC is particularly suitable for UAV communications. Traditional UAV communication follows CIT and jointly optimizes the UAV trajectory, resource allocation, air–ground communication scheduling, etc., to maximize UAV energy efficiency while guaranteeing the QoS requirements of air–ground communications [12,13,14] or of the computation services offered by the UAV acting as a mobile edge computing server [15,16]. However, “Shannon’s trap” will always remain the bottleneck of such traditional UAV communication systems. Furthermore, each UAV is constrained by its size, weight, and power (SWAP), which severely limits its airborne communication and computation capabilities [17]. SC is well suited to addressing these issues in UAV systems. A few works have studied UAV semantic communication, e.g., [18,19]. Nevertheless, refs. [18,19] did not consider that a UAV may be unable to handle the intensive computation required for DL-based SE because of its SWAP constraints.

In [20,21], the authors pointed out that distributed computing using edge intelligence is a promising technology for realizing SC in systems without consistent infrastructure, such as UAV communications or IoT systems. Specifically, the authors in [20] proposed using federated learning (FL) [22] to address the limited computing power, energy, and storage of end devices during SC. Without such distributed computing, these device constraints may cause long latency in training, updating, and sharing the knowledge of the SE model, thereby degrading communication reliability. Ref. [23] designed a DL-based auction for computing resource allocation in SC-enabled Metaverse applications, where SC and edge computing serve as two disruptive solutions. However, no work so far has considered distributed edge intelligence for UAV semantic communications. In this paper, we apply federated learning to support UAV semantic communications.

Federated learning has been well studied in UAV communications as a way to improve the computing and communication services that UAVs provide. Ref. [24] considered dynamic digital twin and federated learning for air–ground networks, where a UAV works as the aggregator and the collaborative digital twin model works as a trainer in the air. Ref. [25] developed an asynchronous federated learning (AFL) framework for multi-UAV-enabled networks and designed a joint device selection, UAV placement, and resource management algorithm to enhance federated convergence speed and accuracy. Similarly, the works in [26,27,28] all used FL to enhance UAV performance with joint optimization algorithms in various network settings. To the best of our knowledge, no work has yet considered FL for UAV semantic communication. Overall, the contributions of this paper are as follows:

Synergy of Federated Learning and Semantic Communication: We propose a novel FL-based framework where edge users collaboratively train semantic communication models, with the UAV serving as the central server. This approach effectively distributes the computational load of semantic extraction, alleviating the processing burden on resource-constrained UAVs while maintaining data privacy.
Distributed Semantic Extraction Paradigm: To overcome UAVs’ computational limitations in deep learning-based semantic processing, we design a distributed training system that leverages edge intelligence to relieve the UAV of these computational bottlenecks. This ensures that the UAV can maintain high-performance semantic communication without being overwhelmed by local processing requirements.
Joint Trajectory and Resource Optimization via PPO: Beyond the learning architecture, we develop a PPO-based online optimization method. It jointly optimizes the dynamic flight trajectory and bandwidth allocation to satisfy the specific Quality-of-Service (QoS) and fairness requirements of semantic tasks, which are more complex than those in traditional bit-level transmission systems.
Comprehensive Performance Evaluation: Experimental results demonstrate that our FL-enabled semantic communication system achieves significant improvements in training efficiency and resource utilization, addressing the computational constraints of UAVs. Meanwhile, the PPO-based trajectory optimization effectively reduces energy consumption and task completion time while ensuring equitable QoS for edge users, meeting the design requirements of dynamic UAV-assisted semantic communication systems.

The remainder of this paper is organized as follows. Section 2 details the system model and the associated problem formulation. Section 3 presents the solution to the optimization problem. Section 4 presents simulation results and analysis. Section 5 concludes the paper and discusses future work.

Other notations: In this paper, $[eqn]$ denotes the set of $[eqn]$ complex vectors. $[eqn]$ denotes the array with N elements. diag $[eqn]$ denotes the diagonalization operation. $[eqn]$ denotes the transpose operation. $[eqn]$ denotes the expectation operation. $[eqn]$ denotes the determinant operation.

2. System Model and Problem Formulation

The architecture of the FL-enabled UAV semantic communication system is illustrated in Figure 1a. The UAV is denoted as u, and the users are denoted by $[eqn]$ . Each user is located at a fixed position $[eqn]$ , where $[eqn]$ , $[eqn]$ , and $[eqn]$ denote the 3D coordinates of the k-th user. To simplify the problem, we let $[eqn]$ = 0.

During the federated learning training phase, the UAV acts as a central server to aggregate model updates from all users for semantic communication. After the training is completed, the UAV will provide semantic communication services to the K users in the area.

We consider a discrete-time system where the time horizon is partitioned into M equal-length time slots, indexed by $[eqn]$ . The duration of each time slot is $[eqn]$ seconds, representing the minimum time unit for UAV movement and communication operations. The position of the UAV at time slot m is represented by a two-dimensional coordinate vector $[eqn]$ , and to simplify the problem, we assume that the UAV maintains a constant altitude H throughout the process, i.e., $[eqn]$ for all m.

2.1. Movement Model for UAV

The operational airspace is modeled as a rectangular region with dimensions $[eqn]$ , where a and b represent the width and length of the region, respectively. The position constraints of the UAV at the m-th time slot are given as follows:

[eqn]

[eqn]

Equations (1a) and (1b) ensure that both the UAV trajectory and user positions remain within the defined boundaries. The boundary values a and b define the spatial constraints of the mission area. In practical scenarios, these values are determined by the UAV’s flight endurance, the effective communication range of the semantic models, and local airspace regulations. For the purposes of this study, we set these parameters to reflect a typical urban or suburban deployment area, ensuring the UAV operates within a safe and manageable flight radius.

The kinematic model of the UAV is defined by the following equations. The velocity update equation is

[eqn]

where $[eqn]$ denotes the horizontal velocity and $[eqn]$ denotes the acceleration during time slot $[eqn]$ .

The position update equations are

[eqn]

[eqn]

where $[eqn]$ denotes the heading angle at time slot m, which controls the direction of movement in the 2D plane.

All parameters are subject to the following physical constraints:

[eqn]

[eqn]

where $[eqn]$ denotes the maximum horizontal velocity of the UAV, and $[eqn]$ is the maximum acceleration of the UAV.
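The velocity and position updates, together with the speed, acceleration, and boundary constraints, can be combined into one per-slot step function. The sketch below is a minimal illustration, not the authors' simulator: it assumes a first-order (Euler) discretization in which the speed is updated first and the UAV then moves along its heading at the new speed, and it assumes that constraint violations are handled by clipping. The names `UAVState` and `uav_step` and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class UAVState:
    x: float  # position along the width a (m)
    y: float  # position along the length b (m)
    v: float  # horizontal speed (m/s)


def clip(value: float, low: float, high: float) -> float:
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def uav_step(s: UAVState, accel: float, heading: float, delta: float,
             v_max: float, a_max: float, width: float, length: float) -> UAVState:
    """Advance the UAV by one time slot of duration delta (s).

    Assumed Euler discretization: update the speed with the clipped
    acceleration, then fly along `heading` (rad) at the new speed for the
    whole slot. Speed is kept in [0, v_max] and position in the a-by-b area.
    """
    accel = clip(accel, -a_max, a_max)
    v_next = clip(s.v + accel * delta, 0.0, v_max)
    x_next = clip(s.x + v_next * delta * math.cos(heading), 0.0, width)
    y_next = clip(s.y + v_next * delta * math.sin(heading), 0.0, length)
    return UAVState(x_next, y_next, v_next)


# Start at rest at the origin and accelerate along the x-axis for three slots.
state = UAVState(0.0, 0.0, 0.0)
for _ in range(3):
    state = uav_step(state, accel=2.0, heading=0.0, delta=1.0,
                     v_max=5.0, a_max=3.0, width=100.0, length=100.0)
# Speeds are 2, 4, then 5 (capped at v_max), so state.x == 11.0 and state.v == 5.0.
```

Clipping is only one way to enforce the constraints; a learned policy could instead be penalized for violations, which is the route the reward design in Section 3.3 takes for the area boundaries.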

2.2. Semantic Communication Model

To realize SC between the UAV and the users, a joint source-channel coding (JSCC) scheme is employed to integrate SE and channel encoding/decoding into a series of DL steps. With JSCC, each user extracts the semantic information of an image using a convolutional neural network (CNN), and the extracted SI is then jointly coded and transmitted over noisy wireless channels. In this way, a user can transfer its local images to the UAV without sending large volumes of data.

The block diagram of JSCC is shown in Figure 1b. During the FL-based training phase, we mainly consider the semantic communication between the users and the UAV. Assume that, in the m-th time slot, a user takes an n-dimensional image as input: $[eqn]$ . To transfer $[eqn]$ , the user’s encoder maps $[eqn]$ to a k-length vector of complex-valued channel input samples $[eqn]$ . The encoding is carried out by a CNN representing the deterministic encoding function $[eqn]$ : $[eqn]$ , with parameters $[eqn]$ . The resulting joint source-channel coded sequence $[eqn]$ is sent over the wireless channel by directly transmitting the real and imaginary parts of the channel input samples over the $[eqn]$ and $[eqn]$ components of the digital signal. The channel introduces corruption to the transmitted symbols, denoted by $[eqn]$ , and we model its transfer function as $[eqn]$ . The decoder at the UAV then maps the corrupted complex-valued signal $[eqn]$ to an estimate of the original input $[eqn]$ , using a decoding function $[eqn]$ parameterized by a CNN with parameter set $[eqn]$ . In effect, the decoder inverts the operations performed by the encoder, mapping the image features to an estimate $[eqn]$ of the originally transmitted image. The encoding and decoding functions are designed jointly to minimize the average distortion between the original input image $[eqn]$ and its reconstruction $[eqn]$ produced by the decoder:

[eqn]

where $[eqn]$ is the loss function of the user, $[eqn]$ is a given distortion measure, and D is the number of samples used to train the user’s model. Further details on JSCC can be found in [4].
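The training objective above averages a per-sample distortion over the user's D samples. The sketch below assumes that the distortion measure is the mean squared error (MSE), a common choice for image JSCC, and represents images as flat lists of pixel values; the function names and toy data are hypothetical.

```python
from typing import Sequence


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an image x and its reconstruction x_hat."""
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def average_distortion(images, reconstructions) -> float:
    """Empirical loss of one user: (1/D) * sum_i d(x_i, x_hat_i), with d = MSE."""
    D = len(images)
    return sum(mse(x, xh) for x, xh in zip(images, reconstructions)) / D


# Two toy 4-pixel "images" and their decoder outputs.
imgs = [[0.0, 0.5, 1.0, 0.5], [1.0, 1.0, 0.0, 0.0]]
recs = [[0.0, 0.5, 1.0, 0.5], [0.5, 1.0, 0.0, 0.0]]
print(average_distortion(imgs, recs))  # (0.0 + 0.0625) / 2 = 0.03125
```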

In this paper, we model the communication channel as a series of non-trainable layers in JSCC, and the corresponding transfer function $[eqn]$ can be written as

[eqn]

where the vector $[eqn]$ consists of independent identically distributed (i.i.d.) samples from a circularly symmetric complex Gaussian distribution: $[eqn]$ , with average noise power $[eqn]$ ; and $[eqn]$ is the channel gain under the air–ground channel fading model between the UAV and the user, which can be represented as $[eqn]$ with normal random variable $[eqn]$ .
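Because the channel is modeled as a non-trainable layer, it can be implemented as a fixed function applied to the encoder output. The sketch below assumes a flat fading gain h that stays constant over one transmitted block and applies y = h·z + n with circularly symmetric complex Gaussian noise of average power σ². It also shows how real-valued encoder outputs can be paired into complex symbols carried on the in-phase (real) and quadrature (imaginary) components. The names `to_complex` and `channel_layer` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def to_complex(features: Sequence[float]) -> List[complex]:
    """Pair consecutive real encoder outputs into complex channel symbols.

    The first value of each pair rides on the in-phase (real) component and
    the second on the quadrature (imaginary) component.
    """
    if len(features) % 2:
        raise ValueError("need an even number of real features")
    return [complex(features[i], features[i + 1]) for i in range(0, len(features), 2)]


def channel_layer(z: Sequence[complex], h: complex, noise_power: float,
                  rng: random.Random) -> List[complex]:
    """Non-trainable channel: y = h * z + n, with n ~ CN(0, noise_power) i.i.d.

    A circularly symmetric complex Gaussian sample with total power sigma^2 has
    independent real and imaginary parts, each with variance sigma^2 / 2.
    """
    std = math.sqrt(noise_power / 2.0)
    return [h * zi + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for zi in z]


rng = random.Random(0)
z = to_complex([0.3, -0.1, 0.7, 0.2])            # two complex symbols
y_clean = channel_layer(z, h=0.8, noise_power=0.0, rng=rng)
# With zero noise power the output is exactly 0.8 * z.
```

Because the layer has no trainable parameters, gradients simply flow through it during end-to-end training, so the encoder learns representations that are robust to the sampled noise.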

In this study, we primarily leverage the mobility of the UAV to establish and maintain line-of-sight (LoS) links for point-to-point communication. While we acknowledge that shadowing effects may occur in complex environments, our proposed joint trajectory and resource allocation optimization is specifically designed to bypass obstacles, thereby maximizing the probability of LoS connectivity. Furthermore, given the relatively low operational speeds of the UAV in the considered scenarios, the impact of mobility-induced Doppler shifts on communication performance is marginal and is therefore omitted to keep the optimization problem tractable. Let $[eqn]$ denote the closed-form path loss between the UAV and the k-th user in the m-th time slot, given by

[eqn]

where $[eqn]$ ; $[eqn]$ is the carrier frequency (Hz) and c is the speed of light (m/s); and $[eqn]$ (in dB) is the environment-dependent additional loss of the LoS connection. Based on (7), the instantaneous signal-to-noise ratio (SNR) of the UAV’s link to the k-th user in time slot m can be expressed as

[eqn]

where P is the uplink transmit power of the UAV; and $[eqn]$ denotes the total available bandwidth between the UAV and k-th user in time slot m in Hertz (Hz).

For the FL training stage, we introduce the average path loss $[eqn]$ , which quantifies the mean path loss between the UAV and the users. Mathematically, it is defined as

[eqn]
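The path loss, SNR, and average path loss can be evaluated numerically as follows. This is a sketch under stated assumptions: the closed-form path loss is taken to be the free-space term 20·log10(4π·f_c·d/c) plus the LoS additional loss (in dB); d is the 3D distance between the UAV at altitude H and a ground user; the SNR is P·10^(−PL/10)/(N0·B) with N0 the noise power spectral density in W/Hz; and the average path loss is the arithmetic mean in dB over all time slots and users. These forms are common in the air-to-ground literature but are assumptions here, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

SPEED_OF_LIGHT = 299_792_458.0  # c in m/s
Point = Tuple[float, float]


def distance_3d(uav_xy: Point, altitude: float, user_xy: Point) -> float:
    """Distance between the UAV at height H and a ground user (user height 0)."""
    dx = uav_xy[0] - user_xy[0]
    dy = uav_xy[1] - user_xy[1]
    return math.sqrt(dx * dx + dy * dy + altitude * altitude)


def path_loss_db(d: float, fc: float, eta_los_db: float) -> float:
    """Assumed LoS path loss (dB): 20*log10(4*pi*fc*d/c) + eta_LoS."""
    return 20.0 * math.log10(4.0 * math.pi * fc * d / SPEED_OF_LIGHT) + eta_los_db


def snr_linear(p_tx_w: float, pl_db: float, n0_w_per_hz: float, bandwidth_hz: float) -> float:
    """Assumed received SNR: P * 10^(-PL/10) / (N0 * B)."""
    return p_tx_w * 10.0 ** (-pl_db / 10.0) / (n0_w_per_hz * bandwidth_hz)


def average_path_loss_db(trajectory: Sequence[Point], altitude: float,
                         users: Sequence[Point], fc: float, eta_los_db: float) -> float:
    """Arithmetic mean of the path loss (dB) over all (time slot, user) pairs."""
    losses = [path_loss_db(distance_3d(q, altitude, w), fc, eta_los_db)
              for q in trajectory for w in users]
    return sum(losses) / len(losses)


# Hypothetical example: 2 GHz carrier, UAV at 100 m directly above one user.
pl = path_loss_db(distance_3d((0.0, 0.0), 100.0, (0.0, 0.0)), fc=2e9, eta_los_db=1.0)
snr_db = 10.0 * math.log10(snr_linear(0.1, pl, 1e-20, 1e6))
```

Doubling the distance adds 20·log10(2) ≈ 6 dB of loss under this model, which is why the trajectory optimizer gains by flying closer to users it needs to serve.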

The performance of UAV SC is quantified in terms of PSNR, which measures the ratio between the maximum possible power of a signal and the power of the noise that corrupts it. The PSNR of an image sent from the UAV to a user via SC is defined as

[eqn]

where $[eqn]$ is the mean squared error between the reference image x and the reconstructed image $[eqn]$ , and MAX is the maximum possible pixel value of the image, which is typically fixed (e.g., 255 for 8-bit images).
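For reference, the conventional PSNR definition is 10·log10(MAX²/MSE). The sketch below implements that standard form, which we assume matches the equation above, with MAX = 255 for 8-bit images by default; pass `max_value=1.0` for images normalized to [0, 1]. The function name is hypothetical.

```python
import math
from typing import Sequence


def psnr(x: Sequence[float], x_hat: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE(x, x_hat))."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("x and x_hat must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by 1 on an 8-bit scale gives MSE = 1, so PSNR = 20*log10(255).
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))  # 48.13
```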

2.3. Federated Learning for UAV Semantic Communications

The performance of UAV SC depends strongly on the encoder with CNN parameters $[eqn]$ and the decoder with CNN parameters $[eqn]$ . It is therefore vital for the users to train $[eqn]$ and $[eqn]$ to high accuracy, i.e., to minimize $[eqn]$ . To this end, we establish an FL-based distributed architecture between the users and the UAV, as shown in Figure 1a. This architecture enables effective SC between the UAV and the users without consuming excessive UAV energy and computation resources.

In the FL process, let $[eqn]$ denote the global parameters of the JSCC model. The overall goal of FL is to minimize the global loss function $[eqn]$ with $[eqn]$ , and then broadcast the resulting global parameters to each user for SC. In this way, each user can quickly obtain highly accurate global JSCC model parameters instead of training them alone. This also spares the UAV from intensive energy consumption. To implement this FL process, we adopt the distributed approximate Newton (DANE) approach previously applied in FL over cellular wireless networks [29]. We then design an FL algorithm for UAV semantic communications, as shown in Algorithm 1.

Algorithm 1 runs a loop of finitely many iterations to update the global model $[eqn]$ , where $[eqn]$ is the global FL model at iteration n. In each iteration, every user k solves, in parallel, the local optimization problem

[eqn]

where $[eqn]$ is a constant; and $[eqn]$ is the difference between the global FL model and the local FL model of the k-th user, i.e., $[eqn]$ is the local FL model of user k at iteration n. Here, $[eqn]$ is the objective function that captures how the local loss function is affected by the difference between the local and global FL models, through the gradients of the local and global loss functions. Steps 7 to 11 of Algorithm 1 solve this local optimization problem with the gradient method:

[eqn]

where $[eqn]$ is the step size; $[eqn]$ is the value of $[eqn]$ at the i-th local iteration given the global model $[eqn]$ ; and $[eqn]$ is the gradient of the function $[eqn]$ at the point $[eqn]$ . With (12), one has $[eqn]$ . Then, given the predefined local accuracy $[eqn]$ , the local optimization in (11) finds a solution $[eqn]$ satisfying

[eqn]

Algorithm 1 FL Algorithm for UAV Semantic Communications
Require: Global model $[eqn]$ , local accuracy $[eqn]$ , global accuracy $[eqn]$ , user set $[eqn]$
Ensure: Optimized global model $[eqn]$

1:Initialize global model $[eqn]$ , set global iteration $[eqn]$
2:repeat
3: Each user $[eqn]$ computes $[eqn]$ and sends to UAV
4: UAV computes $[eqn]$
5: UAV broadcasts $[eqn]$ to all users
6: for each user $[eqn]$ in parallel do
7: Initialize local iteration $[eqn]$ , set $[eqn]$
8: repeat
9: Update: $[eqn]$
10: $[eqn]$
11: until local accuracy $[eqn]$ is obtained
12: Denote $[eqn]$
13: User k sends $[eqn]$ to UAV
14: end for
15: UAV computes: $[eqn]$
16: $[eqn]$
17:until global accuracy $[eqn]$ is obtained
18:return $[eqn]$
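Algorithm 1 can be prototyped in a few lines. The sketch below follows the DANE-style local problem used in [29], in which user k minimizes G_k(w, h) = F_k(w + h) − (∇F_k(w) − ξ∇F(w))ᵀh by gradient descent and the UAV adds the average of the resulting updates h_k to the global model. Because the paper's equations are not reproduced here, this form of the local problem and the simple averaging rule are assumptions based on [29]. Further simplifications: a scalar model with toy quadratic losses, a fixed number of local steps in place of the local-accuracy test, and a gradient-norm stopping rule in place of the global-accuracy test. All names are hypothetical.

```python
from typing import Callable, List

Grad = Callable[[float], float]  # gradient of a user's local loss F_k (scalar model)


def quadratic_user(a: float, b: float) -> Grad:
    """Gradient of a hypothetical strongly convex local loss F_k(w) = 0.5*a*(w - b)^2."""
    return lambda w: a * (w - b)


def fl_dane(user_grads: List[Grad], w0: float, xi: float, step: float,
            local_steps: int, global_tol: float, max_rounds: int = 1000) -> float:
    """DANE-style federated training loop in the spirit of Algorithm 1.

    Simplifications (assumptions): a fixed number of local gradient steps
    replaces the local-accuracy test, and the loop stops once the global
    gradient magnitude drops below global_tol instead of testing eps0.
    """
    K = len(user_grads)
    w = w0
    for _ in range(max_rounds):
        # Steps 3-5: users report local gradients; the UAV averages and broadcasts.
        local = [g(w) for g in user_grads]
        global_grad = sum(local) / K
        if abs(global_grad) <= global_tol:
            break
        updates = []
        for g, g_k in zip(user_grads, local):
            # Steps 7-11: minimize G_k(w, h) = F_k(w + h) - (grad F_k(w) - xi*grad F(w)) * h
            # by gradient descent; dG_k/dh = grad F_k(w + h) - grad F_k(w) + xi*grad F(w).
            h = 0.0
            for _ in range(local_steps):
                h -= step * (g(w + h) - g_k + xi * global_grad)
            updates.append(h)
        # Steps 15-16: the UAV averages the local updates into the global model.
        w += sum(updates) / K
    return w


users = [quadratic_user(1.0, 0.0), quadratic_user(3.0, 4.0)]
w_star = fl_dane(users, w0=0.0, xi=0.3, step=0.2, local_steps=50, global_tol=1e-8)
# The average of the two local losses is minimized at w = 3.
```

Here ξ = 0.3 respects the condition ξ ≤ γ/L = 1/3 for these two losses, and only gradients and model updates ever leave a user, never its raw images.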

After finding $[eqn]$ with accuracy $[eqn]$ , each user sends $[eqn]$ to the UAV at step 13 of Algorithm 1, and the UAV updates the global model for the current iteration at steps 15 and 16. Algorithm 1 then proceeds to the next iteration with the updated global model $[eqn]$ . The algorithm terminates once it converges to the global accuracy $[eqn]$ , yielding the optimized global model $[eqn]$ that satisfies $[eqn]$ . In other words, given the accuracy $[eqn]$ , the solution $[eqn]$ is a point such that

[eqn]

Given the local and global accuracies $[eqn]$ and $[eqn]$ , we can obtain $[eqn]$ as the lower bound on the number of local iterations each user performs in steps 7 to 11. Assuming L-Lipschitz continuity and $[eqn]$ -strong convexity of the loss function $[eqn]$ , $[eqn]$ can be defined as

[eqn]

where $[eqn]$ ; and $[eqn]$ ; L and $[eqn]$ are determined by the loss function $[eqn]$ , further explained in [29].

Similarly, the lower bound on the number of global iterations of Algorithm 1, $[eqn]$ , can be defined as

[eqn]

where $[eqn]$ , and $[eqn]$ . More details on defining the global and local lower bound of the FL algorithm can be found in [29].

We can observe from (15) and (16) that a higher local accuracy $[eqn]$ (i.e., a smaller value) reduces the number of global iterations at the cost of more local iterations, whereas a higher global accuracy $[eqn]$ increases the number of global iterations, and vice versa.
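The trade-off can be made concrete with the bounds reported in [29]. Since Equations (15) and (16) are not reproduced here, the sketch assumes they take that form: at least v·log2(1/η) local iterations with v = 2/((2 − Lδ)δγ), valid for step size δ < 2/L, and at least a/(1 − η) global iterations with a = (2L²/(γ²ξ))·ln(1/ε0), valid for 0 < ξ ≤ γ/L. The constants L, γ, δ, and ξ below are hypothetical; the local accuracy 0.01 and global accuracy 0.1 match the settings in Section 4.

```python
import math


def local_iteration_bound(eta: float, L: float, gamma: float, delta: float) -> int:
    """Assumed lower bound on local gradient steps per user: v * log2(1 / eta)."""
    if not 0.0 < delta < 2.0 / L:
        raise ValueError("step size must satisfy 0 < delta < 2 / L")
    v = 2.0 / ((2.0 - L * delta) * delta * gamma)
    return math.ceil(v * math.log2(1.0 / eta))


def global_iteration_bound(eta: float, eps0: float, L: float, gamma: float, xi: float) -> int:
    """Assumed lower bound on global rounds: a / (1 - eta), a = 2L^2/(gamma^2 xi) ln(1/eps0)."""
    if not 0.0 < xi <= gamma / L:
        raise ValueError("xi must satisfy 0 < xi <= gamma / L")
    a = 2.0 * L ** 2 / (gamma ** 2 * xi) * math.log(1.0 / eps0)
    return math.ceil(a / (1.0 - eta))


# Hypothetical smoothness/convexity constants with the Section 4 accuracies.
print(local_iteration_bound(eta=0.01, L=1.0, gamma=0.5, delta=1.0))          # 27
print(global_iteration_bound(eta=0.01, eps0=0.1, L=1.0, gamma=0.5, xi=0.5))  # 38
```

Tightening η to 0.001 raises the local bound to 40 while the global bound drops to 37, and tightening ε0 to 0.01 raises the global bound to 75, matching the qualitative trade-off stated above.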

2.4. UAV Constraints on Semantic Communications

Due to the SWAP constraints, a UAV has limited energy and computation capacity. We first model the UAV’s propulsion energy, which accounts for the major part of its energy consumption. According to [12], the propulsion energy cost in time slot m is

[eqn]

where $[eqn]$ is the horizontal velocity of the UAV during time slot m and $[eqn]$ is the duration of each time slot. In addition, $[eqn]$ and $[eqn]$ are two constants representing the blade profile power and the induced power in hovering status, respectively; $[eqn]$ denotes the tip speed of the rotor blade; $[eqn]$ is the mean rotor induced velocity in hover; $[eqn]$ and s are the fuselage drag ratio and rotor solidity, respectively; and $[eqn]$ and G denote the air density and rotor disk area, respectively. In this paper, we assume that the propulsion energy of the UAV is determined mainly by its horizontal velocity in each path segment. For clarity of exposition and tractability, we ignore the additional (or reduced) energy consumption caused by acceleration/deceleration and ascent/descent, which is reasonable when maneuvering takes only a small portion of the total operation time.
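The parameter list above matches the widely used rotary-wing propulsion power model. The sketch assumes its standard form, P(V) = P0(1 + 3V²/U_tip²) + P_i(√(1 + V⁴/(4v0⁴)) − V²/(2v0²))^(1/2) + ½·d0·ρ·s·G·V³, and multiplies it by the slot duration to obtain the per-slot energy. The numeric values are hypothetical placeholders, not the paper's settings, and the function names are hypothetical.

```python
import math


def propulsion_power(v: float, p0: float, p_i: float, u_tip: float, v0: float,
                     d0: float, rho: float, s: float, G: float) -> float:
    """Rotary-wing propulsion power (W) at horizontal speed v (m/s)."""
    blade_profile = p0 * (1.0 + 3.0 * v ** 2 / u_tip ** 2)
    induced = p_i * math.sqrt(math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4)) - v ** 2 / (2.0 * v0 ** 2))
    parasite = 0.5 * d0 * rho * s * G * v ** 3
    return blade_profile + induced + parasite


def propulsion_energy(v: float, delta: float, **params: float) -> float:
    """Energy (J) for one slot of duration delta (s), ignoring acceleration effects."""
    return propulsion_power(v, **params) * delta


# Hypothetical parameter values for illustration only (not the paper's settings).
PARAMS = dict(p0=80.0, p_i=90.0, u_tip=120.0, v0=4.0, d0=0.6, rho=1.225, s=0.05, G=0.5)
hover = propulsion_energy(0.0, 1.0, **PARAMS)    # P0 + P_i = 170 J for a 1 s slot
cruise = propulsion_energy(10.0, 1.0, **PARAMS)  # lower than hovering at moderate speed
```

Two sanity checks follow from the model: at V = 0 the power reduces to P0 + P_i, and at moderate forward speed the induced term shrinks faster than the other terms grow, so flying slowly can cost less than hovering.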

The UAV also consumes energy for JSCC model inference. To support model inference, the on-board CPU frequency of the UAV is assumed to be fixed at F. The model inference energy consumption of UAV u is then

[eqn]

where $[eqn]$ is the effective switched capacitance, which depends on the chip architecture, and C is the number of CPU cycles per sample on the UAV. Accordingly, UAV u is subject to the following overall energy constraint

[eqn]

where $[eqn]$ , and $[eqn]$ is the maximum allowed propulsion energy during the whole mission.
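The inference energy and the overall budget check can be sketched as follows. The sketch assumes the common dynamic-power model in which each CPU cycle at frequency F costs (effective switched capacitance)·F² joules, so C cycles per sample cost that amount times C per sample, multiplied by a hypothetical number of processed samples. It also assumes that the constraint compares the summed propulsion and inference energy over all time slots with the maximum energy. The capacitance is called `kappa` here, and all names and numbers are hypothetical.

```python
from typing import Sequence


def inference_energy(kappa: float, cycles_per_sample: float, cpu_freq_hz: float,
                     num_samples: int) -> float:
    """Assumed dynamic CMOS energy: kappa * C * F^2 joules per sample, times the sample count."""
    return kappa * cycles_per_sample * cpu_freq_hz ** 2 * num_samples


def within_energy_budget(propulsion_per_slot: Sequence[float],
                         inference_per_slot: Sequence[float], e_max: float) -> bool:
    """Overall constraint: total propulsion plus inference energy must not exceed E_max."""
    return sum(propulsion_per_slot) + sum(inference_per_slot) <= e_max


# Hypothetical values: kappa = 1e-28, 1000 cycles/sample, 1 GHz, 100 samples -> about 1e-5 J.
e_inf = inference_energy(1e-28, 1000, 1e9, 100)
```

The quadratic dependence on F is why a fixed, moderate on-board CPU frequency keeps inference energy small next to propulsion.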

2.5. Problem Formulation

Given that this paper selects image transmission as a typical task for semantic communication, we assume that the UAV transmits an image with an original data volume of $[eqn]$ to a user. During the encoding process, the Joint Source-Channel Coding (JSCC) scheme performs compression on the image with a compression ratio of C. Consequently, the data volume of the encoded image can be simplified to $[eqn]$ . It is worth noting that the selection of C involves a fundamental trade-off: a smaller C enhances communication efficiency by reducing the required transmission time and energy, but it may also limit the restoration of fine-grained data features. In this study, C is configured to balance the transmission overhead with the quality of semantic reconstruction, ensuring the feasibility of real-time tasks in resource-constrained UAV environments.

In the communication link between the UAV and the user, let $[eqn]$ denote the channel bandwidth allocated by the UAV to the k-th user in time slot m. Neglecting link-control signaling overhead and channel-error retransmissions, the transmission time $[eqn]$ of the compressed image can be expressed as

[eqn]
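Under these simplifications, the transmission time is the compressed data volume divided by the achievable rate. The sketch assumes the rate is the Shannon capacity B·log2(1 + SNR) and, following the text, treats the JSCC output as C times the original data volume. It names the ratio `compression_ratio` to keep it distinct from C as the CPU cycles per sample in the energy model. The bandwidth and SNR values are hypothetical, while 0.167 is the compression ratio used in Section 4.

```python
import math


def transmission_time(raw_bits: float, compression_ratio: float,
                      bandwidth_hz: float, snr_linear: float) -> float:
    """Assumed transmission time (s) of a JSCC-compressed image.

    Compressed size is compression_ratio * raw_bits, and the link is assumed
    to run at the Shannon rate B * log2(1 + SNR) with no signaling overhead
    or retransmissions.
    """
    rate_bps = bandwidth_hz * math.log2(1.0 + snr_linear)
    return compression_ratio * raw_bits / rate_bps


# A CIFAR-10 image has 32 x 32 x 3 pixels at 8 bits each = 24,576 bits.
cifar_bits = 32 * 32 * 3 * 8
t = transmission_time(cifar_bits, 0.167, bandwidth_hz=1e6, snr_linear=15.0)
# Rate = 1e6 * log2(16) = 4 Mbit/s, so t = 0.167 * 24576 / 4e6, about 1.03 ms.
```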

We define an indicator function for user k in time slot m: if the $[eqn]$ requirement between the UAV and the k-th user is satisfied, its value is set to 1 (indicating that the user is successfully served); otherwise, it is set to 0.

[eqn]

where $[eqn]$ represents the minimum PSNR requirement of user k and $[eqn]$ represents the PSNR between the UAV and the k-th user.

To ensure that the UAV provides QoS as fairly as possible to every user throughout the flight and to avoid situations in which individual users are persistently underserved, we define the service fairness coefficient $[eqn]$ as the core metric for quantifying fairness at time slot m:

[eqn]

The more evenly service is distributed among the users, the closer the coefficient is to 1.
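The success indicator and the fairness coefficient can be computed as follows. Because the fairness formula is not reproduced here, the sketch assumes Jain's fairness index, (Σ_k c_k)²/(K·Σ_k c_k²), applied to each user's cumulative number of successful services c_k; this index equals 1 exactly when all users are served equally, consistent with the description above. The function names and the convention for the all-zero case are hypothetical.

```python
from typing import Sequence


def served(psnr_db: float, psnr_min_db: float) -> int:
    """Indicator: 1 if the user's PSNR meets its minimum requirement, else 0."""
    return 1 if psnr_db >= psnr_min_db else 0


def jain_fairness(counts: Sequence[float]) -> float:
    """Jain's index (sum c)^2 / (K * sum c^2); equals 1 when all c_k are equal."""
    k = len(counts)
    sq = sum(c * c for c in counts)
    if sq == 0:
        return 1.0  # convention chosen here: nobody served yet counts as equal service
    return sum(counts) ** 2 / (k * sq)


# Cumulative successes of 4 users after a few slots (hypothetical).
print(round(jain_fairness([3, 3, 2, 0]), 3))  # 0.727
```

The index ranges from 1/K (one user receives all service) to 1 (perfectly equal service), so a persistently ignored user visibly pulls it down.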

Our optimization objective is to minimize the combined cost of task completion time and energy consumption while ensuring fair QoS provision across all users and the UAV’s return to the take-off point. Let $[eqn]$ denote the UAV’s trajectory over M time slots, where $[eqn]$ represents the 2D position at time slot m. Let $[eqn]$ denote the bandwidth allocation strategy. The joint optimization problem is formulated as

[eqn]

[eqn]

[eqn]

[eqn]

[eqn]

[eqn]

[eqn]

The optimization variables $[eqn]$ and $[eqn]$ collectively determine service coverage, energy efficiency, and fairness performance, with their coupling effects directly influencing the achievable energy-delay product (EDP) under the given constraints.

3. Methodology

In this framework, the action space of the UAV comprises acceleration, pitch angle, and yaw angle, together with the bandwidth allocation ratio. These actions determine the UAV’s flight trajectory, which in turn affects the PSNR provided to the users and the environmental state of the system. The current state and action jointly drive the agent’s transition to a new random state and yield an instantaneous reward. Consequently, the optimization problem is formulated as a Markov Decision Process (MDP) denoted by $[eqn]$ , where each UAV is considered as an agent. This agent interacts with a dynamic environment defined by a series of states $[eqn]$ and a series of actions $[eqn]$ to maximize long-term rewards $[eqn]$ .

3.1. State Space

The state space should contain all relevant information about the environment. Therefore, we express the state of the UAV as consisting of

[eqn]

where $[eqn]$ represents the user’s required PSNR, $[eqn]$ denotes the position of the k-th user, $[eqn]$ represents the difference between the PSNR provided by the UAV and the PSNR required by the k-th user, $[eqn]$ denotes the cumulative number of completed communication tasks for the k-th user, and $[eqn]$ denotes the remaining energy. Thus, the dimension of the state space is $[eqn]$ . The transition of the remaining energy $[eqn]$ is governed by the UAV’s power consumption. Specifically, the state update for energy is defined as $[eqn]$ , where $[eqn]$ is the total power consumption at time slot m, as formulated in Equations (17) and (18). This ensures that the agent can perceive its energy status and adjust its trajectory and resource allocation actions accordingly.
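The state vector and its energy transition can be assembled as follows. This is a sketch under assumptions: user positions are taken as 2D (assuming the users' z-coordinate is zero), giving 5K + 1 entries, which may differ from the dimension used in the paper; the energy update subtracts the slot's consumed energy (power times slot duration) and is floored at zero; and all names are hypothetical. In practice the features would typically be normalized before being fed to the actor and critic networks.

```python
from typing import List, Sequence, Tuple


def build_state(req_psnr: Sequence[float], positions: Sequence[Tuple[float, float]],
                psnr_gap: Sequence[float], served_counts: Sequence[int],
                remaining_energy: float) -> List[float]:
    """Flatten the per-user features and the remaining energy into one vector."""
    k = len(req_psnr)
    if not (len(positions) == len(psnr_gap) == len(served_counts) == k):
        raise ValueError("all per-user features must have K entries")
    state: List[float] = list(req_psnr)
    for x, y in positions:          # 2D user positions (assumed z_k = 0)
        state.extend((x, y))
    state.extend(psnr_gap)
    state.extend(float(c) for c in served_counts)
    state.append(remaining_energy)
    return state                    # length 5K + 1 under these assumptions


def next_energy(remaining_energy: float, power_w: float, delta: float) -> float:
    """Energy transition: subtract the energy consumed in slot m (never below 0)."""
    return max(0.0, remaining_energy - power_w * delta)
```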

3.2. Action Space

The action space encompasses all possible measures that the UAV can take during the task. Therefore, we express the action of the UAV as consisting of

[eqn]

where $[eqn]$ denotes the normalized acceleration in time slot m, $[eqn]$ denotes the normalized yaw angle in time slot m, $[eqn]$ denotes the normalized pitch angle in time slot m, and $[eqn]$ denotes the allocated bandwidth ratio in time slot m. To bring all action components to a common scale, we employ normalized values, which eliminates large differences in magnitude among features and promotes convergence. The normalized action can be represented as follows:

[eqn]

[eqn]

[eqn]

where $[eqn]$ represents the upper bound of acceleration, and $[eqn]$ and $[eqn]$ denote the actual values. Each normalized component therefore falls within its own range, specifically $[eqn]$ , $[eqn]$ , and $[eqn]$ .
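The mapping from normalized policy outputs back to physical controls can be sketched as below. The exact ranges appear only in the equations above, so the sketch assumes acceleration, yaw, and pitch are normalized to [−1, 1] (scaled by the maximum acceleration, π, and π/2, respectively) and the bandwidth ratio to [0, 1] (scaled by the maximum bandwidth). These ranges, the clipping of raw policy samples, and all names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class PhysicalAction:
    accel: float      # m/s^2
    yaw: float        # rad
    pitch: float      # rad
    bandwidth: float  # Hz


def clip(v: float, lo: float, hi: float) -> float:
    """Clamp v into [lo, hi]."""
    return max(lo, min(hi, v))


def denormalize(a_n: float, yaw_n: float, pitch_n: float, bw_ratio: float,
                a_max: float, b_max: float) -> PhysicalAction:
    """Map normalized policy outputs to physical controls (assumed ranges).

    Assumes a_n, yaw_n, pitch_n in [-1, 1] and bw_ratio in [0, 1]; raw policy
    samples (e.g. from a Gaussian head) are clipped into these ranges first.
    """
    return PhysicalAction(
        accel=clip(a_n, -1.0, 1.0) * a_max,
        yaw=clip(yaw_n, -1.0, 1.0) * math.pi,
        pitch=clip(pitch_n, -1.0, 1.0) * math.pi / 2.0,
        bandwidth=clip(bw_ratio, 0.0, 1.0) * b_max,
    )
```

Keeping every policy output on a similar unit scale is what makes a single Gaussian policy head with shared hyperparameters practical.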

3.3. Reward

To ensure that the UAV successfully accomplishes its designated flight task while maintaining a logical closed loop between energy observation and optimization, we design a composite reward function. The agent receives an instantaneous reward r at each time slot, which guides the optimization of trajectory and resource allocation.

[eqn]

[eqn]

[eqn]

[eqn]

To address the issue of sparse rewards, (27a) implements a progressive reward mechanism in which the system accumulates gains as each user’s PSNR requirement is satisfied. Crucially, (27b) serves as the core efficiency reward that directly addresses the energy consumption observed in the state space. Since the remaining energy $[eqn]$ in (24) transitions according to $[eqn]$ , the inclusion of $[eqn]$ in (27b) penalizes excessive energy expenditure resulting from high-speed maneuvers or intensive bandwidth allocation. This mechanism ensures that the agent perceives the “cost” of its actions on the energy state. Furthermore, (27c) enforces operational boundaries by imposing penalty terms when the UAV leaves the designated task area, and (27d) provides potential-based guidance that encourages the UAV to return toward the take-off point. The coefficients $[eqn]$ to $[eqn]$ are the weights of the respective components. Finally, the total reward is expressed as

[eqn]
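Because the terms of (27a) to (27d) are not reproduced here, the following sketch only illustrates the structure described in the text: a progressive gain per satisfied user, a penalty proportional to the energy consumed in the slot, a penalty for leaving the task area, and potential-based guidance toward the take-off point, combined as a weighted sum. The exact functional forms, including the use of the negative distance to the take-off point as the potential and a plain sum for the total reward, are assumptions, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def slot_reward(served_flags: Sequence[int], energy_used: float, uav_xy: Point,
                prev_xy: Point, home_xy: Point, width: float, length: float,
                w: Tuple[float, float, float, float]) -> float:
    """Hypothetical per-slot reward r = r_serve + r_energy + r_bound + r_return."""
    w1, w2, w3, w4 = w
    r_serve = w1 * sum(served_flags)                  # progressive gain per satisfied user
    r_energy = -w2 * energy_used                      # penalize energy spent in this slot
    inside = 0.0 <= uav_xy[0] <= width and 0.0 <= uav_xy[1] <= length
    r_bound = 0.0 if inside else -w3                  # penalty for leaving the task area
    # Potential-based guidance with potential -distance(home): positive when moving closer.
    r_return = w4 * (math.dist(prev_xy, home_xy) - math.dist(uav_xy, home_xy))
    return r_serve + r_energy + r_bound + r_return
```

Potential-based shaping of this kind rewards progress toward the take-off point without changing which policies are optimal, which is why it is a popular way to encode "return home" goals.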

3.4. PPO-Based Trajectory Optimization Algorithm

The procedural flow of the PPO-based algorithm is depicted in Figure 2, where the actor and critic networks jointly interact with the environment to derive the optimal trajectory and resource allocation strategy. To reduce storage overhead, the proposed algorithm dispenses with a replay buffer and instead adopts an online training paradigm, in which learning is performed concurrently with execution. Details of the PPO-based trajectory optimization algorithm are shown in Algorithm 2.

Algorithm 2 PPO-based UAV Trajectory Optimization Algorithm for Semantic Communications
Require: Number of episodes $[eqn]$ , actor update steps $[eqn]$ , critic update steps $[eqn]$
Ensure: Optimized policy $[eqn]$

1:Initialize actor parameters $[eqn]$ and critic parameters $[eqn]$
2:for $[eqn]$ to $[eqn]$ do
3: Collect trajectory $[eqn]$ using current policy $[eqn]$
4: Set $[eqn]$
5: for each state $[eqn]$ do
6: Calculate advantage function $[eqn]$
7: end for
8: for $[eqn]$ to $[eqn]$ do ▹ Actor update steps
9: Calculate sampling ratio $[eqn]$
10: Calculate actor loss $[eqn]$ using clipping function
11: Update actor parameters $[eqn]$
12: Update policy: $[eqn]$
13: end for
14: for $[eqn]$ to $[eqn]$ do ▹ Critic update steps
15: Calculate critic loss $[eqn]$
16: Calculate gradient $[eqn]$
17: Update critic parameters $[eqn]$
18: end for
19:end for
20:return Optimized policy $[eqn]$
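Steps 4 to 17 of Algorithm 2 rely on three standard PPO quantities: advantage estimates, the clipped surrogate loss for the actor, and a squared-error loss for the critic. The sketch below implements them in plain Python over one collected trajectory. It assumes generalized advantage estimation (GAE) for step 6 and a clip range of 0.2, both common defaults rather than values stated in the paper; in practice the log-probabilities and values would come from the two-hidden-layer MLP actor and critic and be differentiated with an autodiff library. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gae(rewards: Sequence[float], values: Sequence[float], last_value: float,
        gamma: float = 0.99, lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation over one trajectory.

    Returns (advantages, returns); returns = advantages + values are the critic
    targets. Use last_value = 0.0 when the trajectory ends in a terminal state.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]   # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_actor_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                   advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Clipped surrogate loss to minimize: -mean(min(r*A, clip(r, 1-eps, 1+eps)*A))."""
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                          # sampling ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


def critic_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between predicted state values and the returns."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

Clipping the sampling ratio removes the incentive to move the policy far from the one that collected the data, which is what lets PPO reuse each on-policy trajectory for several actor updates without a replay buffer.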

3.5. Computational Complexity Analysis

To evaluate the feasibility of Algorithm 2 for real-time UAV operations, we analyze its computational complexity. Let T denote the number of time slots per episode, while $[eqn]$ and $[eqn]$ represent the update steps for the actor and critic networks, respectively. The overall time complexity per episode is $[eqn]$ , where $[eqn]$ and $[eqn]$ denote the per-update computational costs of the actor and critic networks, respectively. In our implementation, both networks use a lightweight multi-layer perceptron (MLP) architecture with only two hidden layers. Such a streamlined structure keeps inference and training computationally tractable, allowing the UAV to perform trajectory planning and resource allocation within strict latency and energy constraints.

4. Numerical Results

In this section, we first verify the feasibility and advantages of introducing federated learning into semantic communication. We then validate, through simulations, the performance advantages of the PPO-based reinforcement learning algorithm on the joint optimization problem.

Users are randomly distributed in the target area $[eqn]$, which is set to ($[eqn]$, $[eqn]$, $[eqn]$), and the flight altitude of the UAV is fixed at $[eqn]$. The area contains 25 users whose positions are randomly generated within it. The UAV take-off point is set to ($[eqn]$, $[eqn]$, $[eqn]$).
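A scenario like the one above can be generated as follows. This is a sketch under stated assumptions: the area bounds, altitude and take-off point are elided in this section, so the 500 m × 500 m area, 100 m altitude and origin take-off point below are hypothetical, and users are assumed to sit at ground level (z = 0). Only the user count of 25 comes from the text.

```python
import math
import random

NUM_USERS = 25                 # from the text
AREA_X, AREA_Y = 500.0, 500.0  # hypothetical area size (m)
ALTITUDE = 100.0               # hypothetical fixed flight altitude (m)
TAKEOFF = (0.0, 0.0)           # hypothetical take-off point (x, y)


def generate_users(num_users, x_max, y_max, seed=None):
    """Uniformly random ground-user positions (x, y, 0) inside the area."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, x_max), rng.uniform(0.0, y_max), 0.0)
            for _ in range(num_users)]


def uav_user_distance(uav_xy, altitude, user):
    """3D distance between a UAV at fixed altitude and a ground user."""
    dx = uav_xy[0] - user[0]
    dy = uav_xy[1] - user[1]
    return math.sqrt(dx * dx + dy * dy + (altitude - user[2]) ** 2)


users = generate_users(NUM_USERS, AREA_X, AREA_Y, seed=0)
distances = [uav_user_distance(TAKEOFF, ALTITUDE, u) for u in users]
```

Seeding a dedicated `random.Random` instance makes each independent run reproducible without touching global random state.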

In the training stage, we use the CIFAR-10 dataset as both the communication task and the training data for the JSCC model, with the compression ratio C set to 0.167. Owing to limited hardware and computational resources, three FL client nodes are configured in our experiments. This setup helps ensure stable convergence of the algorithm, and three FL terminals are sufficient for a basic validation of the methodology's feasibility and of the effectiveness of the proposed joint optimization framework. Specifically, we set the global accuracy $[eqn]$ to 0.1 and the local accuracy $[eqn]$ to 0.01, and the learning rate to $[eqn]$.

For the UAV parameters, the rotor blade tip speed $[eqn]$ is set to $[eqn]$; the mean rotor induced velocity in hover $[eqn]$ is $[eqn]$; the fuselage drag ratio $[eqn]$ is $[eqn]$; the rotor solidity s is $[eqn]$; the air density $[eqn]$ is $[eqn]$; the induced power in hover $[eqn]$ is $[eqn]$; the transmit power P is $[eqn]$; the blade profile power $[eqn]$ is $[eqn]$; the rotor disk area G is $[eqn]$; the effective switched capacitance $[eqn]$ is $[eqn]$; the number of CPU cycles per sample C is $[eqn]$; the onboard CPU frequency F is $[eqn]$; the maximum horizontal velocity $[eqn]$ is $[eqn]$; the maximum acceleration $[eqn]$ is $[eqn]$; the maximum energy $[eqn]$ is 358,200 J; and the maximum bandwidth $[eqn]$ is $[eqn]$. For the environmental parameters, the carrier frequency $[eqn]$ is $[eqn]$; the noise power spectral density $[eqn]$ is $[eqn]$; the line-of-sight efficiency factor $[eqn]$ is 1; and the time slot length is $[eqn]$.
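The UAV parameters listed above (blade profile power, induced hover power, tip speed, mean induced velocity, fuselage drag ratio, rotor solidity, air density, rotor disk area) match those of the widely used rotary-wing propulsion power model from the UAV communication literature. The effective switched capacitance, CPU cycles per sample and CPU frequency match the standard dynamic CPU energy model. The paper's own energy equations are not reproduced in this section, so the sketch below assumes these standard forms. Because the numeric settings are elided here, the example rotor values are typical literature figures and hypothetical for this paper; only the 358,200 J energy budget comes from the text.

```python
import math


def propulsion_power(v, P0, Pi, U_tip, v0, d0, rho, s, G):
    """Rotary-wing propulsion power (W) at horizontal speed v (m/s), assumed model.

    Blade-profile term + induced term + parasite (fuselage drag) term.
    """
    blade_profile = P0 * (1.0 + 3.0 * v**2 / U_tip**2)
    induced = Pi * math.sqrt(math.sqrt(1.0 + v**4 / (4.0 * v0**4)) - v**2 / (2.0 * v0**2))
    parasite = 0.5 * d0 * rho * s * G * v**3
    return blade_profile + induced + parasite


def computation_energy(kappa, cycles_per_sample, num_samples, F):
    """Dynamic CPU energy (J): kappa * total cycles * F^2 (assumed model)."""
    return kappa * cycles_per_sample * num_samples * F**2


# Typical literature values (hypothetical here; the paper's values are elided).
ROTOR = dict(P0=79.86, Pi=88.63, U_tip=120.0, v0=4.03, d0=0.6, rho=1.225, s=0.05, G=0.503)
E_MAX = 358_200.0  # J, from the text

hover_power = propulsion_power(0.0, **ROTOR)
hover_endurance_s = E_MAX / hover_power
```

At hover the model reduces to P0 + Pi, so under these assumed values the 358,200 J budget corresponds to roughly 35 minutes of hovering. Power first falls and then rises with speed, so the minimum-power speed under this model lies strictly between hovering and the maximum speed.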

In the optimization stage, we demonstrate the effectiveness of the proposed PPO-based approach on the reinforcement learning task. The experimental system is implemented in Python and PyTorch. The actor and critic networks each employ two fully connected hidden layers with 256 neurons. The actor network is trained with a learning rate of $[eqn]$ and the critic network with a learning rate of $[eqn]$; both are updated with the Adam optimizer. The reward weights are set as follows: $[eqn]$ is 5, $[eqn]$ is 0.1, $[eqn]$ is 10, and $[eqn]$ is 100. These weights ($[eqn]$ to $[eqn]$) are treated as hyperparameters and fine-tuned to ensure stable training and convergence of the PPO algorithm. Their values account for the relative scales of the different reward components and are informed by existing empirical studies in UAV-assisted communication [17], balancing the priorities of service fairness and energy conservation.
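The reward weights act as scale-balancing factors in a weighted-sum reward. The minimal sketch below assumes the reward is a plain weighted sum of four per-step terms. The text gives the four weight values (5, 0.1, 10 and 100) but not which term each scales or with what sign, so the raw term values are hypothetical. Inspecting each weighted term's share of the total magnitude is one simple way to check that no single component dominates.

```python
WEIGHTS = (5.0, 0.1, 10.0, 100.0)  # weight values from the text, in the order listed


def weighted_reward(terms, weights=WEIGHTS):
    """Scalar reward as a weighted sum of per-step terms (signs carried by the terms)."""
    if len(terms) != len(weights):
        raise ValueError("need exactly one term per weight")
    return sum(w * t for w, t in zip(weights, terms))


def contribution_shares(terms, weights=WEIGHTS):
    """Fraction of the total weighted magnitude contributed by each term."""
    magnitudes = [abs(w * t) for w, t in zip(weights, terms)]
    total = sum(magnitudes)
    if total == 0:
        return [0.0] * len(magnitudes)
    return [m / total for m in magnitudes]


# Hypothetical raw term values on very different scales.
terms = (0.8, -120.0, 0.5, -0.02)
reward = weighted_reward(terms)
shares = contribution_shares(terms)
```

In this made-up example the second term has the smallest weight but the largest raw magnitude, and it still contributes about half of the total; this is the kind of imbalance that motivates tuning the weights to the components' scales.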

The following results were obtained from multiple independent runs and aggregated statistically.

As Figure 3a,b show, the trained UAV dynamically adjusts its flight angle and acceleration in real time. Specifically, it maintains its maximum speed to improve task efficiency along relatively smooth segments of the planned trajectory, and it moderately reduces its speed during turning maneuvers to ensure a stable transition.
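The speed behaviour described above arises from the UAV's kinematic constraints. The minimal sketch of one time-slot update below assumes that the action is a heading angle plus a signed acceleration along that heading, that a turn is an instantaneous heading change, and that speed and acceleration are clipped to the maximum horizontal velocity and maximum acceleration listed in the simulation settings. The limit values and slot length used in the checks are hypothetical, since they are elided here.

```python
import math


def kinematic_step(x, y, speed, heading, accel, dt, v_max, a_max):
    """Advance a fixed-altitude UAV by one time slot (assumed kinematic model).

    heading: flight direction in radians; accel: signed acceleration along it.
    Returns the new (x, y, speed).
    """
    a = max(-a_max, min(accel, a_max))                # enforce |a| <= a_max
    new_speed = max(0.0, min(speed + a * dt, v_max))  # enforce 0 <= speed <= v_max
    # Average of start and end speed: exact for constant acceleration, an
    # approximation in slots where a speed limit is reached mid-slot.
    avg_speed = 0.5 * (speed + new_speed)
    x_new = x + avg_speed * dt * math.cos(heading)
    y_new = y + avg_speed * dt * math.sin(heading)
    return x_new, y_new, new_speed
```

Under this model a policy that outputs a negative acceleration ahead of a turn shortens the distance covered per slot while the heading changes, which mirrors the slowing-down-in-turns pattern seen in Figure 3.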

To quantitatively evaluate the individual contributions of the FL framework and the PPO algorithm to the system performance, we conducted a comprehensive ablation study.

Figure 4a compares the number of training epochs required by the FL-based training strategy and by the traditional baseline method at different PSNR values. The results show that the FL-based method offers a significant advantage in model training efficiency.
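In the FL-based strategy, the edge clients carry out the computationally intensive training and the UAV acts as the central coordinator that aggregates their results. The aggregation rule is not given in this section. The sketch below assumes standard FedAvg, in which the coordinator averages client parameter vectors weighted by local dataset size. The `local_update` callback and all numeric values are hypothetical; the three clients mirror the experimental setup.

```python
def fedavg(client_params, client_sizes):
    """Dataset-size-weighted average of client parameter vectors (FedAvg).

    client_params: one flat list of floats per client, all the same length.
    client_sizes: number of local training samples per client.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one parameter vector and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(global_params, local_update, client_datasets):
    """One round: broadcast, local training on each client, weighted aggregation.

    local_update(params, data) -> new params is a hypothetical client trainer.
    """
    updated = [local_update(list(global_params), data) for data in client_datasets]
    return fedavg(updated, [len(data) for data in client_datasets])
```

Weighting by dataset size makes the aggregate equal to the update that would result from pooling all client data under a single averaged objective, while raw data never leaves the clients.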

Figure 4b presents the fairness coefficients of three methods under random and clustered user distributions: (1) the proposed method, which flexibly assigns bandwidth to each user according to actual demand; (2) the equal method, which distributes bandwidth evenly among all users in each time slot; and (3) the single method, in which only one user receives bandwidth in each time slot.
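The definition of the fairness coefficient is not given in this section. Assuming it is Jain's fairness index, J = (Σx_k)² / (K·Σx_k²), the two baselines can be checked within a single slot: equal allocation gives J = 1, while giving the whole bandwidth to one user gives J = 1/K. The normalized bandwidth below is hypothetical; K = 25 matches the number of users in the text.

```python
def jain_index(allocations):
    """Jain's fairness index of non-negative per-user allocations, in [1/K, 1]."""
    k = len(allocations)
    total = sum(allocations)
    sum_sq = sum(a * a for a in allocations)
    if k == 0 or sum_sq == 0:
        raise ValueError("need at least one user with a positive allocation")
    return total * total / (k * sum_sq)


K = 25           # number of users in the simulated area
bandwidth = 1.0  # normalized per-slot bandwidth (hypothetical)
equal_slot = [bandwidth / K] * K             # equal method, one slot
single_slot = [bandwidth] + [0.0] * (K - 1)  # single method, one slot
```

Over many slots the single method's cumulative allocation can become fairer if users are served in turn, which is why fairness is usually evaluated on per-user totals over the whole mission rather than per slot.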

Figure 4c presents the evolution of the EDP over time. The EDP increases gradually as time elapses, and the proposed method consistently achieves a lower EDP than the other bandwidth allocation methods. This advantage stems from its dynamic bandwidth allocation strategy, which achieves a precise resource trade-off: users farther away are allocated a smaller bandwidth that still meets their PSNR requirements, whereas users closer to the UAV are allocated a larger bandwidth to reduce the service completion time. This differentiated allocation enables services to be completed with lower energy consumption and in a shorter time. Consequently, the proposed method achieves the largest average number of completed services. This advantage persists across different user densities, verifying the method's robustness and practical value for UAV semantic communication systems.
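The bandwidth and delay trade-off can be made concrete with a simple link model. The sketch assumes the usual definition EDP = energy × delay, a free-space line-of-sight channel whose 1 m reference gain is (c / 4πf_c)², and a Shannon-rate link. The carrier frequency (2 GHz), transmit power (0.1 W), noise density (4e-21 W/Hz, about -174 dBm/Hz), payload (1 Mbit) and distance (150 m) are hypothetical, because the paper's settings are elided in this section. Only transmit power is counted in the energy term.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s


def reference_gain(carrier_hz):
    """Free-space channel power gain at 1 m: (c / (4*pi*f_c))**2."""
    return (SPEED_OF_LIGHT / (4.0 * math.pi * carrier_hz)) ** 2


def transmission_time(bits, bandwidth_hz, tx_power_w, distance_m, carrier_hz, noise_psd_w_hz):
    """Time (s) to deliver `bits` over a free-space LoS link at the Shannon rate."""
    gain = reference_gain(carrier_hz) / distance_m**2
    snr = tx_power_w * gain / (noise_psd_w_hz * bandwidth_hz)
    rate = bandwidth_hz * math.log2(1.0 + snr)  # bit/s
    return bits / rate


def energy_delay_product(power_w, duration_s):
    """EDP (J*s) of an activity drawing power_w for duration_s."""
    energy_j = power_w * duration_s
    return energy_j * duration_s


# Hypothetical example: 1 Mbit of semantic features to a user 150 m away.
t_1mhz = transmission_time(1e6, 1e6, 0.1, 150.0, 2e9, 4e-21)
t_2mhz = transmission_time(1e6, 2e6, 0.1, 150.0, 2e9, 4e-21)
edp_1mhz = energy_delay_product(0.1, t_1mhz)
edp_2mhz = energy_delay_product(0.1, t_2mhz)
```

In this model doubling the bandwidth shortens the transmission time (the rate still grows with bandwidth even though the SNR falls). Because EDP scales with the square of the delay at fixed power, shorter service times translate directly into a lower EDP.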

5. Conclusions

To address the SWAP constraints in UAV-assisted systems, this paper proposes a training framework that integrates FL and semantic communication, and simulations verify that it effectively reduces training resource overhead. Building on this framework, we formulate a multi-objective optimization problem, model it as an MDP, and apply the PPO algorithm for trajectory planning and dynamic bandwidth allocation; simulations confirm its advantages in reducing resource consumption. Moreover, the kinematically feasible trajectories generated by PPO outperform the benchmark algorithms in both EDP and fairness.

While this study validates the effectiveness of the proposed FL-SC-PPO framework under a simplified system model, we acknowledge that real-world deployment entails more complex challenges. Our future work will therefore extend this methodology to more realistic environments by incorporating factors such as user mobility, varying UAV altitudes, and co-channel interference. Furthermore, while the current single-agent PPO-based framework demonstrates superior energy efficiency, multi-agent reinforcement learning (MARL) and hybrid optimization methods are promising avenues for enhancing system scalability. These architectures will be investigated in subsequent research to address more complex, coordinated multi-UAV semantic communication scenarios.

Bibliography


1. Luo, X.; Chen, H.H.; Guo, Q. Semantic Communications: Overview, Open Issues, and Future Research Directions. IEEE Wirel. Commun. 2022, 29, 210–219. doi:10.1109/MWC.101.2100269
2. Carnap, R.; Bar-Hillel, Y. An Outline of a Theory of Semantic Information; RLE Technical Report 247; Research Laboratory of Electronics, Massachusetts Institute of Technology: Cambridge, MA, USA, 1952.
3. Farsad, N.; Rao, M.; Goldsmith, A. Deep Learning for Joint Source-Channel Coding of Text. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2326–2330. doi:10.1109/ICASSP.2018.8461983
4. Bourtsoulatze, E.; Burth Kurka, D.; Gündüz, D. Deep Joint Source-Channel Coding for Wireless Image Transmission. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 567–579. doi:10.1109/TCCN.2019.2919300
5. Weng, Z.; Qin, Z.; Li, G.Y. Semantic Communications for Speech Signals. In Proceedings of the ICC 2021: IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6. doi:10.1109/ICC42927.2021.9500590
6. Weng, Z.; Qin, Z.; Tao, X.; Pan, C.; Liu, G.; Li, G.Y. Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis. IEEE Trans. Wirel. Commun. 2023, 22, 6227–6240. doi:10.1109/TWC.2023.3240969
7. Xie, H.; Qin, Z.; Li, G.Y.; Juang, B.-H. Deep Learning Based Semantic Communications: An Initial Investigation. In Proceedings of the GLOBECOM 2020: 2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6. doi:10.1109/GLOBECOM42002.2020.9322296
8. Xie, H.; Qin, Z.; Li, G.Y. Deep Learning Enabled Semantic Communication Systems. IEEE Trans. Signal Process. 2021, 69, 2663–2675. doi:10.1109/TSP.2021.3071210