# Constrained Soft Actor–Critic for Joint Computation Offloading and Resource Allocation in UAV-Assisted Edge Computing

**Authors:** Nawazish Muhammad Alvi, Waqas Muhammad Alvi, Xiaolong Zhou, Jun Li, Yifei Wei

PMC · DOI: 10.3390/s26041149 · Sensors (Basel, Switzerland) · 2026-02-10

## TL;DR

This paper introduces Constrained Soft Actor–Critic (C-SAC), a deep reinforcement learning method that jointly optimizes computation partitioning and power allocation in UAV-assisted edge computing under explicit latency constraints.

## Contribution

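The contribution is Constrained Soft Actor–Critic (C-SAC), a deep reinforcement learning algorithm that enforces latency constraints explicitly through a Constrained Markov Decision Process (CMDP) formulation, combining maximum-entropy policy optimization with Lagrangian dual methods rather than relying on implicit reward shaping.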
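
For orientation, a CMDP with a maximum-entropy objective is commonly written in the generic form below. This is a textbook-style sketch, not the paper's exact formulation: the symbols $r_t$ (reward), $c_t$ (per-step constraint cost, e.g. a latency violation indicator), $d$ (cost budget), $\gamma$ (discount factor), $\alpha$ (entropy temperature), and $\lambda$ (Lagrange multiplier) are assumptions introduced here for illustration.

$$
\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\big(r_t + \alpha\,\mathcal{H}(\pi(\cdot \mid s_t))\big)\Big]
\quad \text{s.t.} \quad
J_c(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} c_t\Big] \le d
$$

$$
\min_{\lambda \ge 0}\; \max_{\pi}\; \mathcal{L}(\pi, \lambda) = J_r(\pi) - \lambda\,\big(J_c(\pi) - d\big)
$$

Under this reading, the multiplier $\lambda$ grows while the constraint is violated ($J_c > d$) and shrinks toward zero once it is satisfied, which is one way an algorithm can trade energy efficiency against latency satisfaction without a hand-tuned penalty weight.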

## Key findings

- C-SAC achieves an 18.9% constraint violation rate, compared with 79.5% for unconstrained Soft Actor–Critic and 41.3% for deterministic TD3-Lagrangian.
- Learned policies adapt the local computation ratio to channel quality, with a correlation coefficient of −0.894 between the two.
- C-SAC remains robust: violation rates vary by less than 2 percentage points even when channel variability triples.
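
The two metrics quoted above, a constraint violation rate and a correlation coefficient, can be computed as in the following minimal sketch. It assumes the reported coefficient is a Pearson correlation, which the summary does not state. The data are synthetic stand-ins generated here, not the paper's measurements, and the helper names `pearson_r` and `violation_rate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's measurements): a policy that
# computes less locally when the channel is better, plus small noise.
channel_quality = rng.uniform(0.0, 1.0, size=1000)
noise = rng.normal(0.0, 0.05, size=1000)
local_ratio = np.clip(0.9 - 0.6 * channel_quality + noise, 0.0, 1.0)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def violation_rate(latencies, deadline):
    """Fraction of samples whose latency exceeds the deadline."""
    return float(np.mean(np.asarray(latencies) > deadline))


r = pearson_r(channel_quality, local_ratio)
rate = violation_rate([0.8, 1.2, 0.9, 1.5, 0.7], deadline=1.0)
print(f"correlation: {r:.3f}, violation rate: {rate:.1%}")
```

A strongly negative coefficient, as in the finding above, means the local computation ratio falls as channel quality rises.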

## Abstract

Unmanned Aerial Vehicle (UAV)-assisted edge computing supports latency-sensitive applications by offloading computational tasks to ground-based servers. However, determining optimal resource allocation under strict latency constraints and stochastic channel conditions remains challenging. This paper addresses the joint computation partitioning and power allocation problem for UAV-assisted edge computing systems. We formulate the problem as a Constrained Markov Decision Process (CMDP) that explicitly models latency constraints, rather than relying on implicit reward shaping. To solve this CMDP, we propose Constrained Soft Actor–Critic (C-SAC), a deep reinforcement learning algorithm that combines maximum-entropy policy optimization with Lagrangian dual methods. C-SAC employs a dedicated constraint critic network to estimate long-term constraint violations and an adaptive Lagrange multiplier that automatically balances energy efficiency against latency satisfaction without manual tuning. Extensive experiments demonstrate that C-SAC achieves an 18.9% constraint violation rate. This represents a 60.6-percentage-point improvement over unconstrained Soft Actor–Critic (79.5%) and a 22.4-percentage-point improvement over deterministic TD3-Lagrangian (41.3%). The learned policies exhibit strong channel-adaptive behavior, with a correlation coefficient of −0.894 between the local computation ratio and channel quality, despite the absence of explicit channel modeling in the reward function. Ablation studies confirm that both adaptive mechanisms are essential, while sensitivity analyses show that C-SAC maintains robust performance, with violation rates varying by less than 2 percentage points even as channel variability triples. These results establish constrained reinforcement learning as an effective approach for reliable UAV edge computing under stringent quality-of-service requirements.
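
The adaptive Lagrange multiplier described in the abstract can be illustrated with a projected dual-ascent update. The sketch below is a generic Lagrangian-RL pattern under stated assumptions, not the authors' implementation: the class `LagrangeMultiplier`, the function `penalized_actor_objective`, the step size, the cost budget, and the entropy temperature `alpha` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Projected dual ascent on a single constraint multiplier (illustrative)."""

    value: float = 0.0   # hypothetical initial multiplier
    lr: float = 0.05     # hypothetical dual step size
    budget: float = 0.1  # hypothetical tolerated long-term constraint cost

    def update(self, estimated_cost: float) -> float:
        # Increase lambda when the estimated cost exceeds the budget,
        # decrease it otherwise, then project back onto lambda >= 0.
        self.value = max(0.0, self.value + self.lr * (estimated_cost - self.budget))
        return self.value


def penalized_actor_objective(q_reward, q_cost, log_prob, lam, alpha=0.2):
    """Per-sample quantity the actor maximizes in this sketch:
    reward value + entropy bonus - lambda * constraint-critic value."""
    return q_reward - alpha * log_prob - lam * q_cost


lam = LagrangeMultiplier()
# Three steps with the cost estimate above budget, then two steps below it.
history = [lam.update(cost) for cost in (0.5, 0.5, 0.5, 0.0, 0.0)]
print(history)
```

In this pattern the cost estimate would come from a constraint critic, so the penalty weight rises while the policy violates the constraint and relaxes once it complies, which is the balancing behavior the abstract attributes to the adaptive multiplier.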

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12944609/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12944609/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12944609/full.md

---
Source: https://tomesphere.com/paper/PMC12944609