# RL-PMO: A Reinforcement Learning-Based Optimization Algorithm for Parallel SFC Migration

**Authors:** Hefei Hu, Zining Liu, Fan Wu

PMC · DOI: 10.3390/s26010242 · Sensors (Basel, Switzerland) · 2025-12-30

## TL;DR

This paper introduces RL-PMO, an offline reinforcement learning algorithm that migrates multiple Virtual Network Functions (VNFs) in parallel when a node in an edge network fails and resources are limited.

## Contribution

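The contribution is a two-stage offline reinforcement learning algorithm for parallel SFC migration: improved heuristic algorithms first generate a multi-scenario dataset of migration trajectories, and a Decision Mamba policy network is then trained on that dataset.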

## Key findings

- In simulation, RL-PMO maintains a migration success rate of approximately 95% across different load conditions.
- Compared with typical offline RL algorithms such as IQL, it improves performance by about 13% under low and medium loads and by up to 17% under high loads.
- A twin-critic architecture combined with CQL regularization mitigates distribution shift and Q-value overestimation.
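To make the last finding concrete, here is a minimal sketch of how a CQL penalty and a twin-critic target are commonly combined for a discrete action space. This summary does not give the paper's loss, so everything here is an assumption: the function names, the `alpha` weight, the discount `gamma`, and the choice of bootstrapping from the minimum of the two critics (the usual clipped double-Q form) are illustrative, not the authors' implementation.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def cql_penalty(q_values, dataset_action):
    """Standard discrete CQL regularizer: pushes down Q over all actions
    and pushes up Q of the action actually logged in the offline dataset."""
    return logsumexp(q_values) - q_values[dataset_action]

def twin_critic_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Assumed clipped double-Q target: bootstrap from the smaller critic
    estimate to counter overestimation."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)

def critic_loss(q_values, dataset_action, target, alpha=1.0):
    """Squared TD error plus the alpha-weighted CQL penalty for one critic.
    In a twin-critic setup this would be evaluated once per critic."""
    td_error = q_values[dataset_action] - target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, dataset_action)
```

The penalty is never negative, because the log-sum-exp over all actions is at least as large as any single Q-value. It grows when actions absent from the dataset receive high estimates, which is how this family of regularizers discourages the policy from drifting outside the data distribution.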

## Abstract

In edge networks, hardware failures and resource pressure may disrupt Service Function Chains (SFCs) deployed on the failed node, making it necessary to efficiently migrate multiple Virtual Network Functions (VNFs) under limited resources. To address these challenges, this paper proposes an offline reinforcement learning-based parallel migration optimization algorithm (RL-PMO) to enable parallel migration of multiple VNFs. The method follows a two-stage framework: in the first stage, improved heuristic algorithms are used to generate high-quality migration trajectories and construct a multi-scenario dataset; in the second stage, the Decision Mamba model is employed to train the policy network. With its selective modeling capability for structured sequences, Decision Mamba can capture the dependencies between VNFs and underlying resources. Combined with a twin-critic architecture and CQL regularization, the model effectively mitigates distribution shift and Q-value overestimation. The simulation results show that RL-PMO maintains approximately a 95% migration success rate across different load conditions and improves performance by about 13% under low and medium loads and up to 17% under high loads compared with typical offline RL algorithms such as IQL. Overall, RL-PMO provides an efficient, reliable, and resource-aware solution for SFC migration in node failure scenarios.
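The hand-off between the two stages can be sketched as follows: trajectories produced by a heuristic are turned into the token sequences a sequence-model policy consumes. The abstract does not describe the data format, so this sketch assumes the return-to-go conditioning used by Decision Transformer-style models, which Decision Mamba variants generally inherit. The `Step` fields, their interpretation, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Step:
    state: Tuple[float, ...]  # hypothetical: e.g. residual resources per candidate node
    action: int               # hypothetical: index of the node chosen for one VNF
    reward: float             # hypothetical: per-step migration reward

def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Suffix sums of (optionally discounted) rewards, one value per step."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def to_sequence(trajectory: List[Step]) -> List[Tuple[float, Tuple[float, ...], int]]:
    """Interleave (return-to-go, state, action) triples for sequence-model training."""
    rtg = returns_to_go([step.reward for step in trajectory])
    return [(g, step.state, step.action) for g, step in zip(rtg, trajectory)]
```

For example, a three-step trajectory with rewards `1.0, 0.0, 2.0` and no discounting yields returns-to-go of `3.0, 2.0, 2.0`, so the first token tells the model how much total reward the heuristic went on to collect from that point.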

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12788342/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12788342/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12788342/full.md

---
Source: https://tomesphere.com/paper/PMC12788342