# SADQN-Based Residual Energy-Aware Beamforming for LoRa-Enabled RF Energy Harvesting for Disaster-Tolerant Underground Mining Networks

**Authors:** Hilary Kelechi Anabi, Samuel Frimpong, Sanjay Madria

PMC · DOI: 10.3390/s26020730 · Sensors (Basel, Switzerland) · 2026-01-21

## TL;DR

This paper introduces SADQN, a safety-aware deep reinforcement learning framework for energy beamforming that improves RF energy harvesting in LoRa-enabled underground mining networks after disasters.

## Contribution

SADQN (Safe Adaptive Deep Q-Network) makes energy beamforming residual-energy-aware and uses dual-variable updates to limit violations of the fairness, minimum-energy, duty-cycle, and uplink-utilization constraints.

## Key findings

- SADQN increases cumulative harvested energy by 11% over DQN and 40% over PSO.
- The framework achieves fairness indices above 0.90 and converges 27% faster than Safe-DQN.
- SADQN shows 33% lower performance variance than Safe-DQN after convergence, indicating more stable behavior in disaster scenarios.
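
The fairness figure above can be made concrete with a small example. This summary does not say which fairness metric the paper uses; the sketch below assumes Jain's fairness index, a common choice in wireless resource allocation, purely for illustration. The function name `jain_fairness` and the sample energy values are hypothetical.

```python
def jain_fairness(harvested):
    """Jain's index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when every node harvests the same energy and
    1/n when a single node receives everything.
    Assumption: the paper's fairness metric may differ.
    """
    n = len(harvested)
    sum_sq = sum(x * x for x in harvested)
    if n == 0 or sum_sq == 0:
        return 0.0  # no nodes or no energy: define fairness as 0
    total = sum(harvested)
    return (total * total) / (n * sum_sq)


# Hypothetical per-node harvested energy (arbitrary units).
balanced = [1.0, 1.0, 1.0, 1.0]
skewed = [4.0, 0.0, 0.0, 0.0]
print(jain_fairness(balanced), jain_fairness(skewed))
```

Under this assumed metric, a value above 0.90 would mean that harvested energy is spread close to evenly across nodes rather than concentrated on a few.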

## Abstract

The end-to-end efficiency of radio-frequency (RF)-powered wireless communication networks (WPCNs) in post-disaster underground mine environments can be enhanced through adaptive beamforming. The primary challenges in such scenarios include (i) identifying the most energy-constrained nodes, i.e., nodes with the lowest residual energy, to prevent the loss of tracking and localization functionality; (ii) avoiding reliance on the computationally intensive channel state information (CSI) acquisition process; and (iii) ensuring long-range RF wireless power transfer (LoRa-RFWPT). To address these issues, this paper introduces an adaptive and safety-aware deep reinforcement learning (DRL) framework for energy beamforming in LoRa-enabled underground disaster networks. Specifically, we develop a Safe Adaptive Deep Q-Network (SADQN) that incorporates residual energy awareness to enhance energy harvesting under mobility, while also formulating a SADQN approach with dual-variable updates to mitigate constraint violations associated with fairness, minimum energy thresholds, duty cycle, and uplink utilization. A mathematical model is proposed to capture the dynamics of post-disaster underground mine environments, and the problem is formulated as a constrained Markov decision process (CMDP). To address the inherent NP-hardness of this constrained reinforcement learning (CRL) formulation, we employ a Lagrangian relaxation technique to reduce complexity and derive near-optimal solutions. Comprehensive simulation results demonstrate that SADQN significantly outperforms all baseline algorithms: increasing cumulative harvested energy by approximately 11% versus DQN, 15% versus Safe-DQN, and 40% versus PSO, and achieving substantial gains over random beamforming and non-beamforming approaches.
The proposed SADQN framework maintains fairness indices above 0.90, converges 27% faster than Safe-DQN and 43% faster than standard DQN in terms of episodes, and demonstrates superior stability, with 33% lower performance variance than Safe-DQN and 66% lower than DQN after convergence, making it particularly suitable for safety-critical underground mining disaster scenarios where reliable energy delivery and operational stability are paramount.
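
The Lagrangian relaxation with dual-variable updates mentioned in the abstract can be sketched generically. The following is a minimal illustration of projected dual ascent for a constrained MDP, not the paper's actual update rule, which this summary does not give. The function names, the step size, and the cost and limit values are all hypothetical.

```python
def lagrangian_reward(reward, costs, limits, lambdas):
    """Reward penalized by weighted constraint violations.

    Each constraint i has the form costs[i] <= limits[i]; lambdas[i]
    is its non-negative dual variable (assumed formulation).
    """
    penalty = sum(lam * (c - d) for lam, c, d in zip(lambdas, costs, limits))
    return reward - penalty


def dual_update(lambdas, costs, limits, step=0.01):
    """Projected dual ascent: raise a multiplier while its constraint
    is violated, lower it otherwise, and clip at zero."""
    return [max(0.0, lam + step * (c - d))
            for lam, c, d in zip(lambdas, costs, limits)]


# Hypothetical example with two constraints (e.g. duty cycle, uplink use).
lambdas = [0.0, 0.5]
costs = [0.12, 0.30]    # measured constraint costs in an episode
limits = [0.10, 0.40]   # allowed limits
print(lagrangian_reward(1.0, costs, limits, lambdas))
print(dual_update(lambdas, costs, limits, step=0.1))
```

In a scheme like this, the agent would train on the penalized reward while the multipliers adapt between episodes, so a constraint that keeps being violated becomes progressively more expensive.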

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12846087/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12846087/full.md

## References

32 references, with the full list in the complete paper: https://tomesphere.com/paper/PMC12846087/full.md

---
Source: https://tomesphere.com/paper/PMC12846087