# Lightweight Reinforcement Learning for Priority-Aware Spectrum Management in Vehicular IoT Networks

**Authors:** Adeel Iqbal, Ali Nauman, Tahir Khurshaid

PMC · DOI: 10.3390/s25216777 · Sensors (Basel, Switzerland) · 2025-11-05

## TL;DR

This paper introduces a lightweight reinforcement learning framework for managing spectrum resources in vehicular IoT networks, improving efficiency and reliability while maintaining low computational complexity.

## Contribution

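The paper introduces two enhanced Q-Learning variants and a Risk-Aware Heuristic baseline for priority-aware spectrum management in vehicular IoT (V-IoT) networks. The two variants are Value-Prioritized Action Double Q-Learning with Constraints (VPADQ-C) and contextual Q-Learning with Upper Confidence Bound (Q-UCB).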
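The abstract describes VPADQ-C as Double Q-Learning that enforces reliability and blocking constraints through a CMDP with online primal–dual optimization. The sketch below shows that general pattern under stated assumptions; it is not the authors' implementation.

Its assumptions are:

- The class name `ConstrainedDoubleQ` is hypothetical.
- The single per-step `cost` signal, the `cost_limit` and the step sizes are illustrative.
- The "value-prioritized action" component is not modeled.

```python
import random
from collections import defaultdict


class ConstrainedDoubleQ:
    """Hypothetical tabular Double Q-Learning agent with a Lagrangian penalty
    (CMDP primal-dual pattern). Illustrative only, not the paper's VPADQ-C."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, eta=0.01,
                 cost_limit=0.05, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha            # Q-table learning rate (primal step size)
        self.gamma = gamma            # discount factor
        self.eta = eta                # dual step size for the multiplier
        self.cost_limit = cost_limit  # assumed constraint budget on expected cost
        self.lam = 0.0                # Lagrange multiplier, kept >= 0
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy over the sum of both Q-tables.
        if self.rng.random() < epsilon:
            return self.rng.randrange(self.n_actions)
        combined = [a + b for a, b in zip(self.qa[state], self.qb[state])]
        return max(range(self.n_actions), key=combined.__getitem__)

    def update(self, s, a, reward, cost, s_next):
        # Primal step: learn on the Lagrangian reward r - lambda * c.
        shaped = reward - self.lam * cost
        # Double Q: one table selects the next action, the other evaluates it.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        best = max(range(self.n_actions), key=learn[s_next].__getitem__)
        td_target = shaped + self.gamma * evaluate[s_next][best]
        learn[s][a] += self.alpha * (td_target - learn[s][a])
        # Dual step: projected gradient ascent on lambda (raise it when the
        # observed cost exceeds the budget, never let it go negative).
        self.lam = max(0.0, self.lam + self.eta * (cost - self.cost_limit))


agent = ConstrainedDoubleQ(n_actions=3)
agent.update(0, 1, reward=1.0, cost=1.0, s_next=1)
print(round(agent.lam, 6))  # 0.0095
```

The multiplier grows while constraint violations persist, so costly actions are penalized more heavily over time. This is the usual way a CMDP constraint is folded into an otherwise unconstrained Q-update.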

## Key findings

- VPADQ-C achieves high energy efficiency (≈8.425×10⁷ bits/J) and reduces interruption probability by over 60%.
- Q-UCB converges quickly (≈190 episodes), with low blocking probability (≈0.0135) and mean delay (≈0.351 ms).
- Both methods maintain fairness near 0.364 and throughput around 28 Mbps with low computational complexity.
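The findings above are reported as blocking probability, fairness and energy efficiency. The sketch below shows common definitions of these metrics, under stated assumptions.

- The summary does not say which fairness index the paper uses. Jain's index is shown only as a widely used choice.
- The function names are hypothetical.
- The sample inputs are made-up numbers, not the paper's data.

```python
def blocking_probability(blocked, arrivals):
    """Fraction of arriving requests that could not be admitted."""
    return blocked / arrivals if arrivals else 0.0


def jain_fairness(throughputs):
    """Jain's index (sum x)^2 / (n * sum x^2); ranges from 1/n to 1."""
    n = len(throughputs)
    total = sum(throughputs)
    sq = sum(x * x for x in throughputs)
    return (total * total) / (n * sq) if sq > 0 else 0.0


def energy_efficiency(bits_delivered, energy_joules):
    """Delivered bits per joule of consumed energy (bits/J)."""
    return bits_delivered / energy_joules


# Made-up example values for illustration only.
print(blocking_probability(5, 400))          # 0.0125
print(jain_fairness([1.0, 0.0, 0.0, 0.0]))   # 0.25
print(energy_efficiency(2.0e6, 0.5))         # 4000000.0
```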

## Abstract

The Vehicular Internet of Things (V-IoT) has emerged as a cornerstone of next-generation intelligent transportation systems (ITSs), enabling applications ranging from safety-critical collision avoidance and cooperative awareness to infotainment and fleet management. These heterogeneous services impose stringent quality-of-service (QoS) demands for latency, reliability, and fairness while competing for limited and dynamically varying spectrum resources. Conventional schedulers, such as round-robin or static priority queues, lack adaptability. Deep reinforcement learning (DRL) solutions, though powerful, remain computationally intensive and unsuitable for real-time roadside unit (RSU) deployment. This paper proposes a lightweight and interpretable reinforcement learning (RL)-based spectrum management framework for V-IoT networks. Two enhanced Q-Learning variants are introduced. The first is Value-Prioritized Action Double Q-Learning with Constraints (VPADQ-C), which enforces reliability and blocking constraints through a Constrained Markov Decision Process (CMDP) with online primal–dual optimization. The second is contextual Q-Learning with Upper Confidence Bound (Q-UCB), which integrates uncertainty-aware exploration and a Success-Rate Prior (SRP) to accelerate convergence. A Risk-Aware Heuristic baseline is also designed as a transparent, low-complexity benchmark to illustrate the interpretability–performance trade-off between rule-based and learning-driven approaches. A comprehensive simulation framework incorporating heterogeneous traffic classes, physical-layer fading, and energy-consumption dynamics is developed to evaluate throughput, delay, blocking probability, fairness, and energy efficiency. The results demonstrate that the proposed methods consistently outperform conventional Q-Learning and Double Q-Learning methods.
VPADQ-C achieves the highest energy efficiency (≈8.425×10⁷ bits/J) and reduces interruption probability by over 60%. Q-UCB achieves the fastest convergence (within ≈190 episodes), the lowest blocking probability (≈0.0135), and the lowest mean delay (≈0.351 ms). Both schemes maintain fairness near 0.364, preserve throughput around 28 Mbps, and exhibit sublinear training-time scaling with O(1) per-update complexity and O(N²) overall runtime growth. Scalability analysis confirms that the proposed frameworks sustain URLLC-grade latency (<0.2 ms) and reliability under dense vehicular loads, validating their suitability for real-time, large-scale V-IoT deployments.
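The abstract describes Q-UCB as contextual Q-Learning with Upper Confidence Bound exploration plus a Success-Rate Prior (SRP). The sketch below combines a standard UCB bonus with a Laplace-smoothed per-action success rate. This is only an assumed interpretation of the SRP, not the authors' formulation.

- The class name is hypothetical.
- The additive `prior_weight * success_rate` term is hypothetical.
- The exploration constant `c` is hypothetical.
- The per-state visit count used inside the logarithm is hypothetical.

```python
import math
from collections import defaultdict


class QUCBAgent:
    """Hypothetical contextual Q-Learning agent with a UCB exploration bonus
    and an assumed success-rate prior. Illustrative only, not the paper's Q-UCB."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, c=1.0, prior_weight=1.0):
        self.n_actions = n_actions
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.c = c                        # UCB exploration strength (assumed)
        self.prior_weight = prior_weight  # weight on the assumed success-rate prior
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.counts = defaultdict(lambda: [0] * n_actions)
        self.successes = defaultdict(lambda: [0] * n_actions)

    def success_rate(self, state, a):
        # Laplace-smoothed success rate of action a in this context (assumed SRP form).
        return (self.successes[state][a] + 1) / (self.counts[state][a] + 2)

    def act(self, state):
        counts = self.counts[state]
        # Try every action once before the UCB bonus is well defined.
        for a in range(self.n_actions):
            if counts[a] == 0:
                return a
        visits = sum(counts)

        def score(a):
            bonus = self.c * math.sqrt(math.log(visits) / counts[a])
            return (self.q[state][a] + bonus
                    + self.prior_weight * self.success_rate(state, a))

        return max(range(self.n_actions), key=score)

    def update(self, s, a, reward, success, s_next):
        self.counts[s][a] += 1
        if success:
            self.successes[s][a] += 1
        # Standard one-step Q-Learning target.
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


agent = QUCBAgent(n_actions=2)
a0 = agent.act(0)                       # 0: untried action first
agent.update(0, a0, 1.0, True, 1)
a1 = agent.act(0)                       # 1: the remaining untried action
agent.update(0, a1, 0.0, False, 1)
print(agent.act(0))                     # 0: equal bonuses, higher Q and success rate
```

The UCB term favors rarely tried channels while the prior nudges the agent toward channels that have historically succeeded. This matches the abstract's stated aim of faster convergence, though the paper's exact weighting may differ.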

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12610697/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12610697/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12610697/full.md

---
Source: https://tomesphere.com/paper/PMC12610697