Intelligent Agent for Resource Allocation from Mobile Infrastructure to Vehicles in Dynamic Environments Scalable on Demand

Renato Cumbal; Berenice Arguero; Germán V. Arévalo; Roberto Hincapié; Christian Tipantuña

PMC · DOI:10.3390/s26020508·January 12, 2026

Intelligent Agent for Resource Allocation from Mobile Infrastructure to Vehicles in Dynamic Environments Scalable on Demand

Renato Cumbal, Berenice Arguero, Germán V. Arévalo, Roberto Hincapié, Christian Tipantuña

PDF

Open Access

TL;DR

This paper introduces an intelligent system for efficiently allocating communication resources between vehicles and infrastructure in urban environments.

Contribution

The novel framework combines macroscopic mobility analysis, ILP for RSU placement, and Q-learning for dynamic resource allocation in V2I systems.

Findings

01

The ILP model activates only 2.9% of RSUs while ensuring over 90% vehicular coverage.

02

The SGNC manages 10 antennas and 120 resources efficiently, maintaining performance beyond 70% capacity.

03

The proposed method outperforms static allocation in resource efficiency and coverage consistency under varying traffic.

Abstract

This work addresses the increasing complexity of urban mobility by proposing an intelligent optimization and resource-allocation framework for Vehicle-to-Infrastructure (V2I) communications. The model integrates a macroscopic mobility analysis, an Integer Linear Programming (ILP) formulation for optimal Road-Side Unit (RSU) placement, and a Smart Generic Network Controller (SGNC) based on Q-learning for dynamic radio-resource allocation. Simulation results in a realistic georeferenced urban scenario with 380 candidate sites show that the ILP model activates only 2.9% of RSUs while guaranteeing more than 90% vehicular coverage. The reinforcement-learning-based SGNC achieves stable allocation behavior, successfully managing 10 antennas and 120 total resources, and maintaining efficient operation when the system exceeds 70% capacity by reallocating resources dynamically through the λ-based…

Figures13

Click any figure to enlarge with its caption.

Keywords

AIdynamic coverageQ-learningresource allocationsmart generic network controllerVANETV2I

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsVehicular Ad Hoc Networks (VANETs) · Traffic control and management · Age of Information Optimization

Full text

1. Introduction

Intelligent transportation systems (ITSs) have become essential for addressing the rapid growth of vehicular mobility worldwide. More than one billion vehicles currently circulate globally, and this number is expected to increase substantially in emerging regions such as China and Latin America [1,2]. In particular, vehicle ownership is projected to reach approximately 250 million in China, while Latin America is expected to grow to around 200 million vehicles by 2050 [3]. This sustained increase poses major challenges in traffic congestion, road safety, and environmental sustainability, and motivates the integration of advanced ITS technologies for more adaptive and efficient traffic management through connected and intelligent vehicles [2].

Modern intelligent vehicles operate within highly dynamic communication networks that generate and consume large volumes of data in real time. These data can be exploited using machine learning techniques to enable proactive traffic control strategies such as congestion forecasting, dynamic route recommendations, and adaptive traffic signal control. However, effectively deploying and operating these solutions remains challenging due to limitations in network planning, particularly in radio resource allocation and infrastructure adaptation to time-varying urban mobility demands. This gap highlights the need for intelligent and scalable approaches that ensure reliable V2I communication while adapting to changing traffic conditions [4,5].

Motivated by this context, this paper proposes an integrated framework for VANET infrastructure planning and dynamic radio resource management in georeferenced urban environments. First, macroscopic mobility modeling is applied to simulated scenarios to derive spatiotemporal vehicular demand and congestion patterns. Then, an ILP-based optimization model dynamically selects RSU deployment sites while considering coverage, capacity, and demand constraints. Finally, a Smart Generic Network Controller (SGNC) leverages reinforcement learning to adapt radio resource allocation over time, improving communication efficiency under fluctuating traffic loads.

The main contributions of this work are: (i) the development of a scalable and realistic VANET planning model that jointly considers coverage, capacity, and spatiotemporal demand derived from mobility patterns; and (ii) the design and implementation of a generic reinforcement-learning-based controller that dynamically allocates radio resources under infrastructure and channel constraints, supported by a comprehensive performance evaluation across diverse urban scenarios. These contributions advance the state of the art in vehicular communication systems and provide practical insights for deploying intelligent transportation networks in smart cities as explained in Figure 1.

The remainder of this paper is organized as follows. Section 2 discusses previous studies on this research approach. Section 3 formally presents the problem statement, optimization model variables, and the proposed SGNC. Section 4 presents the results obtained and their corresponding analysis. Finally, Section 5 presents the conclusions of this study, underlining the importance of the study’s findings.

2. Related Work

2.1. Mobility Modeling and Traffic Simulation

In vehicular networks, each vehicle can be considered a mobile node whose movement follows predefined road structures, making its trajectory partially predictable. Modern vehicles integrate computing, communication, sensing, and navigation systems developed collaboratively by industry, academia, and governmental institutions. These systems enable the creation of prototypes and standards that enhance communication performance, which is typically evaluated through simulation due to the high cost of real-world deployment [6,7].

Traffic simulators generate realistic geospatial environments—including roads, buildings, and natural features—and are essential for evaluating VANET performance at both macroscopic and microscopic levels. Tools such as OpenStreetMap (OSM) are widely used to define realistic planning areas for mobility and communication analysis. Mobility simulations can be classified into random models, flow-based models, behavioral models, and trace-based models [8,9] to determine the planning area for both the mobility and communication networks, thereby creating an ideal scenario for analysis. Mobility simulators are divided into several models: random, flow, traffic, behavioral, and trace.

In vehicular traffic, various macroscopic models have been studied, focusing on the global analysis of traffic to calculate parameters such as road capacity, vehicle distribution, and traffic density [10,11].Microscopic models, on the other hand, analyze the movement of each vehicle individually, considering its physical capabilities [12,13,14,15,16,17,18,19]. Although most studies have focused on microscopic aspects, this can lead to unrealistic models due to the complexity of these scenarios. By providing a global view of the system, macroscopic models are easier to manage and require fewer computational resources, unlike microscopic models, which are more detailed but require greater processing power [20].

Until now, researchers have primarily focused on microscopic aspects, overlooking macroscopic ones. There are several realistic mobility models based on simulated traces, real-world maps, and integration of existing models. Real-world traces are obtained using devices such as GPS, while artificial traces and traffic are modeled using software [7,20]. On the other hand, the simulator of urban mobility (SUMO), widely recognized and used in the literature, is a reliable and trusted tool. It uses the Krauss model to calculate future vehicle speeds, becoming a flow simulator based on a microscopic approach. Recently, Machine Learning (ML) methods have been incorporated to improve the realism of simulations, allowing for more accurate modeling of movement patterns in complex urban and road environments.

2.2. Congestion Analysis and ML-Based Traffic Prediction

Traffic congestion is a problem that affects both drivers and passengers, increasing travel times and generating economic, environmental, and health losses. Furthermore, it negatively impacts communication in vehicular networks, creating congestion at the data level. This phenomenon has been studied since 2010 [21], and more precise methods for detecting recurring and non-recurring congestion are being developed, allowing events to be monitored in real-time. In this sense, statistical inference and ML algorithms play a crucial role, as they allow for obtaining relevant parameters and applying proactive strategies to improve traffic conditions [22].

2.3. VANET Communication Models: V2V, V2I, and ITS Integration

VANETs, on the other hand, are equipped with computing modules and wireless devices that allow them to integrate into an ITS, meeting strict requirements for low latency and high transfer speeds [22]. Their role in expanding connected and autonomous vehicles is significant, V2V and V2I communications are transforming the transportation system and the road environment. V2V communications enable vehicles to communicate directly with each other or through multi-hop schemes, thereby reducing the need for numerous RSUs and enhancing coverage without compromising network performance. V2I communications connect vehicles with fixed units within the road infrastructure, i.e., with RSUs. Integrating V2V and V2I technologies with ITSs optimizes network efficiency and enhances safety, instilling confidence in the effectiveness of these technologies [23,24,25,26,27,28,29,30,31,32,33,34,35].

In scenarios with few RSUs, optimizing their distribution is not a priority. However, in larger areas, it is essential to carefully plan their deployment to maximize coverage without excessively increasing installation costs. This balance between coverage and cost requires strategic planning to ensure the network remains efficient and functional [36]. To optimize RSU placement, various solutions have been proposed, including techniques such as linear programming and multi-objective evolutionary algorithms. These methodologies aim to improve network coverage while minimizing the number of RSUs required, contributing to resource savings [37,38]. Spatiotemporal coverage strategies based on the representation of vehicle mobility patterns have also been proposed. In this scenario, optimal RSU locations are calculated to cover these patterns with the fewest possible units, improving efficiency compared to previous studies [39]. However, scalability remains a significant challenge, especially in large networks.

2.4. RSU Deployment and Infrastructure Optimization

The efficient design and deployment of VANET infrastructure ensures reliable communications in vehicular networks. The strategic placement of RSUs and resource optimization are crucial for improving coverage, reducing costs, and maximizing network performance in high-mobility environments. This involves variability in vehicle density depending on the location (urban or peripheral areas) and the time of day (peak or off-peak), and requires flexible and robust resource management schemes. These schemes must adapt to such variations and ensure the efficient use of available resources because vehicles function as network nodes and information sources [40]. VANET communication standards have been developed globally, such as DSRC in the US and ITS-G5 in Europe, based on IEEE 802.11p [20] technology. However, these standards are not without their challenges. Issues such as channel access delays, lack of quality of service, and short-lived connections between vehicles and infrastructure are significant hurdles that need to be overcome. To address these challenges and take advantage of the high penetration of cellular networks, the 3GPP project has started investigating the use of LTE for vehicle to everything (V2X) services, which offer a promising alternative with advantages over IEEE 802.11p, such as increased mobility, congestion resilience, higher multiplexing capacity, and better coverage [40]. Despite these advantages, technical challenges remain, especially in designing efficient resource allocation schemes that meet bandwidth, power consumption, and energy efficiency requirements. As V2I data traffic is expected to increase rapidly in the coming years, the need for efficient solutions to these problems becomes even more urgent. Most current studies approach the problem from a static perspective without considering dynamic changes in the radio channel, which limits the adaptive capacity of the algorithms [41].

2.5. Resource Allocation in Vehicular Networks

Recently, there has been a significant increase in studies proposing prediction-based protocols to improve the efficiency and reliability of vehicular networks. These advances, driven by artificial intelligence, ML, and particularly in deep learning, have achieved remarkable progress in image classification, video games, and robotics. These technologies enable the development of intelligent systems that operate in complex environments, analyzing large volumes of data to identify patterns relevant to vehicular network problems, thereby helping networks make more informed decisions [42]. However, adapting these tools to the particular characteristics of vehicular networks, such as their high mobility, remains a significant challenge and a promising avenue for future research.

Flexible and scalable resource allocation strategies are essential in vehicular networks to ensure service quality and meet user needs. However, many vehicular networks still rely on fixed allocations, failing to account for variations in traffic and coverage [43]. Vehicles’ high mobility leads to constant changes in network topology and the number of users within the range of RSUs, making efficient resource allocation a challenge. As the number of connected vehicles increases, the demand for spectrum exceeds availability, underscoring the critical need for improved resource allocation strategies [44]. Some researchers have developed optimization models to maximize user utility, considering traffic variability and energy efficiency. A key strategy in this research has been the use of distributed algorithms, which have significantly reduced computational complexity. These algorithms, implemented using techniques such as augmented Lagrangian, have proven effective in achieving near-optimal solutions in large-scale networks. Other studies have applied game theory to model resource allocation as a noncooperative congestion game, seeking fairness in spectrum distribution. Models based on semi-Markov decision processes and hierarchical schemes, as well as those based on the Nash bargaining game, have also been explored. Despite these advances, efficiently increasing system capacity remains challenging, especially in V2I communications. Dynamic resource allocation becomes even more complicated when past decisions influence future ones, because vehicular environments are highly variable and limited available resources. Therefore, heuristic and iterative methods have been proposed to maximize V2I communication capacity, considering interference and applying advanced optimization algorithms [45].

2.6. Reinforcement Learning for Dynamic Resource Management

The exponential growth of data traffic in the coming years demands more efficient solutions, such as dynamic spectrum access or the formation of vehicle clusters to share resources. These solutions have proven effective, although they still have limitations. Reinforcement learning (RL) has emerged as a key tool for optimizing resource allocation in vehicular networks. Its adaptability in handling dynamic environments, where complete information is not always available, makes it a promising solution [36,46,47,48]. This technology can reduce congestion and improve data delivery efficiency, giving us confidence in its potential [37,38,49,50].

The limited bandwidth of VANET communications, combined with high packet loss rates and short contact windows between vehicles, restricts the amount of data that can be transferred. This condition shows the need for more efficient spectrum management and real-time network monitoring to detect and correct inefficient bandwidth usage, thereby improving system performance [38]. In this sense, RL emerges as an innovative alternative to traditional mathematical models, allowing vehicular networks to adapt to rapid changes and optimize resource allocation. Unlike centralized approaches, which can minimize packet collisions, decentralized models are distinguished by their simplicity and fast learning. As artificial intelligence advances, its integration into vehicular networks promises to transform resource management, providing more reliable and efficient communications.

2.7. Summary and Motivation

Table 1 summarizes representative studies, highlighting their main limitations. Overall, the literature reveals that: (1) RSU deployment, mobility modeling, and resource allocation are often addressed separately; (2) dynamic variations in demand and channel conditions are rarely integrated into a unified framework; and (3) RL approaches lack scalability and real-world mobility realism.

This motivates the development of a scalable RL-based intelligent agent capable of adapting resource allocation dynamically under realistic mobility and infrastructure constraints.

3. Methodology

RSU planning requires a methodological model that combines traffic modeling and optimization strategies to ensure reliable coverage, efficient resource allocation, and service continuity in urban areas. Vehicular mobility introduces fluctuations in traffic density and connectivity, which complicate static planning. Furthermore, the growing demand for V2I services requires adaptive mechanisms for resource allocation and network optimization. To address this challenge, a hybrid simulation model has been created that combines microscopic and macroscopic traffic analysis, generating scenarios based on the existing road infrastructure and modeling vehicular flow according to road characteristics. Figure 2 illustrates the flowchart outlining the processes, from geospatial data conversion to simulation execution in SUMO.

The modeling process in Figure 2 follows several stages. First, geographic data containing structured road network information is obtained from OSM for a selected 1.9 km × 1.3 km area of a dense urban environment. This data is organized into a processable format and integrated into the SUMO traffic simulator. The simulation analyzes traffic patterns, including vehicle density, average speed, and intersection behavior. From this data, the demand for V2I infrastructure is estimated, and RSU placement is optimized, ensuring efficient deployment in urban environments with high mobility. The algorithm flow begins with the acquisition of geospatial data, followed by the generation of vehicle mobility using the randomTrips.py script, a key step in the process. Finally, the sumocfg configuration file integrates the components necessary to run and visualize the simulation.

In the initial simulation phase, random vehicle trajectories are generated from the converted map, taking into account key parameters such as the total simulation time and the edge weight probability, which is based on road length. The sumocfg configuration file is rebuilt, centralizing the simulation in SUMO and incorporating the previously generated road infrastructure (net) and vehicle route (rou) files. The sumocfg file specifies the start and end times of the simulation, ensuring its execution according to the defined parameters. Algorithm 1 below shows instructions for configuring the mobility generator, enabling the creation of dynamic scenarios in SUMO. Algorithm 1 Optimized ILP model for RSU-based coverageRequire: Vehicle coordinates $[eqn]$ and RSU coordinates $[eqn]$ Ensure: Optimized ILP model for network coverage

1:Initialize parameters:
2: $[eqn]$ ▹ Coverage radius
3: $[eqn]$ length of x ▹ Number of vehicles
4: $[eqn]$ length of $[eqn]$ ▹ Number of RSUs
5: $[eqn]$ ▹ Capacity per RSU
6:Step 1: Compute Euclidean distance
7:for $[eqn]$ to M do
8: for $[eqn]$ to N do
9: Compute distance $[eqn]$ between vehicle j and RSU i
10: end for
11:end for
12:Step 2: Define ILP model
13:Open file to store equations
14:Step 3: Objective function
15:Minimize total number of active RSUs:
16: $[eqn]$
17:Step 4: Coverage constraints
18:Ensure at least p percentage of users are covered:
19: $[eqn]$
20:Step 5: Unique connection constraint
21:for $[eqn]$ to N do
22: $[eqn]$ ▹ Coverage flag
23: for $[eqn]$ to M do
24: if $[eqn]$ then
25: $[eqn]$
26: Print constraint equation including $[eqn]$
27: end if
28: end for
29: if $[eqn]$ then
30: Print equation forcing $[eqn]$
31: end if
32:end for
33:Step 6: Variable domains
34:for $[eqn]$ to M do
35: Print constraint: $[eqn]$ ▹ Binary variable
36:end for
37:for $[eqn]$ to N do
38: Print constraint: $[eqn]$
39:end for
40:Close file after saving equations

Once the dynamic vehicle mobility simulation has been developed, the vehicle trace from the generated files will be extracted. This trace contains essential information, such as the location and speed of each vehicle on the network at each instant in time, allowing the creation of realistic communication scenarios. The configuration file (.sumocfg) stores detailed records of vehicle movement, where parameters such as speed, frequency, accuracy, and data formats are adjusted. These settings allow the traces to be converted into formats compatible with other simulators, such as OMMET, NS2, and NS3, in the .tlc format, demonstrating the adaptability of our system. The extracted data is organized and filtered into position coordinates (x, y) to identify each vehicle and correlated times. This data is structured into columns within a temporary file temp.dat, as shown in Table 2. A vehicle counting module is incorporated for each analysis sector, utilizing the obtained traces. This module records the number of vehicles present at each instant in time $[eqn]$ , storing the data in a vector. Each analysis sector has a defined coverage radius, delimiting a specific area within the planning map. This enables the visualization of vehicle distribution in different regions and estimates the number of vehicles in each area throughout the simulation.

3.1. Vehicle Infrastructure Optimization

We analyze the communications infrastructure for the coverage model to determine the minimum number of RSUs needed to cover vehicular demand. This analysis involves evaluating the number of vehicles in the planning area and their distribution across each antenna using the Set Cover algorithm with integer linear programming, based on the Greedy algorithm. To achieve this, we consider mobility and vehicular flow as given between a vehicle and an RSU within the V2I domain, aiming to identify the minimum set of RSUs that maximizes coverage while minimizing cost. The mobility simulation is run in time intervals $[eqn]$ , allowing for dynamic and periodic coverage analysis. In addition, an ML model is implemented with a network controller that first learns the dynamics of vehicular traffic and subsequently optimizes radio resource allocation.

Deploying vehicular infrastructure in a VANET requires establishing a simulated road system to evaluate the optimal deployment of RSUs. This system considers fixed or variable coverage radio and limited capacity. The main objective is to maximize vehicle connectivity within the planning area, ensuring efficient coverage and adequate capacity for simultaneous communication between multiple nodes.

The optimization model is based on a set of N vehicles $[eqn]$ , which move dynamically along the road network during a time interval $[eqn]$ . These vehicles determine traffic flow and congestion, directly influencing the structure of the mobility network and the demand for V2I communications. From their trajectories $[eqn]$ the positions $[eqn]$ are recorded as a function of time $[eqn]$ , where i is the index that represents a specific moment in a temporal sequence.

A set of candidate sites Z of size M is also defined, representing strategic locations within the planning area where RSUs can be installed. Their activation depends on traffic demand and the capacity of the infrastructure. Each candidate site i is associated with a coverage radius $[eqn]$ and has fixed coordinates $[eqn]$ , where

[eqn]

Throughout the optimization process, the optimal distribution of RSUs is determined to fully cover the planning scenario. The final selection of locations is based on a dynamic analysis, ensuring that the infrastructure is deployed efficiently and responds to the real needs of vehicular traffic. To minimize the number of RSUs, a minimum coverage of at least a percentage $[eqn]$ is guaranteed during each time interval $[eqn]$ .

For this purpose, the set of candidate sites is defined as

[eqn]

where each element $[eqn]$ ( $[eqn]$ ) represents a candidate site with fixed coordinates $[eqn]$ . To activate an RSU, the binary variable $[eqn]$ is used; i.e., if $[eqn]$ , the RSU is active, otherwise it is inactive.

A minimum coverage percentage $[eqn]$ is required in the planning scenario, and the variable $[eqn]$ identifies the activated candidate sites. The goal is to minimize the number of active RSUs from the total number M. This is formulated in Equation (1), which defines the objective function for RSU activation.

[eqn]

The given objective function guarantees compliance with the coverage and efficiency requirements for RSU deployment. To achieve this, certain restrictions must be established. The first is defined in Equation (2), which establishes the minimum coverage condition. This restriction ensures that the V2I infrastructure always covers a minimum percentage of total vehicular demand at each instant in time $[eqn]$ .

[eqn]

where, N is the total number of vehicles in the planning area and $[eqn]$ is the minimum percentage of coverage required. Ensuring that the number of vehicles covered at any given time is no less than $[eqn]$ of the total demand indicates that the infrastructure is efficient, providing adequate coverage for system requirements.

The second constraint is given in Equation (3), where the connectivity condition between each vehicle $[eqn]$ and the RSUs within the planning area is established, ensuring that each vehicle j is associated with a single RSU i, as long as it is within the coverage radius of that RSU.

[eqn]

Equation (3) evaluates whether vehicle j is within the coverage area of RSU i, based on the Euclidean distance between them. For a connection to be considered valid, the distance must be less than the coverage radius of the RSU. In scenarios where a vehicle is located within the overlapping coverage areas of multiple RSUs, the selection of the optimal RSU is determined by the criterion of minimum distance.

The binary variable $[eqn]$ plays a central role in this process. It takes the value 1 if vehicle j is within the coverage radius of RSU i, and 0 otherwise.

Once connectivity has been validated through Equation (3), Equation (4) enforces a constraint ensuring that each vehicle connects to only one RSU. This condition is critical to avoid simultaneous associations with multiple RSUs, which could result in communication interference or resource conflicts. Therefore, Equation (4) ensures a one-to-one assignment between vehicles and RSUs, promoting system stability and communication efficiency.

[eqn]

Likewise, Equation (5) limits the capacity of RSUs relative to demand. Specifically, this constraint ensures that no RSU exceeds its available resource capacity to meet vehicular demand at any given time point. This ensures that each RSU can efficiently manage vehicle connections without overloading the infrastructure.

[eqn]

The sum $[eqn]$ counts how many RSUs have an active connection to vehicle j, and the cap parameter represents the maximum number of connections that an RSU can support as s maximum capacity. Finally, the constraint in Equation (6) defines the domain of the variables $[eqn]$ and $[eqn]$ so that they are within the logical and operational limits of the system.

[eqn]

Once the objective function and its constraints have been defined, the optimal placement of RSUs must be determined using Algorithm 1, which is used for infrastructure deployment optimization. This algorithm minimizes the number of activated units while ensuring adequate coverage.

In Algorithm 1, optimal candidate sites are identified as a function of time, considering the stochastic nature of the model. This means that both the model and the algorithm must address a dynamic sequence of events, where parameters change over time due to vehicular mobility and traffic variability, to ensure that the selection of active candidate sites optimizes network performance at each instant. RSU locations and their resources must be dynamically adjusted based on traffic demand and vehicular mobility to achieve this. At each time step $[eqn]$ , the algorithm must evaluate the current conditions, such as vehicle density, traffic flow, and congestion, to determine which candidate sites should be activated and which RSUs should provide coverage for vehicles.

3.2. Dynamic Allocation of Radio Resources in the Infrastructure

The Q-learning reinforcement learning algorithm is implemented for dynamic resource allocation, allowing the system to learn and improve through interaction with its environment. This environment consists of a planning area characterized by the dynamics of RSU antenna mobility and demand, which is determined by the number of vehicles and varies over time $[eqn]$ . To optimize the use of available resources, a controller agent is introduced that manages resource allocation based on time and space. The agent makes decisions about resource allocation and rewards, reflecting the impact of each action on system performance. As the process progresses, the system adjusts its strategy to maximize resource allocation efficiency. The agent refines its approach through multiple iterations based on the rewards it obtains, ensuring continuous improvement in resource allocation. It continuously improves resource allocation, allowing the system to adapt to environmental changes and ensuring stable and efficient communication in vehicular networks, Table 3 and Table 4. Notation and key symbols defining the state, action, and reward components of the SGNC learning-based control framework.

The set of actions defines the decisions the system can take to modify its state; these are represented by the values applied to the current state S to reach a new one. The actions are defined as $[eqn]$ , where $[eqn]$ reduces the allocated resources, 0 keeps the resources unchanged, and 1 increases them. The states are divided into $[eqn]$ , which represents the current state at time $[eqn]$ , and $[eqn]$ , which corresponds to the new state after applying an action $[eqn]$ at time $[eqn]$ . The state domain is defined as

[eqn]

where C is a positive integer corresponding to the demand components and allocated resources, which vary dynamically depending on vehicular mobility and the capacity of each antenna at time $[eqn]$ . As shown in Figure 3, the demand (number of vehicles) for the antennas constitutes the system’s state vector. The controller, responsible for resource allocation, makes decisions to distribute resources among antennas, with the options of allocating one unit ( $[eqn]$ ), taking no action (0), or removing one unit ( $[eqn]$ ). When an action is executed, a reward $[eqn]$ is obtained. Table 3 describes the variables and arrays used in the context of dynamic resource allocation through the proposed SGNC agent. This notation supports the mathematical and algorithmic formulation of the learning process.

Algorithm 2 introduces the reinforcement learning approach that trains the SGNC agent, allowing it to improve its resource allocation strategy through interaction with the environment and feedback from rewards. The algorithm begins by loading the Q matrix, which is the result of the SGNC agent’s learning process. Subsequently, an output matrix with three columns is generated in the second phase. Specifically, the first column presents the cumulative rewards associated with Action 1 (i.e., action $[eqn]$ ), which corresponds to the reduction of allocated resources. The second column displays the cumulative rewards resulting from Action 2 (i.e., action 0), in which the SGNC takes no action. The third column shows the cumulative rewards for Action 3 (i.e., action $[eqn]$ ), which represents the allocation of additional resources to the RSUs. These values provide insight into the learned policy and are used during the third phase of the algorithm to generate the corresponding policy visualization. Algorithm 2 Reinforcement learning algorithm for SGNC agent training

1:Initialize parameters:
2: $[eqn]$ , $[eqn]$ , $[eqn]$
3: $[eqn]$
4: $[eqn]$
5:Generate possible states $[eqn]$
6:for all states do
7: $[eqn]$
8:end for
9:Load MDP-based transition matrix
10:Define action space $[eqn]$
11:Initialize Q-table and visit counters
12:Set hyperparameters: $[eqn]$ 200,000, $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$
13:Compute decay factor: $[eqn]$
14:Initialize: $[eqn]$ , $[eqn]$
15:Main loop: Simulation execution
16:for $[eqn]$ to $[eqn]$ do
17: $[eqn]$
18: Generate random demand: $[eqn]$
19: Initialize assignment: $[eqn]$
20: Enforce capacity limits
21: $[eqn]$
22: $[eqn]$
23: for $[eqn]$ to $[eqn]$ do
24: for all i in RSU set do
25: Sample next state and compute state–action index
26: Epsilon-greedy action selection:
27: if $[eqn]$ then
28: Select random action
29: else
30: Select best action: $[eqn]$
31: end if
32: Update assignment: $[eqn]$
33: $[eqn]$
34: if $[eqn]$ then
35: $[eqn]$ , $[eqn]$
36: end if
37: if $[eqn]$ then
38: $[eqn]$ , $[eqn]$
39: end if
40: Update load imbalance:
41: $[eqn]$
42: Define next state: $[eqn]$
43: Update next state index and visit count
44: Compute reward $[eqn]$ according to Equation (7)
45: if $[eqn]$ then
46: $[eqn]$
47: end if
48: Update Q-learning rule:
49: $[eqn]$
50: end for
51: end for
52: Save simulation results
53: Print current time step
54:end for

The Q-Learning algorithm is a model-free algorithm that learns from the optimal policy that is built during training. This is an efficient method that does not require prior knowledge of the environment. The Bellman equation facilitates the implementation of the algorithm, even in complex problems. This is a significant advantage over other methods that can lead to suboptimal policies. The simplicity of the Q-Learning algorithm makes it a suitable candidate for use in this research. In contrast, the definition of the state space involves the creation of a vector comprising three components, designated ’states’. The first component signifies the vehicular demand of a particular antenna, the second component pertains to the allocation of resources of the antenna specified in the first component, and the third component represents the value of $[eqn]$ , which facilitates the verification of behaviour in terms of the available resources in the intelligent agent [39,50].

Based on the initial values, the reward calculation is carried out by evaluating the action applied to the current state. This process yields the next state, which is derived by considering both the penalties and rewards computed according to Equation (7).

[eqn]

The algorithm penalizes based on the results of applying these actions to the current states and thus producing new states. When the demand is greater than the allocation, the penalty is thus performed with a quadratic model of the form $[eqn]$ , where $[eqn]$ . Conversely, when the demand is less than the allocation, the penalty is performed with a linear model of the form p, where (i.e., $[eqn]$ ). That is to say, the order of magnitude of $[eqn]$ , including the value of 0.

The ensuing graphical representation offers a concise illustration of the resolution procedures, as depicted in Figure 3.

Finally, once the states, actions, penalties, rewards, and additional parameters are formally defined, the Q-value is computed. This value updates the Q-matrix by accumulating the expected rewards, to maximize the cumulative return over each episode and at time step $[eqn]$ . Through multiple iterations, the algorithm progressively converges, leading to the reinforcement of an optimal learning policy for timely and efficient resource allocation. Upon completion of the learning process, the maximum Q values corresponding to each row of the Q matrix, each associated with a specific antenna state, are identified. This allows for the selection of the optimal action in each scenario, as defined by the SGNC. Over time, these selections consolidate into a stable decision-making policy that governs the agent’s behavior, ensuring optimal resource management.

3.3. Novelty and Contributions

This paper introduces a scalable and integrated framework for V2I infrastructure planning and radio resource management in dynamic urban environments, addressing key limitations of existing vehicular network studies that typically treat mobility modeling, infrastructure deployment, and resource allocation as independent problems.

The main novel contributions of this work are summarized as follows:

Joint integration of macroscopic mobility modeling and infrastructure planning: Unlike prior works based on static or microscopic assumptions, the proposed framework derives spatiotemporal V2I demand from realistic, georeferenced urban mobility patterns, enabling infrastructure decisions that adapt directly to congestion dynamics.
A dynamic ILP-based RSU deployment model with explicit coverage guarantees: An integer linear programming formulation is proposed to dynamically activate optimal subsets of RSUs from a large candidate set, achieving more than 90% vehicular coverage while activating less than 3% of the available infrastructure, thus significantly reducing deployment costs.
A generic reinforcement-learning-based controller with explicit saturation awareness: The Smart Generic Network Controller (SGNC) introduces a compact state representation and a $[eqn]$ -based saturation mechanism that enables stable and efficient radio resource allocation under high-demand conditions, without relying on deep or multi-agent learning architectures.
Scalability and stability under high load conditions: The proposed system maintains stable allocation behavior when the network exceeds 70% of its total capacity, dynamically reallocating resources in response to demand fluctuations, a scenario rarely addressed explicitly in existing RL-based V2I studies.

Overall, this work advances the state of the art by providing a unified, scalable, and demand-aware solution that bridges realistic mobility modeling, mathematical optimization, and reinforcement learning for next-generation intelligent transportation systems.

3.4. System Model and Environment Description

The proposed system is designed to operate in a dynamic urban V2I environment, where vehicular demand, infrastructure availability, and radio resources evolve over time. To clearly define this environment, the system model is structured into three interconnected layers that interact sequentially and continuously.

First, the mobility layer represents the urban traffic environment. Realistic, georeferenced road networks are extracted from OpenStreetMap and simulated using SUMO to generate macroscopic vehicular mobility patterns. This layer provides spatiotemporal information about vehicle density, positions, and traffic flow, which directly determines the demand for V2I communications at each time interval.

Second, the infrastructure planning layer uses the mobility information to dynamically select active Road-Side Units (RSUs). A set of candidate RSU locations is predefined over the planning area, and an Integer Linear Programming (ILP) model is applied at each time interval to activate the minimum number of RSUs required to satisfy coverage and capacity constraints. This layer ensures that a target percentage of vehicles is covered while minimizing infrastructure usage.

Finally, the resource management layer is implemented through the Smart Generic Network Controller (SGNC). The SGNC observes the system state, defined by vehicular demand, allocated radio resources, and the excess resource level ( $[eqn]$ ), and interacts with the active RSUs by dynamically reallocating resources using a Q-learning algorithm. This layer enables adaptive and stable resource management under fluctuating traffic demand and high-load conditions.

Overall, the environment is modeled as a closed-loop system in which mobility dynamics drive infrastructure activation, and infrastructure conditions guide intelligent resource allocation. This layered modeling approach improves clarity, scalability, and reproducibility, while accurately reflecting the behavior of real-world urban V2I networks.

Definition of Radio Resources

In the proposed system model, the term radio resource refers to an abstract and normalized allocation unit managed by an RSU to support V2I communications. Each radio resource represents a portion of the available transmission capacity and encompasses scheduling opportunities in the time–frequency domain, such as bandwidth slices or transmission slots, without binding the model to a specific physical-layer technology.

This abstraction enables the SGNC to dynamically allocate and withdraw resources based on vehicular demand and infrastructure load, while remaining independent of underlying standards (e.g., IEEE 802.11p or LTE-V2X) [20]. As a result, the model focuses on the efficiency and stability of resource management rather than on low-level radio configurations.

4. Results Analysis

Simulated mobility scenarios were generated based on realistic maps to assess user spatiotemporal demand and estimate traffic congestion levels. These maps incorporate vehicular flow and trace management, allowing for a rigorous analysis of traffic behavior and its potential impact on congestion. Figure 4 presents the planning map that depicts the study area’s street profiles, including key details such as loaded intersection points. This information is essential for constructing the initial mobility scenario and generating vehicle traces using an algorithm.

The generated trace, comprising 3,692,035 records, corresponds to the selected scenario. Each record contains variables such as: (i) time, (ii) the spatial coordinates of the vehicles, and (iii) unique labels that identify each unit vehicular within the simulation. This trace reflects the key dimensions of the simulated scenario, including the simulation’s total duration and the number of participating vehicles. Once the road topology is defined, vehicular flow is incorporated into the initial scenario.

The optimization process begins with the selection of a set of $[eqn]$ candidate RSU sites distributed throughout the planning region to ensure coverage across the entire map. The scenario is structured within a two-dimensional Cartesian coordinate system, where both axes represent rectangular coordinates measured in meters (m). This spatial configuration allows for precise localization and interpretation of the system’s components, facilitating an accurate assessment of resource distribution and vehicular demand across the coverage area. A total of N vehicles are progressively introduced into the simulated environment as the simulation time advances. At the end of the first interval, RSU activity is evaluated based on the model and the scenario illustrated.

The simulation results are obtained by instantiating the generic ILP formulation presented in Section 3 (Equations (1)–(6)) with the specific parameters of the considered scenario. In this example, the target coverage rate is set to $[eqn]$ , and each RSU supports at most $[eqn]$ vehicles. These values are used to build the ILP instances solved at each simulation interval. Therefore, no additional optimization formulation is introduced in this section, which focuses exclusively on the evaluation results.

Once the elements are installed on the stage and the equations are generated, they are processed using the integer linear programming model on the LPSolve package, which, when executed, presents optimization results as shown below.

Table 5 shows that of the 380 candidate sites, only 11 were selected as optimal, representing 2.89% of the total. A total of 25 vehicles entered the planning area, of which 23 established connections with active RSUs, resulting in only 2 vehicles being out of coverage, corresponding to 8%. This result aligns with the 90% coverage guarantee criterion defined in the optimization model. This behavior is also visually confirmed in Figure 5.

The system gradually grows throughout the simulation, driven by the addition of more vehicles. This expansion also affects both the number and distribution of active candidate sites. The results obtained at each time interval reflect the deployment of the infrastructure, that is, the activation of candidate sites through a phase-by-phase optimization process. This dynamic provides a comprehensive view of the entire deployed infrastructure.

In contrast, in the subsequent dataset corresponding to the second time interval, a total of 62 vehicles entered the planning area, out of which 57 successfully established connections with active RSUs. Consequently, only 5 vehicles remained uncovered, representing 8.06%. Furthermore, only 13 candidate sites were selected as optimal, as illustrated in Table 6 and Figure 6.

The analysis of consecutive intervals yields different results in each case. Therefore, the example extends to the first ten intervals, as shown in Table 7 and Figure 7.

Specifically, in the final time interval presented, the results indicate that a total of 228 vehicles entered the planning area, of which 207 successfully established connections with active RSUs. Consequently, only 21 vehicles remained out of coverage, representing 9.21%, and merely 17 candidate sites were selected as optimal.

As shown in Table 7, it is possible to obtain the results corresponding to all previously analyzed intervals. These results provide valuable insights into the performance and behavior of the proposed model, with outputs consistently aligned with the predefined parameters. A key observation is that, as time progresses, the system’s complexity increases due to the continuous entry of additional elements, particularly a higher number of vehicles entering the scenario. Consequently, the set of active candidate sites tends to vary over time, reflecting the dynamic nature of the vehicular environment.

The analysis now continues with the resource allocation process, guided by the reinforcement learning algorithm proposed in the present study for the SGNC learning. To initiate the reinforcement learning algorithm, a set of numerical values is first defined: $[eqn]$ , $[eqn]$ , and $[eqn]$ , corresponding to ten RSU antennas within the planning area, each with a maximum support capacity of twelve vehicles. The value 70 represents the percentage threshold of SGNC resources at which the resource allocation control analysis begins, after computing the variable CAP as follows: The value of the variable CAP is calculated as $[eqn]$ . This value represents the resource usage threshold at which the SGNC enters an alert state to initiate a fair resource allocation process among the RSU antennas. Subsequently, the remaining capacity is calculated as: $[eqn]$ .

Table 8 illustrates that when the SGNC detects the utilization of 84 out of the 120 available resources, it initiates a detailed analysis process. With the increase of the setpoint, the SGNC progressively exhausts its available resources, prompting decisions to halt further allocations or, in critical scenarios, to revoke previously assigned resources. Despite operating under alert conditions, the system remains within its initial threshold, maintaining the capability to allocate RSUs according to current requirements.

Subsequently, the states are generated in $[eqn]$ through the construction of a two-dimensional table that contains all possible states defined as (0–12, 0–12, 0–36) = $[eqn]$ . The first two components can take values between 0 and 12, while the third ranges from 0 to 36. Therefore, in this particular case, the problem size corresponds to a state space of order $[eqn]$ , that is, a matrix of size $[eqn]$ is obtained, containing all possible states in which the system state S can be found. Consequently, the states $[eqn]$ (current state) and $[eqn]$ (next state) can take values according to the order defined by the matrix, along with their corresponding indices (indest).

Additionally, the computation of the simulation visits to each state is performed. Then, the actions take values of $[eqn]$ , 0, or 1 depending on the case, and the Q-learning matrix is initialized to zero with dimensions $[eqn]$ . Furthermore, other parameters are initialized, such as the execution time, which for the case study is set to the value $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$ . With these values, $[eqn]$ tends to zero in each iteration.

Given the initial values from the first stage, the main loop execution is performed, consisting of num iterations of the learning process, covering all episodes (100 in this case) and all $[eqn]$ RSUs. Several calculations are carried out, including the random generation of states (state), resource allocation (asig), and the value of $[eqn]$ . Based on these values, the corresponding reward values $[eqn]$ and Q-values are computed accordingly.

For greater clarity, examples are provided using the values obtained during the initial iterations of the main loop. Initially, the vector representing the current state of the RSU antennas is obtained. This corresponds to the vehicular demand at each RSU, which defines the overall system demand. In the first iteration, the following result was obtained, as shown in Table 9. Iteration 1:

The following values are then computed: the current state $[eqn]$ of antenna 1, denoted as A1; the resources assigned to it from the SGNC, denoted as $[eqn]$ ; and the value of $[eqn]$ , calculated as described in the algorithm. Based on these, the following result is obtained, as shown in Table 10:

From the interpretation of the previous table, it can be observed that antenna 1 has a demand value equal to 9, and the SGNC assigns 9 resources accordingly. The corresponding $[eqn]$ value is 26, indicating that the SGNC can still allocate additional resources, although it has entered an alert state due to resource depletion. In this example, the demand exactly matches the number of resources assigned. The action and the penalty associated with this state are 0 and 0, respectively, as shown in Table 11.

At this point, the next state of the system ( $[eqn]$ ) can be computed based on the action applied to the current state ( $[eqn]$ ), as shown in Table 12.

With these values, the reward (Rwrd) is calculated as follows: Rwrd = rw(state(i) − asig(i)) + penalty − $[eqn]$ ; thus, Rwrd = rw(9 − 9) − 0 − (26^2^/2) = −338.

Finally, using the preceding values, the matrix Q is updated at the generated indices based on the corresponding state and action indices. Q= −235.9856, This value is calculated from Algorithm 2.

The objective is to calculate the Q matrix, which contains the system states in the rows and the possible actions in the columns. This matrix includes the rewards obtained, and the maximum value in each column indicates the action with the highest reward that should be applied to the corresponding state. The Q matrix reflects the system’s policies, which are the sum of the rewards derived from its best actions applied to each state, aimed at obtaining a favorable state. Table 13 illustrates the structure of the Q matrix together with the state matrix. Combining both matrices facilitates a comprehensive analysis of the problem’s dynamics by storing the rewards for each possible action in each state. The columns of the Q matrix represent actions such as withdrawing resources, modifying allocations, or allocating additional resources.

For example, for state (1002) = (2, 1, 2), where there are 2 users, 1 assigned resource, and 2 excess resources, the Q matrix displays the values (−1.8106, −1.7310, −1.6662). The maximum value of this vector is −1.6662, corresponding to the action of +1, which indicates that the SGNC must allocate additional resources to align with existing demand.

Figure 8 illustrates the relationship between demand, resource allocation, excess resources, and actions taken, enabling a detailed and accurate interpretation of the results. The figure’s x-axis represents the difference between demand and available resources, ranging from −15 to 15. Analysis of this axis shows that if the value is negative, demand exceeds the resources available in the RSU. If the value is positive, it indicates that the available resources exceed demand. A value of 0 reflects a balance between demand and resources, which is visualized in the graph by points located on the x-axis.

In Figure 8, the Y-axis represents the excess resources used. The behavior of these values varies according to the actions taken, which are shown on the Z-axis. The actions are classified into three categories: −1 indicates the action of withdrawing resources, 0 means no changes are made to the allocation, and +1 corresponds to allocating additional resources. When analyzing the relationship between these axes, it is observed that, as demand exceeds the allocated resources, the values tend to shift towards the negative region of the X-axis. Conversely, when resources exceed demand, the values change towards the positive region of the X-axis. In situations where demand and resources are close to each other (values close to zero on the X-axis) and excess resources are low (low values on the Y-axis), the SGNC policy favors allocating additional resources, represented by the action +1. On the other hand, when excess resource levels are high, the system chooses to remove resources from RSUs using action −1, thus optimizing resource utilization.

Generally, as allocated resources exceed demand and excess resource levels increase, the system adjusts its behavior by releasing resources to the SGNC. This dynamic enables efficient resource management, balancing allocation based on demand and detected excess resources, and ensuring the optimal use of available resources. The proposed approach places special emphasis on assessing the use of excess resources, as this factor is crucial in adjusting system policies.

As shown in Figure 9, the behavior of resources is presented, along with the derivation of optimal policies, which are the system’s best actions to achieve more favorable states. Figure 9 shows the relationship between demand and resource allocation, where colors identify them. The blue color represents action −1, corresponding to the withdrawal of RSU resources, whereas the cyan color indicates action 0, which implies that no changes have been made to the resource allocation. Yellow indicates action +1, which suggests allocating additional resources to the RSU. At this point, the value of $[eqn]$ triggers an alert in the SGNC, as the system has reached a level where resources begin to deplete.

The analysis reveals that, as overutilized resources increase, the SGNC controller tends to reduce resource allocation to RSUs. However, when demand is high and the allocated resources are insufficient to cover it, the SGNC continues to allocate additional resources because the excess resource levels are not yet significant. When the value of the excess resources increases significantly, the system adjusts its strategy and withdraws resources instead of allocating them. This behavior demonstrates the controller’s ability to adapt dynamically, balancing resource allocation according to demand and detected excesses, thereby avoiding overallocation and ensuring efficient management.

Figure 10 shows the system’s behavior as excess resource usage increases. As the value of the excess resources used grows, the SGNC progressively reduces resource allocation. This behavior is evident in the decrease in the yellow color (action +1, which allocates resources) and the increase in the blue color (action −1, which indicates the withdrawal of resources). When the excess resource usage reaches 50% of its maximum capacity (equivalent to 18 units in the case study), the resource allocation approaches zero. At this point, the predominant policy of the SGNC is to withdraw resources from the RSUs, reflecting an adjustment strategy aimed at avoiding over-allocation under conditions of excess. At this point, the value of $[eqn]$ is close to 50%, which allows the SGNC to be alerted that it has reached a medium alert level.

Figure 11, corresponding to the remaining 50% of resources, shows behavior consistent with the previously described trend. Resource allocation is minimal: the yellow color, which represents positive allocation, appears only in small proportions, while the blue color, associated with resource withdrawal (action −1), predominates. This indicates that the SGNC has opted chiefly to withdraw resources from RSUs, which is logical considering that the resources available in the SGNC have already been exhausted. These results support the robustness and coherence of the proposed model, demonstrating that the solutions obtained adequately fit the conditions of the problem posed. At this point, the $[eqn]$ values are at their maximum, indicating that the SGNC has reached its highest alert level. This condition implies that resources must be withdrawn from the antennas, as they are nearly exhausted.

The analysis of visits to the states is presented in Figure 12. In this diagram, the x-axis represents the value of $[eqn]$ (excessively used resources), and the y-axis shows the total number of visits to each state. This diagram reveals that the highest frequency of visits occurs in states with $[eqn]$ values less than 50% of their maximum capacity. As $[eqn]$ increases, visits to these states decrease.

As can be seen in Figure 12, which shows the relationship between demand, the value of $[eqn]$ , and the number of state visits. It can be observed that the number of visits increases as demand grows from 0 to C. Subsequently, when $[eqn]$ reaches its maximum values between 0 and C/2, the number of visits begins to decrease at a rate of order $[eqn]$ . To facilitate the interpretation of the results, a logarithmic function is applied that adjusts the scale of the data, providing a more precise representation of the trend.

In this study, the proposed algorithm addresses the intelligent allocation of resources through an SGNC agent, which learns from the environment, specifically from the dynamics of traffic flow influenced by vehicular demand within a V2I framework. This learning occurs at the RSU level, where a vector-based metric is defined for each RSU, comprising components such as the current demand, the assigned resources, and a dynamic lambda value ( $[eqn]$ ).

The $[eqn]$ parameter reflects the saturation level of each RSU in terms of resource utilization. When $[eqn]$ approaches its upper bound, it indicates that the SGNC is no longer capable of allocating additional resources to the RSU. Conversely, when $[eqn]$ remains within the lower or medium range, the agent retains the ability to allocate more resources to the RSU, thereby promoting system balance and operational efficiency. Figure 13 illustrates the variation of the $[eqn]$ parameter over simulation time.

In summary, a dynamic and macroscopic mobility scenario was initially obtained with respect to vehicular flow, which enabled an accurate estimation of traffic demand. Based on this foundation, the optimization model described in Algorithm 2 was applied, with its mathematical formulations defined in Equations (1)–(6). This model allows the selection of active candidate sites to ensure optimal coverage. The simulation results are presented in Table 5, Table 6 and Table 7, within the Results Analysis section, where the partial values for each time interval are included.

Table 5 shows the initial scenario with $[eqn]$ candidate sites. Based on the vehicular demand, the optimization model selected 11 active RSUs, representing only $[eqn]$ of the total sites. For this first simulation, $[eqn]$ vehicles were considered, of which the model successfully covered 23, while 2 remained uncovered, corresponding to $[eqn]$ of unmet demand.

It is important to note that vehicles dynamically enter and leave the planning area, and simulations were executed in intervals of 40 time units. Table 7 summarizes the results across 10 intervals, defined as follows: $[eqn]$ for the first interval, $[eqn]$ for the second, $[eqn]$ for the third, and so on.

Subsequently, the reinforcement learning model was trained through Algorithm 2, whose nomenclature is detailed in Table 3. In this process, the agent—defined as the SGNC—was configured with its respective states, actions, and rewards, which are essential elements for its operation. The training dynamics are exemplified in the Results Analysis section preceding Table 8, where the SGNC’s resource utilization and the alerts triggered for resource allocation to active antennas are described.

In a particular case, $[eqn]$ active antennas were considered, each capable of serving up to 12 vehicles. In this scenario, the SGNC manages a total of 120 resources for distribution. However, from resource number 85 onward, alerts were triggered, which activated the intelligent allocation of the remaining 36 resources through a $[eqn]$ variable. This mechanism enabled the SGNC to learn how to manage resources efficiently as they were depleted, avoiding ineffective assignments and freeing resources within the vehicular infrastructure.

The results of this process are provided in Table 9, Table 10, Table 11, Table 12 and Table 13, while Figure 8 illustrates the relationship among demand, resource allocation, surplus, and the actions taken in policy formulation. Finally, Figure 10 and Figure 11 demonstrate the stability achieved in the overall resource allocation. The results obtained validate the effectiveness of the proposed optimization model and the reinforcement learning scheme based on the SGNC. In the first phase, the selection of active candidate sites achieved coverage above $[eqn]$ of the vehicles in the scenario, while using only $[eqn]$ of the available infrastructure. This confirms the model’s ability to significantly reduce deployment costs without considerably compromising quality of service. In the second phase, the SGNC training showed stable performance in resource allocation, intelligently managing the limitations introduced by the $[eqn]$ variable. The agent was able to equitably distribute the available resources, preventing congestion and ensuring an adequate balance between demand and network capacity.

Finally, the analysis of Table 7 and Figure 8, Figure 10 and Figure 11 reveals that the combination of mathematical optimization and reinforcement learning not only enhances initial coverage but also provides resilience and stability in resource management under dynamic vehicular scenarios. These results confirm the relevance of the proposed approach for future implementations in intelligent vehicular communication systems.

This work introduces an optimization model for dynamic vehicular mobility scenarios that efficiently selects active candidate sites based on traffic demand, achieving coverage levels above $[eqn]$ while utilizing only $[eqn]$ of the available infrastructure. A reinforcement learning agent (SGNC) is further integrated, configured with specific states, actions, and rewards, enabling adaptive and intelligent resource allocation under varying conditions. A key contribution lies in the efficient resource management mechanism through the $[eqn]$ variable, which ensures balanced distribution even in high-demand scenarios, thereby maintaining stability in the allocation process. Collectively, these contributions establish a methodological framework that combines mathematical optimization and reinforcement learning to enhance coverage, reduce deployment costs, and improve the resilience of resource management in intelligent vehicular networks.

Comparison with Existing Dynamic Resource Allocation Approaches

Most existing dynamic resource allocation strategies in vehicular networks rely on heuristic or threshold-based mechanisms that adjust resources according to instantaneous load or predefined congestion indicators. While these approaches improve upon static allocation, they often lack long-term stability and do not explicitly account for system-wide resource saturation.

Recent studies have also explored reinforcement-learning-based solutions, including deep and multi-agent variants, to dynamically allocate V2I resources. Although effective in adaptive scenarios, many of these approaches assume either abundant resources or semi-static conditions, and rarely analyze system behavior near or beyond capacity limits.

In contrast, the proposed SGNC introduces an explicit saturation-aware mechanism through the $[eqn]$ parameter, which continuously reflects the level of excess resource usage at the infrastructure level. This design enables the controller to proactively regulate allocations, preventing over-provisioning and maintaining stable operation even when resource utilization exceeds 70% of total capacity.

Compared with heuristic dynamic schemes, the SGNC demonstrates improved stability and fairness in resource redistribution. Compared with existing RL-based methods, it achieves adaptive behavior using a compact state representation and without requiring deep neural networks or multi-agent coordination, significantly reducing computational complexity while preserving performance.

This comparison highlights that the main advantage of the proposed approach lies not only in adaptability, but also in robust and scalable operation under high-load conditions, which are common in dense urban V2I environments.

5. Conclusions

This paper presents a dynamic model for vehicular network resource allocation (SGNC) that adapts to the variability of traffic flow and resource demand. The results show that the SGNC adjusts its allocation policy as excess resource usage increases. It does so by progressively reducing the resources allocated to RSUs and eventually removing them when a critical threshold is reached. This progressive reduction is a key feature of the model, ensuring efficient management, optimizing resource allocation, and preventing overallocation, thereby improving the system’s responsiveness under high-demand conditions.

Based on reinforcement learning, the model developed for simulating vehicular traffic behavior and dynamic resource allocation proved effective in adapting to changes in mobility and network topology. It incorporated optimal policies and controlled excess resources using the $[eqn]$ parameter, allowing for more informed allocation decisions and thereby improving overall system efficiency. These findings provide a solid foundation for future research and advancements in vehicular network optimization.

Despite the progress made, several areas remain for future research and improvement. The potential of implementing advanced optimization techniques, such as deep learning, is promising. Furthermore, it is suggested that the model be expanded to include additional factors, such as vehicle mobility and network conditions, to evaluate its performance in larger-scale networks and more complex scenarios. Methods for differential resource allocation based on service types could also be investigated. However, validating the model in real-world environments is crucial to ensure its effectiveness under practical conditions. These lines of work will improve the system’s efficiency and expand its applicability to more dynamic and realistic contexts.

The analysis of resource allocation metrics in vehicular networks, the primary objective is to optimize the utilization of spectrum, transmission power, and time. These parameters vary depending on the network architecture and the communication technologies employed. In this context, our study places special emphasis on vehicular networks with high traffic demand, particularly in urban roadways, where the operational dynamics significantly impact society due to the frequent incidents occurring in these areas and the consequent economic and social losses they generate.

Bibliography50

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Elassy M. Al-Hattab M. Takruri M. Badawi S. Intelligent transportation systems for sustainable smart cities Transp. Eng.20241610025210.1016/j.treng.2024.100252 · doi ↗
2Cao Q. Li J. Trucco P. Modeling Interdependencies in Intelligent Traffic Systems and Sustainable Urban Development Using Graph Neural Networks Intelligent Technology for Future Transportation Razminia A. Nguyen D.H. Springer Nature Switzerland Cham, Switzerland 2025406415
3Huo D. Davies B. Li J. Alzaghrini N. Sun X. Meng F. Abdul-Manan A.F.N. Mc Kechnie J. Posen I.D. Mac Lean H.L. How do we decarbonize one billion vehicles by 2050? Insights from a comparative life cycle assessment of electrifying light-duty vehicle fleets in the United States, China, and the United Kingdom Energy Policy 202419511439010.1016/j.enpol.2024.114390 · doi ↗
4Wang F. Tang X. Wu Y. Wang Y. Chen H. Wang G. Liao J. A Lightweight 6D Pose Estimation Network Based on Improved Atrous Spatial Pyramid Pooling Electronics 202413132110.3390/electronics 13071321 · doi ↗
5Redondo J. Yuan Z. Aslam N. Zhang J. Coverage-Aware and Reinforcement Learning Using Multi-Agent Approach for HD Map Qo S in a Realistic Environment Proceedings of the 2024 11th International Conference on Wireless Networks and Mobile Communications (WINCOM)Leeds, UK 23–25 July 2024 IEEE New York, NY, USA 20241610.1109/WINCOM 62286.2024.10656951 · doi ↗
6Song C. Wu J. Xian K. Huang J. Lu L. Spatio-temporal graph learning: Traffic flow prediction of mobile edge computing in 5G/6G vehicular networks Comput. Netw.202425211067610.1016/j.comnet.2024.110676 · doi ↗
7Yang Y. Yang Z. Huang C. Xu W. Zhang Z. Niyato D. Shikh-Bahaei M. Integrated Sensing, Computing and Semantic Communication for Vehicular Networks IEEE Trans. Veh. Technol.202574183071831210.1109/TVT.2025.3575243 · doi ↗
8Saghir R. Karunathilake T. Förster A. Comparative Study of Simulators for Vehicular Networksar Xiv 20242403.00546