PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles

Aws Khalil; Jaerock Kwon

PMC · DOI:10.3390/s26061798·March 12, 2026

Open Access

TL;DR

PLM-Net is a deep learning framework that reduces the impact of perception latency on autonomous vehicle steering, improving performance without altering the original control system.

Contribution

PLM-Net introduces a modular framework to mitigate perception latency effects in vision-based autonomous vehicle control systems.

Findings

1. PLM-Net achieved up to 62% reduction in steering error for constant latency.
2. The framework reduced Mean Absolute Error by 78% under time-varying latency conditions.
3. PLM-Net enables real-time adaptation to both constant and time-varying perception latency.

Abstract

This study introduces the Perception Latency Mitigation Network (PLM-Net), a modular deep learning framework designed to mitigate perception latency in vision-based imitation-learning lane-keeping systems. Perception latency, defined as the delay between visual sensing and steering actuation, can degrade lateral tracking performance and steering stability. While delay compensation has been extensively studied in classical predictive control systems, its treatment within vision-based imitation-learning architectures under constant and time-varying perception latency remains limited. Rather than reducing latency itself, PLM-Net mitigates its effect on control performance through a plug-in architecture that preserves the original control pipeline. The framework consists of a frozen Base Model (BM), representing an existing lane-keeping controller, and a Timed Action Prediction Model…

Funding

National Science Foundation
University of Michigan–Dearborn

Keywords

latency mitigation; autonomous vehicle navigation; deep learning in robotics and automation; model learning for control; learning from demonstration

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Vehicle Dynamics and Control Systems · Traffic control and management

Full text

1. Introduction

1.1. Motivation

Vision-based Autonomous Vehicle (AV) control follows a perception–planning–control (sense–think–act) cycle, where visual observations are processed to generate control actions. Any such cycle incurs a latency between sensing the environment and applying the corresponding action; for human drivers, this is why reaction time is always greater than zero [1]. Similarly, it is challenging to completely eliminate this latency in AV control [2], and reducing it through powerful GPUs and FPGAs is impractical in automotive platforms. If not properly mitigated, this latency can degrade lateral tracking performance and steering stability, potentially affecting ride comfort and control reliability. Human drivers implicitly anticipate future vehicle states when reacting to visual stimuli [3]. Inspired by this predictive behavior, we propose a deep neural network architecture that forecasts future steering actions to mitigate perception latency.

In this paper, we refer to this latency as the perception latency $[eqn]$ , as shown in Figure 1. When vehicle state is $[eqn]$ and we have observation $[eqn]$ , the corresponding action $[eqn]$ is applied at time $[eqn]$ rather than at time t ( $[eqn]$ ). By the time this action is applied, the vehicle state has changed and we have a new observation. The perception latency has two components: the algorithmic latency, which is the time required for the algorithm to infer an action from an observation, and the actuator latency, which is the time required to apply the inferred action. The actuator latency, known as steering lag in lateral control [4], can be considered constant [4]. However, as mentioned in [5], the high processing cost of visual algorithms leads to uneven time delays based on the driving scenario, which leads to an overall perception latency that is time-varying.

To address both constant and time-varying perception latency encountered during lane keeping, we propose a deep-learning-based approach, focusing primarily on vision-based AV lateral control. The contribution of this paper will be discussed after explaining the effect of the perception latency on AV lateral control.

1.2. Latency Effect on AV Lateral Control

Vision-based AV lateral control for lane keeping can be achieved using various methods. The traditional approach involves incorporating a computer vision module for lane-marking detection alongside a classical control module for path planning and control. Alternatively, a deep-learning-based approach, such as imitation learning [6], directly maps visual input to control actions, like steering angle. This paper adopts the latter method.

The effect of perception latency on AV lateral control during lane keeping is highly dependent on vehicle speed. In this study, vehicle speed is held constant during evaluation in order to isolate the effect of perception latency independently of speed-induced dynamic variations. At low speeds, the scene does not change significantly between the time the vehicle receives the observation at time t and the time it applies the action at time $[eqn]$ , because the traveled distance during the latency period is minimal ( $[eqn]$ ). In this scenario, it is reasonable to assume that $[eqn]$ , making the effect of $[eqn]$ negligible. Thus, at low speeds, we can consider that at time t, the action $[eqn]$ corresponding to observation $[eqn]$ is applied immediately ( $[eqn]$ ).

The effect of latency on AV control during lane keeping is illustrated in Figure 2. Two vehicles drive at a constant low speed: the green vehicle uses the normal real-time observation, assuming zero latency, while the red vehicle uses the delayed observation, mimicking the perception latency effect. Both vehicles start from position A and move towards position D. If we timestamp each position, then $[eqn]$ . For the green vehicle at position B, the available observation is $[eqn]$ and the corresponding action is $[eqn]$ . For the red vehicle at the same position, the available observation is $[eqn]$ and the corresponding action is $[eqn]$ . This pattern continues for the other positions. Both vehicles remain in their lanes at positions A and B, before encountering any curves, because they started within the lane and continued straight, where the steering angle is zero. Thus, at positions A and B, the actions $[eqn]$ and $[eqn]$ are effectively the same. At position C, the first curve is encountered. The green vehicle successfully turns left to stay in the lane, but the red vehicle continues straight. This deviation occurs because the red vehicle’s input observation at position C is $[eqn]$ , not $[eqn]$ , leading it to apply the action $[eqn]$ at position C. After this first curve, the red vehicle’s zigzag trajectory becomes difficult to correct, even on a straight road, due to the incorrect action taken at position C, which causes subsequent incorrect observations.

To mitigate perception latency and avoid this unstable driving behavior, the main objective of our approach is to predict the correct action from the delayed observation input.

1.3. Contribution

We introduce the Perception Latency Mitigation Network (PLM-Net), outlined in Figure 3. This deep learning approach is designed to integrate with the AV's original vision-based lane-keeping assist (LKA) system without requiring any changes to it. As depicted in Figure 3, PLM-Net leverages a Timed Action Prediction Model (TAPM) alongside the Base Model (BM), where the latter represents the preexisting vision-based LKA system. The design of the TAPM is inspired by our prior work, ANEC [7], and the Branched Conditional Imitation Learning model (BCIL) proposed by Codevilla et al. [8]. It combines the concept of a predictive model capable of forecasting future actions from the current visual observation, akin to ANEC [7] (originally inspired by the human driver's ability to deal with perception latency [9]), with the notion of employing multiple sub-models within the BCIL framework [8] to provide different predictive action values corresponding to different latency levels. Additionally, similar to the ‘command’ input used in BCIL to select a sub-model, the real-time perception latency $[eqn]$ is used to determine the final action value. The final action value $[eqn]$ is determined by the function $[eqn]$ , which performs linear interpolation based on the real-time latency value $[eqn]$ over the predictive action values provided by the TAPM ( $[eqn]$ ) and the current action value provided by the BM ( $[eqn]$ ). A comprehensive explanation of our proposed method is provided in Section 3. The main contributions of this paper are summarized below.

Main contributions:

We formulate perception latency in vision-based imitation-learning lane keeping as a time-offset control problem, and analyze how delayed observations degrade steering stability and lateral tracking performance;
We propose PLM-Net, a modular plug-in latency-mitigation framework that augments an existing imitation-learning lane-keeping controller without modifying or retraining the base policy, thereby preserving its deployment characteristics;
We introduce the Timed Action Prediction Model (TAPM), a latency-conditioned multi-head predictive module that produces discrete future steering actions indexed by delay values, enabling mitigation of both constant and time-varying latency through runtime interpolation based on measured latency;
We validate the proposed framework in a deterministic closed-loop simulation environment under fixed-speed conditions to isolate latency effects, demonstrating substantial improvements in steering similarity and trajectory stability across multiple latency settings.

The structure of this paper proceeds as follows: Section 2 presents an overview of related research, while Section 3 outlines the proposed methodology. Following this, Section 4 and Section 5 present the experimental setup and the experimental findings within the simulation environment, accompanied by a comprehensive analysis. Finally, Section 6 summarizes the core findings of the study and provides valuable insights into potential avenues for future research.

2. Related Work

In the domain of vision-based control, the explicit modeling and mitigation of perception latency have received comparatively limited attention relative to other aspects of autonomous driving control. In classical control methods, discussions surrounding latency in autonomous driving have largely revolved around computational delays associated with hardware deployment [10,11,12,13] and communication delays related to network performance [14,15,16,17,18,19]. While several classical control studies have addressed perception or input delay in driving systems, these efforts are relatively sparse compared to the broader literature on delay-aware predictive control. Xu et al. [4] modeled steering lag as a fixed 200 ms delay, while Liu et al. [5] proposed a hierarchical MPC framework to compensate for time-variant input delays on the order of several hundred milliseconds. More recently, Kalaria et al. [20,21] developed robust control strategies to handle fixed perception delays (e.g., 200 ms), demonstrating safety improvements for both lane-keeping and racing tasks. In the context of networked and teleoperated autonomous vehicles, Kamtam et al. [22] reviewed delay patterns between 100 and 350 ms across vision and communication channels. Recent system-level studies have also begun to address the impact of latency variability on autonomous driving pipelines. Han and Kim [23] propose probabilistic scheduling techniques to minimize end-to-end perception–planning–control latency in real-time AV systems. While valuable, these approaches primarily aim to reduce latency, whereas our work focuses on learning to compensate for its effect on control behavior. Unlike classical or system-level approaches, PLM-Net introduces a learning-based method that adaptively mitigates both fixed and time-varying latency without relying on handcrafted dynamics or delay-tuned controllers.

Classical delay-aware control methods, including preview control and delay-compensated MPC, explicitly incorporate system dynamics and delay models into the control law and typically require accurate vehicle modeling, as well as controller redesign. In contrast, PLM-Net neither modifies the original control structure nor requires explicit vehicle dynamics modeling. Instead, it augments an already trained vision-based imitation-learning policy with a latency-conditioned predictive layer that operates as a plug-in module. Therefore, rather than replacing model-based delay compensation strategies, the proposed framework provides a complementary learning-based solution tailored to data-driven lateral control systems where analytical models or controller redesign may not be feasible.

Within vision-based neural network control models, research efforts have primarily focused on model architecture and performance optimization, with comparatively limited attention given to explicit modeling of the perception latency issue [6,24,25,26,27,28,29,30,31,32,33,34].

Only a few studies have discussed perception latency. Li et al. [2] underscored the significance of addressing latency in online vision-based perception systems. To tackle this issue, they introduced a methodology for assessing the real-time performance of perception systems, effectively balancing accuracy and latency. However, their paper primarily proposes a metric and benchmark for evaluating the real-time performance of perception systems, rather than offering a direct solution for vehicle control. While their approach provides valuable insights into quantifying the trade-off between accuracy and latency in perception systems, it does not directly address the challenges associated with mitigating latency in autonomous driving scenarios. Khalil et al. [7] addressed the perception latency issue by proposing the Adaptive Neural Ensemble Controller (ANEC). However, ANEC assumed perception latency to be constant and did not address time-varying latency. A weighted sum was used to combine the outputs of the two driving models, but the weight function parameters had to be carefully chosen and adjusted by hand. This introduces environment-specific parameter tuning, which may limit straightforward deployment across different scenarios. Furthermore, ANEC neither explicitly models latency as a runtime-conditioned variable nor provides discrete latency-indexed predictive heads. Instead, it blends multiple policies through adaptive weighting. In contrast, PLM-Net formulates latency mitigation as an explicit mapping between measurable delay values and predicted future steering actions. This design separates the base driving policy from the latency-compensation layer, enabling deterministic interpolation across latency conditions while preserving the original controller parameters.

Mao et al. [35] explored latency in the context of video object detection by analyzing the detection latency of various video object detectors. While they introduced a metric to quantify latency, their study focused on measurement rather than proposing solutions to mitigate latency. Kocic et al. [36] tried to decrease the latency in driving by altering the neural network architecture. Although this approach could potentially diminish latency, preserving the original accuracy poses a significant challenge. Wu et al. [37] emphasized that control-based driving models, which convert images into control signals, inherently exhibit perception latency and are susceptible to failure due to their focus on the current time step. In response, they developed Trajectory-guided Control Prediction (TCP), a multi-task learning system integrating a control prediction model with a trajectory planning model. However, this approach necessitates the extraction of precise trajectories, presenting a notable challenge. Popov et al. [38] introduced a latent space generative world model that, while not explicitly addressing latency, exhibited inherent robustness to it during deployment. This suggests that certain architectures may inadvertently compensate for latency, though without targeted mechanisms. In contrast, our method proactively models latency during both training and inference, allowing it to adaptively adjust actions based on real-time delay conditions. Tampuu et al. [39] examined how delays and vehicle speed affect the test-time performance of end-to-end driving models, highlighting that even modest delay values can significantly impact behavior unless label alignment is addressed. However, their method focuses on mitigating symptoms of latency degradation, not latency modeling itself.

In our approach, we acknowledge the inevitability of latency and explicitly model its effect during both training and inference. By integrating latency-indexed predicted future actions with the current action based on the real-time latency value, the proposed method mitigates the impact of latency on steering behavior in vision-based imitation-learning lane keeping.

3. Method

This section describes the proposed Perception Latency Mitigation Network (PLM-Net) and its evaluation methodology. We first introduce the conceptual framework and explain how latency mitigation is achieved through the interaction between the Base Model (BM) and the Timed Action Prediction Model (TAPM). We then detail the architectural design of both models, followed by the training procedure and the performance metrics used to evaluate latency mitigation in vision-based lane keeping.

3.1. PLM-Net Framework

We begin by describing the overall framework of PLM-Net and the interaction between its components. As shown in Figure 3, the PLM-Net has two major components, the Base Model (BM) and the Timed Action Prediction Model (TAPM). This novel deep learning approach smoothly integrates the TAPM with the original vision-based LKA system of the AV, represented by the BM. Figure 4 shows how these two models mitigate the perception latency, where $[eqn]$ is the BM policy and $[eqn]$ is the TAPM policy. As explained in (1), given the vehicle state $[eqn]$ at time t, $[eqn]$ takes the input $[eqn]$ , where $[eqn]$ is the visual observation and $[eqn]$ is the vehicle speed at time t, and provides the action $[eqn]$ .

[eqn]

The TAPM is a predictive action model, meaning that the policy $[eqn]$ generates a set of predictive action values $[eqn]$ , where each action corresponds to a future operating state associated with a specific latency value in $[eqn]$ .

The number of predictive actions in the vector $[eqn]$ depends on the number of sub-models in TAPM. If there are N sub-models, then $[eqn]$ . For example, if $[eqn]$ s, then the action $[eqn]$ represents the action that the BM would take if the vehicle was in the state $[eqn]$ . The process of obtaining the vector $[eqn]$ is described by (2), where the inputs to $[eqn]$ are the output of the $[eqn]$ (i.e., the action $[eqn]$ ) along with two feature vectors, the image feature vector $[eqn]$ and the vehicle velocity vector $[eqn]$ , both derived from $[eqn]$ .

[eqn]

The final action of the PLM-Net, denoted by $[eqn]$ , is obtained through linear interpolation, as detailed in Algorithm 1. This interpolation combines the outputs of both $[eqn]$ and $[eqn]$ according to the real-time perception latency value $[eqn]$ . In this work, the latency $[eqn]$ is assumed to be measurable or monitored at inference time (e.g., through system-level latency tracking mechanisms) and is explicitly injected in the controlled simulation environment used for evaluation.

Algorithm 1 Linear Interpolation for Latency
Require: $[eqn]$ // Target latency value and list of known latency values
Require: $[eqn]$ // Corresponding action values for the known latency values
Ensure: $[eqn]$ // Interpolated action value for the target latency
1: for $[eqn]$ to N do
2:     if $[eqn]$ then
3:         $[eqn]$
4:     end if
5: end for
6: return $[eqn]$

Since $[eqn]$ represents the steering action corresponding to the zero-latency case, it is equivalent to $[eqn]$ . Therefore, the value $[eqn]$ is prepended to $[eqn]$ and the action $[eqn]$ is appended to $[eqn]$ to construct the reference latency vector $[eqn]$ and its associated steering vector $[eqn]$ , as shown in (3).

[eqn]

Given a measured latency value $[eqn]$ , the algorithm identifies the two adjacent latency entries in $[eqn]$ that bound $[eqn]$ and performs linear interpolation between their corresponding steering predictions. If $[eqn]$ exactly matches one of the predefined latency values, the corresponding steering action is selected directly without interpolation.

At inference time, the Base Model first computes the nominal steering action, the TAPM generates the latency-indexed predictive actions, and the measured latency $[eqn]$ determines the interpolated final output. This procedure enables smooth mitigation of both constant and time-varying perception latency within the predefined latency range.
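The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `interpolate_action`, the argument names, and the clamping of latencies that fall outside the reference range are assumptions made here for the example.

```python
from bisect import bisect_left


def interpolate_action(delta, latencies, actions):
    """Linearly interpolate the steering action for a measured latency.

    latencies: ascending reference latencies, starting at 0.0 (the BM case).
    actions:   steering values aligned index-by-index with `latencies`.
    """
    if len(latencies) != len(actions):
        raise ValueError("latencies and actions must have the same length")
    # Assumption: latencies outside the reference range are clamped
    # to the nearest known entry rather than extrapolated.
    if delta <= latencies[0]:
        return actions[0]
    if delta >= latencies[-1]:
        return actions[-1]
    i = bisect_left(latencies, delta)  # first index with latencies[i] >= delta
    if latencies[i] == delta:
        return actions[i]  # exact match: select directly, no interpolation
    lo, hi = latencies[i - 1], latencies[i]
    weight = (delta - lo) / (hi - lo)
    return actions[i - 1] + weight * (actions[i] - actions[i - 1])
```

With hypothetical reference latencies of 0.0, 0.1, and 0.2 s and steering values of 0.00, 0.04, and 0.10, a measured latency of 0.15 s falls halfway between the last two entries and yields a steering value of about 0.07.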

3.2. PLM-Net Models Architecture

We now detail the internal architecture of the Base Model (BM) and the Timed Action Prediction Model (TAPM).

The architecture of the PLM-Net models is illustrated in Figure 5. The network design of the BM is presented in Figure 5a, and the network design of the TAPM is presented in Figure 5b.

The BM network design is inspired by the NVIDIA PilotNet structure [40], with modifications tailored to our requirements. Specifically, we adapt the network to accept visual observations ( $[eqn]$ ) and vehicle speed ( $[eqn]$ ) as inputs and to predict steering angle (action $[eqn]$ ) as output, and we add dropout layers to improve generalization and avoid overfitting. The BM features two primary inputs: the visual observation $[eqn]$ and the vehicle speed $[eqn]$ . The visual observation undergoes processing through five convolutional layers, then a flatten layer to obtain the image feature vector $[eqn]$ . Simultaneously, the vehicle speed input is directed through a fully connected layer that has 144 neurons, resulting in the formation of the vehicle speed vector $[eqn]$ . Subsequently, the image feature vector $[eqn]$ and the vehicle speed vector $[eqn]$ are concatenated and forwarded to a multi-layer perceptron (MLP) network. This MLP configuration consists of four fully connected layers interspersed with three dropout layers. The fully connected layers contain 512, 100, 50, and 10 neurons, respectively. The dropout layers maintain a dropout rate of $[eqn]$ . The final output of the BM $[eqn]$ , the image feature vector $[eqn]$ , and the vehicle speed vector $[eqn]$ , are forwarded to the TAPM network.

The TAPM network design is the result of fusing two key ideas. Firstly, it incorporates a predictive model, similar to ANEC [7], which was inspired by human drivers’ ability to mitigate perception latency. Secondly, it utilizes the BCIL framework [8], employing multiple sub-models to provide a range of predictive action values that align with different latency levels, and adding a ‘command’ input, representing the perception latency $[eqn]$ , to influence the final action value. The TAPM network inputs are $[eqn]$ , $[eqn]$ , and $[eqn]$ , forwarded from the BM. These three inputs go to 100-neuron, 500-neuron, and 100-neuron fully connected layers, respectively. The outputs of these three layers are then concatenated and forwarded to all sub-models. Each sub-model consists of three fully connected layers, with 200, 100, and 50 neurons, interspersed with two dropout layers with a dropout rate of $[eqn]$ . The sub-model outputs together form $[eqn]$ , as shown in (2). Although a single neural network could, in principle, learn a continuous mapping between latency values and steering actions, prior work [8,31] has shown that branched architectures improve stability and performance when handling condition-dependent outputs. This motivated our adoption of a BCIL-inspired design for TAPM, where separate sub-models specialize in distinct latency conditions.
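To make the layer sizes above concrete, the following sketch does the dimension bookkeeping for one TAPM sub-model. It is only an arithmetic illustration: the helper names are hypothetical, and the single-neuron steering output at the end of each sub-model is an assumption, since the text states only the three hidden widths.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(n_in, widths):
    """Total parameter count of a stack of fully connected layers."""
    total = 0
    for n_out in widths:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total


# Widths of the three TAPM input layers, as stated in the text.
TAPM_INPUT_WIDTHS = (100, 500, 100)
# Hidden widths of one sub-model, as stated in the text.
SUBMODEL_WIDTHS = (200, 100, 50)
# Assumption: each sub-model ends in a single-neuron steering output.
OUTPUT_WIDTH = 1

# The three input-layer outputs are concatenated before the sub-models.
concat_width = sum(TAPM_INPUT_WIDTHS)
# Dropout layers add no trainable parameters, so they are not counted.
submodel_total = mlp_params(concat_width, SUBMODEL_WIDTHS + (OUTPUT_WIDTH,))
```

Under these assumptions, each sub-model receives a 700-dimensional concatenated vector, and adding a latency level costs one more sub-model of this size while the shared input layers stay unchanged.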

3.3. PLM-Net Models Training

This subsection explains the supervised training procedures for both the BM and the TAPM. The functionality of a vision-based LKA system can be achieved through imitation learning, where the steering angle (i.e., action $[eqn]$ ) is predicted directly from the input $[eqn]$ , where $[eqn]$ is the visual observation and $[eqn]$ is the vehicle speed at time t. The training dataset collected by an expert driver can be defined as $[eqn]$ , where M is the total number of time-steps. Figure 6 illustrates the training process of the PLM-Net models. Learning the BM policy $[eqn]$ is a supervised learning problem. The parameters $[eqn]$ of the policy are optimized by minimizing the prediction error of $[eqn]$ given the input $[eqn]$ , as shown in (4), where we use the Mean Squared Error (MSE) to calculate the loss per sample. Once optimized, the BM can predict the action $[eqn]$ given the input $[eqn]$ at time t, as shown in (1).

[eqn]

To learn the TAPM policy $[eqn]$ , we generate a new dataset $[eqn]$ from the original dataset $[eqn]$ , mapping the input $[eqn]$ to N distinct future actions $[eqn]$ corresponding to N distinct latency values $[eqn]$ (2), where

[eqn]
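One way to picture the construction of the TAPM dataset is as a label-shifting step over the recorded expert actions. The sketch below is an assumption-laden illustration rather than the authors' pipeline: it assumes a fixed sampling period `dt`, rounds each latency to a whole number of time-steps, and uses the hypothetical name `build_tapm_samples`.

```python
def build_tapm_samples(actions, latencies, dt):
    """Pair each time-step with the expert actions recorded `latency` later.

    actions:   expert steering values sampled every `dt` seconds.
    latencies: the N latency values (seconds) that index the TAPM outputs.
    Returns a list of (t, [future action per latency]) tuples, where t is
    the index of the observation the labels belong to.
    """
    # Assumption: a fixed sampling period, so each latency is a whole-step shift.
    shifts = [round(d / dt) for d in latencies]
    max_shift = max(shifts)
    samples = []
    # Drop the tail of the recording, where future labels do not exist.
    for t in range(len(actions) - max_shift):
        samples.append((t, [actions[t + s] for s in shifts]))
    return samples
```

The returned index `t` lets the caller pair each label vector with the observation and speed recorded at that time-step, so the original recordings never need to be duplicated.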

The optimization of the parameters $[eqn]$ of the TAPM policy $[eqn]$ is explained in (5). We minimize the prediction error of $[eqn]$ given the input $[eqn]$ , where $[eqn]$ indicates the ground truth values of $[eqn]$ . We use MSE to calculate the loss per sample.

[eqn]

To integrate the TAPM with the existing LKA system without modifying it, we set the BM to be non-trainable during the training of the TAPM. This modular separation allows PLM-Net to serve as a plug-in latency mitigation layer that can be deployed without retraining the base controller, preserving the original control policy and the deployment characteristics of the base lane-keeping controller.

3.4. Performance Metrics

Finally, we describe the quantitative metrics used to evaluate latency mitigation performance. In the context of lateral control for AVs, perception latency can lead to delayed or incorrect actions, particularly affecting the steering angle, which in turn impacts the overall trajectory of the vehicle, as explained in Section 1.2. Therefore, to assess PLM-Net’s ability to mitigate this latency, two primary evaluation criteria can be employed: steering angle similarity and trajectory similarity. Accurate steering is essential for maintaining lane position and following the desired path, ensuring precise vehicle control even under latency conditions. Meanwhile, trajectory similarity evaluates how closely the vehicle’s path adheres to the intended trajectory, providing insight into the broader impact of perception latency on vehicle navigation and the efficacy of PLM-Net in mitigating it.

4. Simulation-Based Validation Setup

This section outlines the experimental setup for validating the Perception Latency Mitigation Network (PLM-Net). We begin by describing the simulator used for creating a controlled testing environment. Next, we detail the dataset for training and evaluation. We then discuss the methods for measuring driving performance, followed by the parameter tuning process. After that, an ablation study is presented to highlight key development decisions. Finally, we specify the computational environment and machine learning framework, including hardware and software configurations.

4.1. Simulator

To simulate and evaluate the effect of perception latency on lateral control, we used the OSCAR simulator [41], which provides a simulated Ford Fusion vehicle and multiple test tracks. OSCAR is tightly integrated with ROS (Robot Operating System) [42] and the Gazebo multi-robot 3D simulator [43], enabling real-time control testing and closed-loop behavior evaluation. These capabilities made it well suited to our goal of studying perception latency under controlled and repeatable conditions.

While other simulators such as CARLA [44] or NVIDIA Drive SIM [45] offer photorealistic rendering and advanced sensor simulation, the focus of our work does not require such features. Our primary goal is to explore the learning and integration of latency-aware control models, and OSCAR provided a practical and efficient platform for this investigation. In our experiments, perception latency was explicitly injected within the control loop to emulate delay, enabling precise and repeatable evaluation under both constant and time-varying latency profiles.
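Latency injection of this kind is commonly emulated with a short history buffer that hands the controller an older observation. The sketch below shows the idea only; it is not OSCAR's mechanism. The class name `DelayedObservationBuffer` is hypothetical, and it assumes latency is expressed as a whole number of control ticks.

```python
from collections import deque


class DelayedObservationBuffer:
    """Returns the observation captured `delay_steps` control ticks ago."""

    def __init__(self, max_delay_steps):
        # Keep just enough history to serve the largest delay.
        self._history = deque(maxlen=max_delay_steps + 1)

    def push(self, observation):
        """Store the newest observation; the oldest is dropped when full."""
        self._history.append(observation)

    def get(self, delay_steps):
        """Observation from `delay_steps` ticks ago, or the oldest available."""
        if not self._history:
            raise ValueError("no observation has been pushed yet")
        # Assumption: before enough history exists, fall back to the oldest frame.
        index = min(delay_steps, len(self._history) - 1)
        return self._history[-1 - index]
```

Because `delay_steps` is passed on every call, the same buffer serves both constant and time-varying latency profiles: a constant profile passes the same value each tick, while a time-varying one changes it from tick to tick.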

The simulated environment consisted of a single ego vehicle without surrounding traffic, and the human driver followed the predefined lane centerline as the reference trajectory during data collection.

4.2. Dataset

4.2.1. Training and Test Tracks

The training track used for PLM-Net is identical to the one employed in our prior work [7], ensuring consistency in base controller learning. To evaluate generalization and latency mitigation performance, a separate three-lane test track was designed containing a combination of straight segments and both left and right turns, as shown in Figure 7. This track was not used during training and was selected to expose the controller to multiple curvature conditions so that the interaction between perception latency and different road geometries could be examined during evaluation.

4.2.2. Data Collection

Our training dataset, denoted by $[eqn]$ , was collected by a human driver navigating the training track using the OSCAR simulator. The simulator captures visual observations via a camera mounted on the vehicle, alongside control and state values such as steering angle, throttle position, braking pressure, time, velocity, acceleration, and position. The data was collected using the Logitech G920 dual-motor feedback driving force racing wheel with pedals and a gear shifter, resulting in approximately 115,000 clean training samples. Table 1 provides detailed statistics on the steering and velocity data.

The steering angle for the vehicle is represented on a scale from $-1$ to 1, where 0 denotes the center position. According to the right-hand rule, positive steering angles (0 to 1) correspond to rotations to the left, and negative steering angles (0 to $-1$ ) correspond to rotations to the right. The steering wheel has a maximum rotation angle of $[eqn]$ . This mapping implies that a steering angle of 1 corresponds to a $[eqn]$ rotation to the left, while a steering angle of $-1$ corresponds to a $[eqn]$ rotation to the right. Thus, the steering angle range is scaled such that 0 to 1 maps to $[eqn]$ to $[eqn]$ and 0 to $-1$ maps to $[eqn]$ to $[eqn]$ .

During both training and evaluation, the vehicle operated at a constant predefined speed to isolate the effect of perception latency on lateral control. The relationship between speed and latency impact was previously analyzed in [7]; in this work, speed was held constant to focus specifically on latency mitigation behavior.

4.2.3. Data Balancing

As shown in Table 1, approximately $[eqn]$ of the steering values in our dataset are close to $[eqn]$ , indicating the vehicle traveling on a straight road segment. Training a driving model solely on this dataset would introduce bias. To address this, we conducted histogram-based data balancing to reduce the skew towards zero steering values. Figure 8 illustrates the steering angle histogram before (left) and after (right) the balancing process. Post-balancing, our dataset comprised approximately 67,000 data samples.
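Histogram-based balancing can be sketched as capping the number of samples kept per steering bin. The paper does not state its bin count or cap, so the values below, along with the function name `balance_by_histogram`, are placeholders; the sketch also assumes steering values normalized to the range -1 to 1.

```python
import random


def balance_by_histogram(steering, num_bins=20, max_per_bin=200, seed=0):
    """Return sorted indices of the samples kept after capping each bin."""
    rng = random.Random(seed)
    bins = [[] for _ in range(num_bins)]
    for idx, s in enumerate(steering):
        # Map steering in [-1, 1] to a bin index in [0, num_bins - 1].
        b = int((s + 1.0) / 2.0 * num_bins)
        b = max(0, min(b, num_bins - 1))
        bins[b].append(idx)
    kept = []
    for members in bins:
        # Over-represented bins (typically near zero steering) are subsampled.
        if len(members) > max_per_bin:
            members = rng.sample(members, max_per_bin)
        kept.extend(members)
    return sorted(kept)
```

Returning indices rather than values keeps images, speeds, and steering labels aligned when the selection is applied to the full dataset.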

4.2.4. Data Augmentation

To improve the model’s generalization and increase dataset diversity during training, we applied data augmentation. Each image presented to the network undergoes a random subset of transformations: horizontal flipping, for which the steering value is negated, and random brightness changes, for which the original steering angle is preserved. These augmentation techniques proved effective in enhancing the robustness of our model during training.
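
The two transformations and their effect on the label can be sketched as follows. This is an illustration on a plain nested-list grayscale image. The probabilities, the brightness range, and the function name are assumptions rather than the values used in the paper.

```python
import random


def augment(image, steering, rng, flip_prob=0.5, brightness_prob=0.5,
            brightness_range=(0.6, 1.4)):
    """Apply a random subset of augmentations to one (image, steering) pair.

    image: list of rows, each row a list of grayscale pixel values in 0..255.
    The probabilities and the brightness range are illustrative assumptions.
    """
    if rng.random() < flip_prob:
        # Mirror the image left-to-right; a left turn becomes a right turn.
        image = [list(reversed(row)) for row in image]
        steering = -steering
    if rng.random() < brightness_prob:
        # Scale pixel intensities; road geometry, and so the label, is unchanged.
        gain = rng.uniform(*brightness_range)
        image = [[min(255, max(0, round(p * gain))) for p in row] for row in image]
    return image, steering
```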

4.3. Driving Performance Evaluation

While Section 3.4 explained the rationale behind choosing the performance metrics and their importance, this section discusses the adopted methods to measure them in our experiments.

4.3.1. Steering Angle Similarity

We analyzed the steering angle values under different conditions: BM driving without latency, BM driving with latency but without TAPM, and BM driving with latency and TAPM (utilizing PLM-Net). This analysis was conducted both qualitatively, through visual inspection, and quantitatively, by calculating metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). These metrics quantify pointwise deviation between steering signals, where lower values indicate improved temporal alignment with the latency-free baseline controller.
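
For reference, the three pointwise metrics can be computed as in the sketch below. It assumes the two steering signals have already been aligned sample by sample, which the text does not specify. The function name is hypothetical.

```python
import math


def steering_errors(reference, candidate):
    """Return (MAE, MSE, RMSE) between two equally sampled steering signals."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    diffs = [c - r for r, c in zip(reference, candidate)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, mse, math.sqrt(mse)
```

Here `reference` would be the steering signal of the latency-free BM and `candidate` the signal of the BM or PLM-Net under latency.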

4.3.2. Trajectory Similarity

The performance metrics used to compare the driving trajectories were adopted from [46,47]. We measure the similarity between driving trajectories based on lane center positioning. We use partial curve mapping, Frechet distance, area between curves, curve length, and dynamic time warping from [46], and the Driving Trajectory Stability Index (DTSI) from [47]. These trajectory metrics capture complementary aspects of spatial deviation, including geometric similarity, accumulated lateral error, temporal alignment of motion profiles, and overall path stability relative to the latency-free baseline.
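
To make one of these measures concrete, the sketch below computes the discrete Fréchet distance between two trajectories with the standard dynamic-programming recurrence. It is a pure-Python illustration, not the implementation from [46], and the function name is hypothetical.

```python
import math


def discrete_frechet(path_a, path_b):
    """Discrete Frechet distance between two polylines given as (x, y) points."""
    n, m = len(path_a), len(path_b)
    if n == 0 or m == 0:
        raise ValueError("paths must be non-empty")
    # table[i][j] is the distance for the prefixes path_a[:i+1] and path_b[:j+1].
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(path_a[i], path_b[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_prev = min(table[i - 1][j], table[i - 1][j - 1], table[i][j - 1])
                table[i][j] = max(best_prev, d)
    return table[n - 1][m - 1]
```

Intuitively, the result is the largest gap that must be bridged when both trajectories are traversed from start to end without backtracking, so a single large excursion from the lane center dominates the value.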

All evaluations were conducted in a deterministic closed-loop simulation environment with fixed speed and predefined latency injection profiles. Under identical latency settings, repeated trials yield identical steering outputs and vehicle trajectories. Therefore, a single representative evaluation per latency condition is sufficient to characterize system behavior.

4.4. Parameter Tuning

For the TAPM, we used $[eqn]$ sub-models, meaning the TAPM predicts five future actions for five different latency values in $[eqn]$ . Specifically, the latency values start with $[eqn]$ s, with increments of $[eqn]$ seconds, resulting in $[eqn]$ seconds. Consequently, the predictive action vector $[eqn]$ in (2) becomes

[eqn]

The selected latency range was designed to cover reported steering actuation delays (0.2 s) [4] as well as visual and network-induced delays in connected and teleoperated autonomous vehicle systems, which frequently range from 100 to 350 ms, with noticeable performance degradation around 300 ms [5,22].

As detailed in (3), the reference latency vector $[eqn]$ and the corresponding action vector $[eqn]$ are formed by including the BM action for zero latency. Thus, $[eqn]$ and $[eqn]$ become

[eqn]

[eqn]

These configurations enable the PLM-Net to handle both constant and time-varying perception latency within the range [0–0.35] s. Latency values outside this predefined range were not evaluated because, beyond approximately $[eqn]$ – $[eqn]$ s, the baseline model (BM) departed the lane and subsequently left the track, preventing meaningful trajectory-based evaluation. Extending the latency range would require additional latency heads and potentially a redesigned baseline controller.

Both models, the BM and the TAPM, were trained using the Adam optimizer [48] with a batch size of 32 and a learning rate of $[eqn]$ . Table 2 shows the number of trainable and non-trainable parameters for the BM and the TAPM. The BM has 6,006,191 parameters, all of which are trainable. The TAPM has 12,256,396 parameters in total, of which 6,250,205 are trainable and 6,006,191 are non-trainable, since the BM layers are frozen when training the TAPM. For the performance metric DTSI, we used the default parameters recommended in [47].

4.5. Latency Knowledge and Modeling Assumptions

In this study, perception latency is injected within the closed-loop simulation environment, allowing direct access to the delay value at runtime. The framework therefore assumes availability of a measurable scalar latency estimate rather than knowledge of its physical origin. In practical autonomous driving systems, such estimates can be obtained through timestamp synchronization across sensing, processing, communication, and actuation modules.

The injected delay represents aggregate perception-to-actuation latency rather than separating algorithmic, communication, and actuator components. This abstraction allows the mitigation mechanism to remain agnostic to the physical source of delay and focus on compensating its behavioral impact on steering control.

Linear interpolation between discrete latency-conditioned predictions is adopted because steering evolution remains locally smooth under moderate delay variation within the trained range. The discrete latency heads are ordered with respect to delay magnitude, enabling interpolation to approximate intermediate latency values without modifying the base controller.
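
The interpolation step can be sketched as below. This is an illustration of linear interpolation over an ordered latency grid, not a transcription of Algorithm 1. The function name, the clamping of out-of-range latencies, and the grid and action values used in the usage lines are assumptions.

```python
from bisect import bisect_right


def interpolate_action(latency, latency_grid, actions):
    """Linearly interpolate a steering action for the measured latency.

    latency_grid: ascending latencies in seconds, starting at 0.0 (the BM action).
    actions: steering predictions, one per entry of latency_grid.
    Latencies outside the grid are clamped to its end points (an assumption).
    """
    if len(latency_grid) != len(actions) or len(latency_grid) < 2:
        raise ValueError("need matching grids with at least two entries")
    latency = min(max(latency, latency_grid[0]), latency_grid[-1])
    # Index of the right-hand neighbour, kept inside the grid.
    hi = min(bisect_right(latency_grid, latency), len(latency_grid) - 1)
    lo = hi - 1
    span = latency_grid[hi] - latency_grid[lo]
    weight = (latency - latency_grid[lo]) / span
    return (1.0 - weight) * actions[lo] + weight * actions[hi]


# Hypothetical grid and predictions, chosen only to show the call.
grid = [0.0, 0.5, 1.0]
predicted = [0.0, 1.0, 3.0]
midway = interpolate_action(0.75, grid, predicted)
```

Because the base controller's zero-latency action is the first grid entry, a measured latency of zero returns the BM output unchanged.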

4.6. Ablation Study

In exploring model architectures for the TAPM network, we experimented with several configurations. Initially, we introduced an additional fully connected layer with 500 neurons after the concatenation layer preceding the five sub-models. Because this layer was shared among all sub-models, it hindered their individual learning capacities and complicated the learning of the predictive action values, resulting in subpar results.

Further experimentation involved modifying the number of layers within the sub-models. Reducing each sub-model to two layers, with 100 and 50 neurons respectively, left the model unable to learn the predictive actions effectively. Adding an extra fully connected layer with 300 neurons to each sub-model yielded comparable (if not slightly inferior) results to the existing architecture, thus introducing unnecessary complexity without significant improvement.

Additionally, we explored integrating Long Short-Term Memory (LSTM) [49] layers into the sub-models to capture temporal information. This approach encountered two substantial challenges. First, the model’s complexity increased significantly, impeding computational efficiency. Second, the need to preserve temporal sequences hindered conventional data balancing and random sampling from the dataset, both of which are used to enhance batch diversity.

During model training, we found that a batch size of 32 yielded better results than smaller sizes such as 16 and 8. Furthermore, we experimented with lowering the learning rate from $[eqn]$ to $[eqn]$ . However, the model failed to converge under the lower learning rate, suggesting that the original rate was more conducive to effective training.

4.7. Computational Environment and Machine Learning Framework

Our machine learning framework is built upon TensorFlow and Keras libraries. Specifically, we utilized Keras version $[eqn]$ in conjunction with TensorFlow-GPU version $[eqn]$ , leveraging CUDA 9 and cuDNN $[eqn]$ for GPU acceleration. All computational experiments were conducted on a hardware setup featuring an Intel i7-10700K CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti with 11 GB of GPU memory. The operating system employed for these experiments was Ubuntu $[eqn]$ .

4.8. Computational Cost Analysis

To assess the computational overhead introduced by PLM-Net at deployment, we report inference latency and GPU memory usage relative to the BM. Parameter counts and the trainable/non-trainable breakdown are provided in Table 2. During inference, PLM-Net executes a forward pass through the BM, followed by a forward pass through the TAPM and a lightweight linear interpolation step (Algorithm 1).

Inference-time measurements were performed with batch size 1 after discarding the first 50 samples to mitigate warm-up effects. The BM achieved an average inference time of 2.352 ms per frame (p95: 4.532 ms; N = 670) and occupied approximately 650 MB of GPU memory. In comparison, PLM-Net required 5.909 ms per frame on average (p95: 9.880 ms; N = 686) and occupied approximately 778 MB of GPU memory. This corresponds to an additional computational overhead of 3.557 ms per frame and an additional 128 MB of GPU memory (19.7% increase) relative to the BM.
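
A measurement of this kind can be sketched as follows. The sketch times a generic `predict` callable one frame at a time and discards warm-up samples before computing the mean and the 95th percentile. The function names and the nearest-rank percentile definition are assumptions, and GPU memory usage is not measured here.

```python
import time


def latency_stats(durations_ms, warmup=50):
    """Mean and 95th percentile of per-frame times after dropping warm-up samples."""
    kept = sorted(durations_ms[warmup:])
    if not kept:
        raise ValueError("no samples left after discarding warm-up")
    mean = sum(kept) / len(kept)
    # Nearest-rank percentile: smallest value with at least 95% of samples
    # at or below it (integer arithmetic avoids floating-point rounding).
    rank = max(1, (95 * len(kept) + 99) // 100)
    return mean, kept[rank - 1]


def time_inference(predict, frames, warmup=50):
    """Time predict(frame) one frame at a time (batch size 1), in milliseconds."""
    durations = []
    for frame in frames:
        start = time.perf_counter()
        predict(frame)
        durations.append((time.perf_counter() - start) * 1000.0)
    return latency_stats(durations, warmup)
```

When timing a GPU model, `predict` should block until the result is available on the host, otherwise asynchronous execution makes the measured times too short.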

Given a 30 Hz control frequency (33.3 ms per cycle), the observed inference times confirm that PLM-Net maintains real-time feasibility with a substantial timing margin. The additional computational overhead is justified by the improved trajectory tracking performance under latency conditions, as demonstrated in Section 5.
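
The timing margin can be checked with a short calculation using the figures reported above. The helper names are hypothetical.

```python
def cycle_budget_ms(control_hz):
    """Time available per control cycle, in milliseconds."""
    return 1000.0 / control_hz


def budget_fraction(inference_ms, control_hz):
    """Fraction of one control cycle consumed by inference."""
    return inference_ms / cycle_budget_ms(control_hz)


budget = cycle_budget_ms(30.0)              # about 33.3 ms per cycle
mean_share = budget_fraction(5.909, 30.0)   # PLM-Net mean inference time
p95_share = budget_fraction(9.880, 30.0)    # PLM-Net 95th-percentile time
```

With these numbers, PLM-Net uses roughly 18% of the cycle on average and about 30% at the 95th percentile, leaving the remainder for sensing, communication, and actuation.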

5. Results

Our experimental design aimed to investigate the impact of perception latency on driving and assess the efficacy of our proposed solution, PLM-Net, in mitigating this effect (as detailed in Section 1.2). To simulate latency, we introduced delays in the input data, reflecting conditions in perception–control pipelines where perception and decision-making processes are delayed. Closed-loop velocity control maintained a constant vehicle speed (approximately 60 km/h). For instance, with a $[eqn]$ s latency, the visual observation available at time t becomes $[eqn]$ instead of $[eqn]$ , causing the baseline model (BM) to compute actions based on outdated perception. PLM-Net aims to mitigate this mismatch by predicting an action that better approximates the desired action at time t, despite the delayed observation.
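
Latency injection of this kind can be pictured as a timestamped frame buffer that serves the controller an outdated frame. The sketch below is an illustration only. The class name, the `max_age_s` pruning window, and the choice of the newest frame that is at least as old as the requested latency are assumptions, not details of the OSCAR simulator.

```python
from collections import deque


class DelayedObservationBuffer:
    """Serve the newest frame that is at least `latency_s` seconds old."""

    def __init__(self, max_age_s=1.0):
        # Frames older than max_age_s are dropped, so requested latencies
        # should stay below this value.
        self.max_age_s = max_age_s
        self._frames = deque()  # (timestamp_s, frame), oldest first

    def push(self, timestamp_s, frame):
        self._frames.append((timestamp_s, frame))
        while self._frames and timestamp_s - self._frames[0][0] > self.max_age_s:
            self._frames.popleft()

    def get(self, now_s, latency_s):
        """Return the latest frame captured at or before now_s - latency_s, or None."""
        target = now_s - latency_s
        delayed = None
        for timestamp_s, frame in self._frames:
            if timestamp_s <= target:
                delayed = frame
            else:
                break
        return delayed
```

Passing a fixed `latency_s` reproduces a constant-latency condition, while drawing it from a profile at each control step gives a time-varying one.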

We assess the impact of perception latency through a comparison of driving behaviors: BM without latency, BM with latency, and PLM-Net with latency. Successful mitigation by PLM-Net is indicated when its driving performance closely resembles that of the latency-free BM. Our evaluation involves analyzing steering angle similarity and driving trajectory similarity, as detailed in Section 4.3, for both constant and time-variant perception latency.

To examine how perception latency interacts with road geometry, trajectory similarity metrics are reported not only for the full test track but also for individual track segments, including straight sections, left turns, and right turns. This segment-level evaluation enables analysis of latency-induced deviations under different curvature conditions and provides additional insight into how the mitigation mechanism behaves across distinct driving scenarios.

5.1. Constant Perception Latency Mitigation

In evaluating PLM-Net under constant perception latency, we focus on a latency of $0.2$ s, with similar trends observed for other constant latency values, as illustrated in Appendix A. Figure 9 provides a qualitative comparison of steering angles over time between the BM with and without latency and PLM-Net with the same latency. In this figure, the blue line represents the BM driving without latency, the green line represents the BM driving with $0.2$ s latency, and the red line represents PLM-Net driving with $0.2$ s latency. Additionally, Figure 10 shows the vehicle trajectories on the test track, colored by steering angle, to further illustrate the comparative performance. Table 3 quantifies steering angle errors, showing that PLM-Net yields lower errors than the BM under identical latency conditions. Under a constant perception latency of 0.2 s, the performance of the BM degraded substantially.

Furthermore, Figure 11 presents trajectory comparisons qualitatively, while Table 4 presents trajectory comparisons quantitatively, showing PLM-Net’s ability to maintain accurate driving trajectories despite latency-induced challenges. Each color-coded trajectory corresponds to a different driving condition: blue for the BM driving without latency, green for the BM driving with $[eqn]$ s latency, and red for PLM-Net driving with $[eqn]$ s latency. Examining the deviation from the lane center on the full track, the Partial Curve Mapping metric for the BM with $[eqn]$ s latency increased by $[eqn]$ , while PLM-Net maintained a much smaller increase of only $[eqn]$ . Similarly, improvements in Frechet distance, area between curves, curve length, and DTSI demonstrate that PLM-Net significantly reduces the trajectory deviation caused by latency. The additional segments of Figure 11 and Table 4, specifically parts (b), (c), and (d), which depict the trajectories on a straight road, right turn, and left turn, respectively, also demonstrate that PLM-Net effectively mitigates latency, similar to the results observed on the full track.

The Mean Absolute Error (MAE) between the BM without latency and the BM with latency is $[eqn]$ , indicating a substantial degradation in steering angle accuracy. However, when using PLM-Net, the performance decline was mitigated, with the MAE reduced to $[eqn]$ . This corresponds to a $[eqn]$ reduction in MAE relative to the BM under the same latency condition. Additionally, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were similarly improved with PLM-Net, showing reductions of $[eqn]$ and $[eqn]$ , respectively.
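
The relative reductions quoted here are presumably computed against the BM error under the same latency. A minimal sketch of that calculation follows. The function name is hypothetical, and the numbers in the usage line are placeholders rather than values from Table 3.

```python
def percent_reduction(baseline_error, mitigated_error):
    """Relative error reduction, in percent, of the mitigated run versus the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - mitigated_error) / baseline_error


# Placeholder values only: an error that drops from 4.0 to 1.0 is a 75% reduction.
example = percent_reduction(4.0, 1.0)
```

Since RMSE is the square root of MSE, the same pair of runs yields a smaller percentage reduction in RMSE than in MSE.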

5.2. Time-Variant Perception Latency Mitigation

For time-variant perception latency, we evaluate PLM-Net against varying latency levels ([0.0–0.35] s). Similar to the constant latency scenario, the upper image in Figure 12 illustrates qualitative steering angle comparisons over time, with the blue line representing the BM driving without latency, the green line representing the BM driving with time-variant latency, and the red line representing PLM-Net driving with time-variant latency. The lower image illustrates the varying latency values experienced by both models over time. Additionally, Figure 13 provides a visual representation of vehicle trajectories on the test track, colored based on steering angle values. Table 5 quantifies steering angle errors, demonstrating PLM-Net’s effectiveness in reducing errors compared to the BM under time-variant latency conditions. Under a time-variant perception latency of [0.0–0.35] s, the performance of the BM degraded substantially. The Mean Absolute Error (MAE) between the BM without latency and the BM with latency is $[eqn]$ , indicating a substantial degradation in steering angle similarity. However, when using PLM-Net, the performance decline was mitigated, with the MAE reduced to $[eqn]$ . This represents a $[eqn]$ improvement in MAE compared to the BM under the same latency condition. Additionally, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were similarly improved with PLM-Net, showing reductions of $[eqn]$ and $[eqn]$ , respectively.

Figure 14 presents trajectory comparisons qualitatively, while Table 6 presents trajectory comparisons quantitatively, confirming PLM-Net’s successful mitigation of time-variant perception latency. Each color-coded trajectory corresponds to a different driving condition: blue for the BM driving without latency, green for the BM driving with time-variant latency, and red for PLM-Net driving with time-variant latency. The Partial Curve Mapping metric revealed a $[eqn]$ increase in deviation from the lane center for the BM, with a [0.0–0.35] s time-variant latency on the full track, while PLM-Net showed a more modest increase of $[eqn]$ . Similarly, the Frechet distance for the BM increased by $[eqn]$ , compared to a $[eqn]$ increase for PLM-Net. This pattern of reduced deviation is also reflected in the improvements in area between curves, curve length, and DTSI, indicating that PLM-Net significantly mitigates the trajectory deviations caused by latency. Parts (b), (c), and (d) of Figure 14 and Table 6, which illustrate the trajectories during a straight segment, a right turn, and a left turn, respectively, exhibit results consistent with the full track, confirming that PLM-Net successfully mitigates latency.

Overall, the experimental findings show that PLM-Net mitigates both constant and time-variant perception latency, indicating its potential for improving robustness against perception latency in vision-based autonomous driving systems.

6. Conclusions

This paper introduced PLM-Net, a learning-based framework designed to mitigate the effect of perception latency in vision-based imitation-learning lateral control systems. By integrating a Timed Action Prediction Model (TAPM) with an existing Base Model (BM), PLM-Net anticipates latency-induced mismatch between perception and control without modifying the original controller architecture.

The TAPM predicts a discrete set of future steering actions corresponding to predefined latency values, and linear interpolation is used to generate the final control output according to the real-time latency. This design enables mitigation of both constant and time-varying perception latency within the modeled range.

Experimental results in a deterministic closed-loop simulation environment demonstrated that PLM-Net substantially reduces latency-induced steering and trajectory errors. Under a constant $[eqn]$ s latency, PLM-Net achieved a $[eqn]$ reduction in MAE compared to the baseline model. Under time-variant latency within [0.0–0.35] s, the MAE reduction reached $[eqn]$ . Trajectory-based metrics further confirmed improved lane-following performance across straight and turning segments.

Limitations: This study was conducted in a deterministic closed-loop simulation environment with a fixed vehicle speed to isolate the effect of perception latency on lateral control behavior. While this controlled setting enables systematic evaluation of latency mitigation behavior, broader validation across multiple drivers, stochastic latency realizations, and more diverse road environments remains an important direction for future work. The training dataset was collected from a single driver in a controlled simulator setting, consistent with common imitation-learning frameworks, and was evaluated on a separate test track to assess generalization under the studied conditions. The modeled latency range was bounded within [0.0–0.35] s, beyond which the baseline controller departed the track boundaries, preventing meaningful trajectory-based comparison. Additionally, the proposed framework was evaluated for lateral control only under aggregate perception-to-actuation delay modeling. These constraints define the scope of validation for the present study and motivate future investigation under broader operational, dynamic, and multi-sensor conditions. The present work focuses on architectural latency mitigation under bounded delay assumptions and does not claim replacement or superiority over model-based delay compensation methods; rather, it provides a complementary learning-based solution designed for vision-based imitation-learning control pipelines where explicit vehicle modeling or controller redesign may not be feasible.

Future work will focus on extending the modeled latency range, evaluating the framework under varying vehicle speeds and more complex traffic scenarios, integrating the approach into higher-fidelity simulation and real-world platforms, and exploring hybrid strategies that combine learning-based latency mitigation with classical delay-aware control methods to enhance robustness and theoretical guarantees.

Bibliography

1. Kwon, J.; Choe, Y. Enhanced facilitatory neuronal dynamics for delay compensation. In Proceedings of the 2007 International Joint Conference on Neural Networks; IEEE: Piscataway, NJ, USA, 2007; pp. 2040–2045.
2. Li, M.; Wang, Y.X.; Ramanan, D. Towards streaming perception. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 473–488.
3. Ho, M.K.; Griffiths, T.L. Cognitive science as a source of forward and inverse models of human decisions for robotics and control. Annu. Rev. Control. Robot. Auton. Syst. 2022, 5, 33–53. doi:10.1146/annurev-control-042920-015547
4. Xu, S.; Peng, H.; Tang, Y. Preview path tracking control with delay compensation for autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2979–2989. doi:10.1109/TITS.2020.2978417
5. Liu, Q.; Liu, Y.; Liu, C.; Chen, B.; Zhang, W.; Li, L.; Ji, X. Hierarchical lateral control scheme for autonomous vehicle with uneven time delays induced by vision sensors. Sensors 2018, 18, 2544. doi:10.3390/s18082544. PMID: 30081510. PMCID: PMC6111847
6. Chen, Z.; Huang, X. End-to-end learning for lane keeping of self-driving cars. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV); IEEE: Piscataway, NJ, USA, 2017; pp. 1856–1860. doi:10.1109/IVS.2017.7995975
7. Khalil, A.; Kwon, J. ANEC: Adaptive Neural Ensemble Controller for Mitigating Latency Problems in Vision-Based Autonomous Driving. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2023; pp. 9340–9346.
8. Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-End Driving Via Conditional Imitation Learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2018; pp. 4693–4700. doi:10.1109/ICRA.2018.8460487