Temporal Misalignment Attacks against Multimodal Perception in Autonomous Driving
Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Ning Zhang, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou

TL;DR
This paper introduces DejaVu, an attack that induces temporal misalignments between the camera and LiDAR streams in the multimodal perception systems of autonomous vehicles, significantly degrading perception accuracy and potentially creating safety hazards.
Contribution
The paper presents DejaVu, the first attack exploiting in-vehicular network vulnerabilities to induce temporal misalignments in multimodal perception for autonomous driving.
Findings
Car detection mAP drops by up to 88.5% with a single-frame LiDAR delay.
Object tracking accuracy decreases by 73% with camera delay.
Feasibility demonstrated through hardware-in-the-loop and simulation tests.
Abstract
Multimodal fusion (MMF) plays a critical role in the perception of autonomous driving, which primarily fuses camera and LiDAR streams for a comprehensive and efficient scene understanding. However, its strict reliance on precise temporal synchronization exposes it to new vulnerabilities. In this paper, we introduce DejaVu, an attack that exploits the in-vehicular network to manipulate the integrity of time and create subtle temporal misalignments, severely degrading downstream MMF-based perception tasks. Our comprehensive attack analysis across different models and datasets reveals the sensors' task-specific imbalanced sensitivities: object detection is overly dependent on LiDAR inputs, while object tracking is highly reliant on the camera inputs. Consequently, with a single-frame LiDAR delay, an attacker can reduce the car detection mAP by up to 88.5%, while with a three-frame camera…
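To make the notion of a temporal misalignment concrete, the following is a minimal sketch of what a k-frame delay on one modality does to the pairs a fusion stage receives. It is an illustration under stated assumptions, not the paper's implementation: the names `delay_stream`, `fuse_pairs`, and `misalignment` are hypothetical, frame indices stand in for real sensor data, and the network-level mechanism that produces the delay is not modeled at all.

```python
from typing import List, Tuple


def delay_stream(frames: List[int], k: int) -> List[int]:
    """Return a stream in which slot t carries the frame captured at t - k.

    Hypothetical helper: the first k slots repeat the earliest frame,
    because nothing older exists to replay.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return [frames[max(t - k, 0)] for t in range(len(frames))]


def fuse_pairs(camera: List[int], lidar: List[int]) -> List[Tuple[int, int]]:
    """Pair frames slot by slot, as a fusion stage that trusts arrival order would."""
    return list(zip(camera, lidar))


def misalignment(pairs: List[Tuple[int, int]]) -> List[int]:
    """Capture-index gap (camera minus LiDAR) for each fused pair."""
    return [cam - lid for cam, lid in pairs]


# Six capture slots; integers are capture indices, not real data.
camera = list(range(6))
lidar = delay_stream(list(range(6)), k=1)  # assumed single-frame LiDAR delay

pairs = fuse_pairs(camera, lidar)
offsets = misalignment(pairs)
print(pairs)
print(offsets)
```

After the first slot, every fused pair combines a current camera frame with a LiDAR frame that is one capture step old, which is the kind of stale pairing the abstract describes.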
