VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models
Duoxun Tang, Dasen Dai, Jiyao Wang, Xiao Yang, Jianyu Wang, Siqi Cai

TL;DR
VidDoS introduces a universal energy-latency attack on Video-LLMs that significantly degrades performance and increases inference latency, posing safety risks in critical applications.
Contribution
This paper presents the first universal Energy-Latency Attack (ELA) framework for Video-LLMs. It relies on instance-agnostic triggers that need no per-input optimization at inference time, so the attack remains practical under real-time constraints.
Findings
Induces over 205× token expansion
Increases inference latency by more than 15×
Causes safety violations in autonomous driving simulations
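The "×" figures above are ratios of an attacked measurement to a clean baseline. The snippet below is a minimal sketch of that arithmetic; the token counts and latencies in it are made-up illustrative values, not measurements from the paper, and `expansion_ratio` is a hypothetical helper name.

```python
def expansion_ratio(attacked: float, baseline: float) -> float:
    """Return how many times larger the attacked value is than the clean baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return attacked / baseline


# Illustrative numbers only (assumed, not taken from the paper).
clean_tokens, attacked_tokens = 20, 4100          # generated tokens per query
clean_latency_s, attacked_latency_s = 2.0, 31.0   # end-to-end seconds per query

token_ratio = expansion_ratio(attacked_tokens, clean_tokens)
latency_ratio = expansion_ratio(attacked_latency_s, clean_latency_s)

print(f"token expansion: {token_ratio:.1f}x")    # token expansion: 205.0x
print(f"latency increase: {latency_ratio:.1f}x")  # latency increase: 15.5x
```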
Abstract
Video-LLMs are increasingly deployed in safety-critical applications but are vulnerable to Energy-Latency Attacks (ELAs) that exhaust computational resources. Current image-centric methods fail because temporal aggregation mechanisms dilute individual frame perturbations. Additionally, real-time demands make instance-wise optimization impractical for continuous video streams. We introduce VidDoS, the first universal ELA framework tailored for Video-LLMs. Our method leverages universal optimization to create instance-agnostic triggers that require no inference-time gradient calculation. We achieve this by steering models toward expensive target sequences, combined with mechanisms that override conciseness priors. Testing across three mainstream Video-LLMs and three video datasets,…
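To make the idea of an instance-agnostic trigger concrete, here is a toy sketch in NumPy. It is not the VidDoS objective or pipeline: the mean-pooling "model", the `w` direction, the `stop_score` surrogate, and all sizes and step values are assumptions chosen for illustration. It shows only the general pattern the abstract describes: one bounded perturbation is optimized offline over a batch of clips, applied to every frame so that temporal pooling does not dilute it, and then reused on an unseen clip with no gradient computation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 16                               # frames per clip, feature size (toy values)
w = rng.normal(size=D)                     # hypothetical "stop generating" direction
train_clips = rng.normal(size=(32, T, D))  # clips available offline
test_clip = rng.normal(size=(T, D))        # unseen clip at inference time


def stop_score(clips, delta):
    """Toy surrogate: mean-pool frames, squash with tanh, project onto w."""
    pooled = (clips + delta).mean(axis=-2)   # delta broadcasts to every frame
    return np.tanh(pooled) @ w


def stop_score_grad(clips, delta):
    """Analytic gradient of the batch-averaged stop_score with respect to delta."""
    pooled = (clips + delta).mean(axis=-2)
    return ((1.0 - np.tanh(pooled) ** 2) * w).mean(axis=0)


eps, step = 0.1, 0.02                      # L-infinity budget and step size (assumed)
delta = np.zeros(D)                        # the shared, instance-agnostic trigger
for _ in range(20):
    grad = stop_score_grad(train_clips, delta)
    # Signed-gradient descent on the stop score, projected back into the budget.
    delta = np.clip(delta - step * np.sign(grad), -eps, eps)

# Inference time: the trigger is simply added; no gradients are computed.
clean = stop_score(test_clip, np.zeros(D))
attacked = stop_score(test_clip, delta)
print(attacked < clean)
```

In this toy, a lower stop score stands in for a model that is less inclined to end its output. Perturbing only one of the `T` frames by the same `delta` would shift the pooled feature by just `delta / T`, which is the dilution effect the abstract attributes to temporal aggregation.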
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Malware Detection Techniques
