From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models Without Task Knowledge
Hui Lu, Yi Yu, Song Xia, Yiming Yang, Deepu Rajan, Boon Poh Ng, Alex Kot, Xudong Jiang

TL;DR
This paper uncovers a new security vulnerability in Video Foundation Models (VFMs) by proposing TVA, an attack method that exploits temporal dynamics without access to the victim task or its data, and demonstrates significant risks across multiple video tasks.
Contribution
The paper introduces TVA, a novel temporal-aware adversarial attack that does not require task-specific data or surrogate models, highlighting a practical security threat to open-source VFMs.
Findings
TVA effectively attacks downstream models and MLLMs across 24 tasks.
Adversarial vulnerabilities are demonstrated without access to the victim model's training data or architecture.
The study reveals a significant security risk in deploying open-source VFMs.
Abstract
Large-scale Video Foundation Models (VFMs) have significantly advanced a wide range of video tasks, either through task-specific models or through Multi-modal Large Language Models (MLLMs). However, the open accessibility of VFMs also introduces critical security risks, because adversaries can exploit full knowledge of the VFMs to launch potent attacks. This paper investigates a novel and practical adversarial threat scenario: attacking downstream models or MLLMs fine-tuned from open-source VFMs without access to the victim task, training data, model queries, or architecture. In contrast to conventional transfer-based attacks that rely on task-aligned surrogate models, we demonstrate that adversarial vulnerabilities can be exploited directly from the VFMs. To this end, we propose the Transferable Video Attack (TVA), a temporal-aware adversarial attack method that leverages the temporal…
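The abstract is cut off before it specifies how TVA uses temporal information, so the block below is not TVA. It is a minimal NumPy sketch, under stated assumptions, of the general threat model the abstract describes. An attacker holding only a frozen, open-source encoder crafts an L-infinity-bounded perturbation that pushes the video's features away from their clean values, using no labels, no downstream head, and no queries to the victim. The tanh per-frame encoder, the temporal-difference loss term, the helper names (encode, feature_loss, loss_grad, pgd_feature_attack) and the hyperparameters (eps=8/255, alpha=2/255, steps=10, lam=1.0) are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def encode(video, W):
    """Hypothetical frozen per-frame encoder: z_t = tanh(W x_t).

    video: (T, D) array of flattened frames in [0, 1]; W: (K, D) weights.
    Returns a (T, K) array of frame features.
    """
    return np.tanh(video @ W.T)


def feature_loss(z_adv, z_clean, lam):
    """Loss to maximize: per-frame feature deviation plus a hypothetical
    temporal term measuring how frame-to-frame feature changes deviate."""
    e = z_adv - z_clean          # per-frame deviation, (T, K)
    d = e[1:] - e[:-1]           # deviation of temporal differences, (T-1, K)
    return float(np.sum(e ** 2) + lam * np.sum(d ** 2))


def loss_grad(x_adv, z_clean, W, lam):
    """Analytic gradient of feature_loss with respect to the adversarial video."""
    z_adv = encode(x_adv, W)
    e = z_adv - z_clean
    d = e[1:] - e[:-1]
    g_e = 2.0 * e
    g_e[1:] += 2.0 * lam * d     # d_t contains +e_{t+1}
    g_e[:-1] -= 2.0 * lam * d    # d_t contains -e_t
    g_pre = g_e * (1.0 - z_adv ** 2)  # back through tanh
    return g_pre @ W             # back through the linear map, (T, D)


def pgd_feature_attack(video, W, eps=8 / 255, alpha=2 / 255, steps=10, lam=1.0, seed=0):
    """L-infinity PGD that pushes encoder features away from their clean values.

    Uses only the frozen encoder: no labels, downstream head, or victim queries.
    """
    rng = np.random.default_rng(seed)
    z_clean = encode(video, W)
    # Random start inside the eps-ball: the gradient is exactly zero at delta = 0.
    x_adv = np.clip(video + rng.uniform(-eps, eps, size=video.shape), 0.0, 1.0)
    best_x = x_adv.copy()
    init_loss = best_loss = feature_loss(encode(x_adv, W), z_clean, lam)
    for _ in range(steps):
        g = loss_grad(x_adv, z_clean, W, lam)
        # Signed ascent step, then projection back onto the L-infinity ball.
        delta = np.clip(x_adv + alpha * np.sign(g) - video, -eps, eps)
        x_adv = np.clip(video + delta, 0.0, 1.0)   # keep pixel values valid
        loss = feature_loss(encode(x_adv, W), z_clean, lam)
        if loss > best_loss:                       # keep the strongest iterate
            best_loss, best_x = loss, x_adv.copy()
    return best_x, init_loss, best_loss


# Toy usage: 8 frames of 64 "pixels" and a random 16-dim encoder.
rng = np.random.default_rng(1)
video = rng.uniform(0.0, 1.0, size=(8, 64))
W = rng.normal(0.0, 1.0 / np.sqrt(64), size=(16, 64))
x_adv, init_loss, final_loss = pgd_feature_attack(video, W)
print(f"feature loss: {init_loss:.4f} -> {final_loss:.4f}")
```

With a real pretrained video backbone, the hand-derived gradient would be replaced by automatic differentiation. The temporal term is included only to show one way a feature-space loss can target frame-to-frame dynamics rather than individual frames; it is an assumption, not the paper's objective.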
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
