Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics

Georgii Gotin; Ekaterina Shumitskaya; Anastasia Antsiferova; Dmitriy Vatolin

arXiv:2501.08415·cs.CV·March 25, 2026

TL;DR

This paper introduces IC2VQA, a cross-modal attack that exploits the similarity between the low-level features of images and videos: adversarial perturbations are generated with image-based models and transferred to black-box VQA models, exposing vulnerabilities in current video quality metrics.

Contribution

The paper proposes a novel cross-modal attack approach, IC2VQA, which leverages a CLIP module to improve transferability of adversarial perturbations across different VQA models.

Findings

01 IC2VQA achieves high success rates in attacking black-box VQA models.

02 The method outperforms existing black-box attack strategies in efficiency and effectiveness.

03 Adding a CLIP module enhances transferability of adversarial examples.

Abstract

Recent studies have revealed that modern image and video quality assessment (IQA/VQA) metrics are vulnerable to adversarial attacks. An attacker can manipulate a video through preprocessing to artificially increase its quality score according to a certain metric, despite no actual improvement in visual quality. Most of the attacks studied in the literature are white-box attacks, while black-box attacks in the context of VQA have received less attention. Moreover, some research indicates a lack of transferability of adversarial examples generated for one model to another when applied to VQA. In this paper, we propose a cross-modal attack method, IC2VQA, aimed at exploring the vulnerabilities of modern VQA models. This approach is motivated by the observation that the low-level feature spaces of images and videos are similar. We investigate the transferability of adversarial perturbations…
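
The idea of attacking through low-level features can be illustrated with a small sketch. This is not the authors' implementation: the feature extractor below is a hypothetical random linear map followed by `tanh`, standing in for the early layers of an image model, and the loss (squared distance between clean and perturbed features) and the names `features` and `feature_attack` are assumptions chosen for illustration. The sketch perturbs every frame of a video with sign-gradient steps under an L-infinity budget, using only the image-level surrogate and never querying a video model.

```python
import numpy as np

def features(W, frames):
    """Toy stand-in for low-level image features (hypothetical, not a real IQA model).

    frames: (T, D) flattened frames; returns (T, K) feature vectors.
    """
    return np.tanh(frames @ W.T)

def feature_attack(W, video, eps=4 / 255, step=1 / 255, iters=10, seed=0):
    """Push each frame's surrogate features away from the clean ones.

    video: (T, H, W) array with values in [0, 1].
    Returns an adversarial video with |adv - video| <= eps per pixel.
    """
    rng = np.random.default_rng(seed)
    T = video.shape[0]
    x = video.reshape(T, -1)
    clean = features(W, x)
    # Random start: at delta = 0 the loss gradient is exactly zero.
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(iters):
        h = features(W, np.clip(x + delta, 0.0, 1.0))
        # Analytic gradient of sum((h - clean)**2) w.r.t. the input,
        # ignoring the pixel clip (a common simplification).
        grad = (2.0 * (h - clean) * (1.0 - h ** 2)) @ W
        # Sign-gradient ascent step, then project back onto the eps-ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
        # Keep perturbed pixels inside the valid range.
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return np.clip(x + delta, 0.0, 1.0).reshape(video.shape)

# Usage: a 4-frame, 16x16 toy "video" and an 8-dimensional surrogate.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16 * 16)) / 16.0
video = rng.uniform(0.2, 0.8, size=(4, 16, 16))
adv = feature_attack(W, video)
```

In a real setting the surrogate would be the intermediate layers of trained image models, and the transfer to a black-box VQA model would have to be measured separately; the sketch only shows the per-frame optimization loop.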

Code & Models

Repositories

georgegotin/ic2vqa (PyTorch, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Digital Media Forensic Detection · Advanced Steganography and Watermarking Techniques

Methods: Contrastive Language-Image Pre-training (CLIP)