Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation

Sahid Hossain Mustakim; S M Jishanul Islam; Ummay Maria Muna; Montasir Chowdhury; Mohammed Jawwadul Islam; Sadia Ahmmed; Tashfia Sikder; Syed Tasdid Azam Dhrubo; Swakkhar Shatabda

arXiv:2507.11968 · cs.CV · July 17, 2025

Open Access

TL;DR

This paper introduces a new framework and dataset for evaluating the robustness of multimodal language models against tri-modal adversarial attacks on short videos, revealing significant vulnerabilities and failure modes.

Contribution

The paper presents the SVMA dataset and the ChimeraBreak attack strategy, addressing the gap in multimodal safety evaluation for short-form videos and exposing model vulnerabilities.

Findings

01 High attack success rates on state-of-the-art models
02 Identification of model biases and failure modes
03 Effective use of LLMs as judges for attack reasoning

Abstract

Multimodal Large Language Models (MLLMs) are increasingly used for content moderation, yet their robustness in short-form video contexts remains underexplored. Current safety evaluations often rely on unimodal attacks, failing to address combined attack vulnerabilities. In this paper, we introduce a comprehensive framework for evaluating the tri-modal safety of MLLMs. First, we present the Short-Video Multimodal Adversarial (SVMA) dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Second, we propose ChimeraBreak, a novel tri-modal attack strategy that simultaneously challenges visual, auditory, and semantic reasoning pathways. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR). Our findings uncover distinct failure modes, showing model biases toward misclassifying benign or…
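The abstract reports results in terms of Attack Success Rate (ASR), but this page does not give the exact formula. Below is a minimal sketch, assuming ASR is the fraction of attacked videos whose appropriateness label the model gets wrong. The `AttackRecord` schema, the `"safe"`/`"unsafe"` labels, and both function names are hypothetical, not taken from the SVMA dataset or the authors' code. The per-class breakdown shows how a directional bias (one class flipped far more often than the other) would appear in the numbers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackRecord:
    """One adversarial trial. Hypothetical schema, not the SVMA format."""
    true_label: str       # ground-truth appropriateness label, e.g. "safe" / "unsafe"
    predicted_label: str  # label the MLLM returned for the attacked video


def attack_success_rate(records: list[AttackRecord]) -> float:
    """Fraction of attacked samples whose prediction differs from the ground truth."""
    if not records:
        raise ValueError("no records to score")
    successes = sum(r.predicted_label != r.true_label for r in records)
    return successes / len(records)


def asr_by_true_label(records: list[AttackRecord]) -> dict[str, float]:
    """ASR split by ground-truth class, to expose directional misclassification bias."""
    totals = Counter(r.true_label for r in records)
    flips = Counter(r.true_label for r in records if r.predicted_label != r.true_label)
    return {label: flips[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Toy, made-up records purely to exercise the functions.
    records = [
        AttackRecord("safe", "unsafe"),
        AttackRecord("safe", "safe"),
        AttackRecord("unsafe", "safe"),
        AttackRecord("unsafe", "safe"),
    ]
    print(attack_success_rate(records))  # 0.75
    print(asr_by_true_label(records))    # {'safe': 0.5, 'unsafe': 1.0}
```

Some evaluations instead compute ASR only over samples the model classified correctly before the attack; which variant the paper uses is not stated here.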

Code & Models

Datasets

smji/SVMA-dataset

Taxonomy

Topics: Digital Media Forensic Detection · Adversarial Robustness in Machine Learning