Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao, Wei Song, Derui Wang, Jingling Xue, Jin Song Dong

TL;DR
This paper reveals that current Video Large Language Models often fail to detect harmful content in videos due to design flaws, posing significant safety risks and highlighting the need for improved sampling and decoding strategies.
Contribution
The study identifies key design flaws in VideoLLMs causing harmful content omission and proposes targeted attacks and evaluation methods to expose these vulnerabilities.
Findings
Harmful content omission rate exceeds 90% in most cases
Current models fail to detect visible harmful content in videos
Design flaws significantly contribute to safety gaps in VideoLLMs
Abstract
Video Large Language Models (VideoLLMs) are increasingly deployed in numerous critical applications, where users rely on auto-generated summaries while casually skimming the video stream. We show that this interaction hides a critical safety gap: if harmful content is embedded in a video, either as full-frame inserts or as small corner patches, state-of-the-art VideoLLMs rarely mention the harmful content in the output, despite its clear visibility to human viewers. A root-cause analysis reveals three compounding design flaws: (1) insufficient temporal coverage resulting from the sparse, uniformly spaced frame sampling used by most leading VideoLLMs, (2) spatial information loss introduced by aggressive token downsampling within sampled frames, and (3) encoder-decoder disconnection, whereby visual cues are only weakly utilized during text generation. Leveraging these insights, we craft…
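To make the first design flaw concrete, the following is a minimal sketch of how sparse, uniformly spaced frame sampling can skip an inserted clip entirely. It is not the paper's code or any specific model's sampler. The function names, the midpoint sampling rule, and all numbers (a 60-second video at 30 fps, 8 sampled frames, a 5-second insert) are assumptions chosen for illustration.

```python
def uniform_sample_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick frame indices spaced evenly across the video.

    Assumption: the video is split into num_samples equal segments and the
    midpoint frame of each segment is taken. Real models may differ.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def sampled_hits(indices: list[int], start: int, end: int) -> list[int]:
    """Return the sampled indices that fall inside the clip [start, end)."""
    return [i for i in indices if start <= i < end]


# Hypothetical video: 60 s at 30 fps = 1800 frames, with 8 frames sampled.
indices = uniform_sample_indices(total_frames=1800, num_samples=8)

# Hypothetical 5-second insert covering frames 600..749.
missed = sampled_hits(indices, start=600, end=750)

# The same-length insert shifted slightly earlier, covering frames 550..699.
caught = sampled_hits(indices, start=550, end=700)

print("sampled frames:", indices)
print("frames inside insert at 600-750:", missed)
print("frames inside insert at 550-700:", caught)
```

Under these assumptions the sampled frames sit 225 frames (7.5 seconds) apart, so any insert shorter than that gap can fall between two samples and never reach the visual encoder. Whether it is seen depends only on where it happens to be placed.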
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Generative Adversarial Networks and Image Synthesis
