Failures to Surface Harmful Contents in Video Large Language Models

Yuxin Cao; Wei Song; Derui Wang; Jingling Xue; Jin Song Dong

arXiv:2508.10974·cs.MM·November 18, 2025

TL;DR

This paper reveals that current Video Large Language Models often fail to detect harmful content in videos due to design flaws, posing significant safety risks and highlighting the need for improved sampling and decoding strategies.

Contribution

The study identifies key design flaws in VideoLLMs causing harmful content omission and proposes targeted attacks and evaluation methods to expose these vulnerabilities.

Findings

01 Harmful content omission rate exceeds 90% in most cases

02 Current models fail to detect visible harmful content in videos

03 Design flaws significantly contribute to safety gaps in VideoLLMs

Abstract

Video Large Language Models (VideoLLMs) are increasingly deployed in numerous critical applications, where users rely on auto-generated summaries while casually skimming the video stream. We show that this interaction hides a critical safety gap: if harmful content is embedded in a video, either as full-frame inserts or as small corner patches, state-of-the-art VideoLLMs rarely mention the harmful content in their output, despite its clear visibility to human viewers. A root-cause analysis reveals three compounding design flaws: (1) insufficient temporal coverage resulting from the sparse, uniformly spaced frame sampling used by most leading VideoLLMs, (2) spatial information loss introduced by aggressive token downsampling within sampled frames, and (3) encoder-decoder disconnection, whereby visual cues are only weakly utilized during text generation. Leveraging these insights, we craft…
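To make the first flaw (insufficient temporal coverage) concrete, the following is a minimal sketch rather than the paper's method. It assumes a sampler that picks one frame at the center of each equal-width segment; real VideoLLMs may place their uniformly spaced samples differently. The function names and the numbers in the usage note (a 1800-frame video, 8 sampled frames, a 60-frame insert) are hypothetical illustrations, not values from the paper.

```python
def uniform_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return num_samples frame indices, one at the center of each
    equal-width segment (an assumed sampling rule, for illustration)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    step = num_frames / num_samples
    return [min(num_frames - 1, int(step * i + step / 2))
            for i in range(num_samples)]


def clip_is_sampled(sampled: list[int], start: int, length: int) -> bool:
    """True if at least one sampled index falls inside [start, start + length)."""
    return any(start <= idx < start + length for idx in sampled)


def missed_fraction(num_frames: int, num_samples: int, clip_length: int) -> float:
    """Fraction of possible start positions at which a contiguous inserted
    clip of clip_length frames contains no sampled frame at all."""
    if not 0 < clip_length <= num_frames:
        raise ValueError("clip_length must be in 1..num_frames")
    sampled = uniform_sample_indices(num_frames, num_samples)
    positions = range(num_frames - clip_length + 1)
    missed = sum(1 for start in positions
                 if not clip_is_sampled(sampled, start, clip_length))
    return missed / len(positions)


if __name__ == "__main__":
    # Hypothetical setting: 60 s of video at 30 fps, 8 sampled frames,
    # and a 2 s full-frame insert placed at an arbitrary position.
    print(missed_fraction(num_frames=1800, num_samples=8, clip_length=60))
```

Under these assumptions, whenever the gap between sampled frames is longer than the inserted clip, most placements of the clip fall entirely between samples, so the model never receives a single frame of it. Sampling more frames, or a longer insert, lowers the missed fraction.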

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Generative Adversarial Networks and Image Synthesis