Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

Zijia Zhao; Haoyu Lu; Yuqi Huo; Yifan Du; Tongtian Yue; Longteng Guo; Bingning Wang; Weipeng Chen; Jing Liu

arXiv:2406.09367 · cs.CV · March 10, 2025

TL;DR

This paper introduces VideoNIAH, a synthetic video benchmark framework that efficiently evaluates video understanding skills in multimodal large language models by decoupling content and queries.

Contribution

The paper presents a scalable, automated method for constructing video benchmarks using synthetic data, enabling targeted skill evaluation and diverse video content.

Findings

01 Significant differences in model capabilities across tasks

02 Insights into model strengths and weaknesses in video understanding

03 Recommendations for improving video MLLM training

Abstract

Video understanding is a crucial next step for multimodal large language models (MLLMs). Various benchmarks have been introduced to better evaluate MLLMs. Nevertheless, current video benchmarks remain inefficient for evaluating video models during iterative development, because datasets are costly to construct and specific skills are difficult to isolate. In this paper, we propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework based on synthetic video generation. VideoNIAH decouples video content from its query-response pairs by inserting unrelated visual 'needles' into original videos. The framework automates the generation of query-response pairs using predefined rules, minimizing manual labor. The queries focus on specific aspects of video understanding, enabling more skill-specific evaluations. The separation between video content and the queries…
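The needle-insertion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: frames are plain string placeholders rather than decoded video, the names `Sample`, `insert_needles`, and `make_ordering_sample` are hypothetical, and the ordering question is just one example of a predefined rule that yields a query and its answer without manual labeling.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    frames: list  # haystack frames with needle frames spliced in
    query: str    # question produced by a fixed rule
    answer: str   # ground truth produced by the same rule


def insert_needles(haystack, needles, rng):
    """Splice needles into the haystack at random slots, keeping needle order."""
    # Draw one slot per needle in the original haystack, sorted so that the
    # i-th needle never lands after the (i+1)-th.
    slots = sorted(rng.randrange(len(haystack) + 1) for _ in needles)
    frames = list(haystack)
    # Insert from the back so that earlier slot indices stay valid.
    for slot, needle in reversed(list(zip(slots, needles))):
        frames.insert(slot, needle)
    return frames


def make_ordering_sample(haystack, needles, seed=0):
    """Build one synthetic sample whose answer depends only on the needles."""
    rng = random.Random(seed)
    order = list(needles)
    rng.shuffle(order)  # the rule decides the order, so the answer is known
    frames = insert_needles(haystack, order, rng)
    query = "In what order do the inserted objects appear?"
    answer = ", ".join(order)
    return Sample(frames=frames, query=query, answer=answer)


if __name__ == "__main__":
    haystack = [f"frame_{i}" for i in range(8)]       # placeholder source video
    needles = ["apple", "cat", "star"]                # placeholder needle images
    sample = make_ordering_sample(haystack, needles, seed=1)
    print(sample.frames)
    print(sample.query, "->", sample.answer)
```

Because the answer is derived from the inserted needles alone, the same rule can be applied to any source video, which is the sense in which content and queries are decoupled.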

Code & Models

Repositories

joez17/videoniah (PyTorch, official)

Models

mmiemon/BIMBA-LLaVA-Qwen2-7B
model · 7 dl

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques

Methods: Focus