Video-LevelGauge: Investigating Contextual Positional Bias in Large Video Language Models

Hou Xia; Zheren Fu; Fangcan Ling; Jiajun Li; Yi Tu; Zhendong Mao; Yongdong Zhang

arXiv:2508.19650·cs.CV·September 1, 2025

TL;DR

Video-LevelGauge is a benchmark for systematically evaluating and analyzing positional bias in large video language models (LVLMs). It reveals significant biases in many open-source models and robust behavior in commercial ones.

Contribution

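The paper introduces Video-LevelGauge, a new benchmark with standardized probes and analysis methods for assessing positional bias in LVLMs. It fills a gap left by existing benchmarks, which measure overall performance across entire video sequences rather than how performance depends on where relevant content appears.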

Findings

01

Many open-source LVLMs exhibit head or neighbor-content biases.

02

Commercial models such as Gemini2.5-Pro perform consistently across entire video sequences.

03

Analysis of context length and model scale offers insights for bias mitigation.

Abstract

Large video language models (LVLMs) have made notable progress in video understanding, spurring the development of corresponding evaluation benchmarks. However, existing benchmarks generally assess overall performance across entire video sequences, overlooking nuanced behaviors such as contextual positional bias, a critical yet under-explored aspect of LVLM performance. We present Video-LevelGauge, a dedicated benchmark designed to systematically assess positional bias in LVLMs. We employ standardized probes and customized contextual setups, allowing flexible control over context length, probe position, and contextual types to simulate diverse real-world scenarios. In addition, we introduce a comprehensive analysis method that combines statistical measures with morphological pattern recognition to characterize bias. Our benchmark comprises 438 manually curated videos spanning multiple…
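
The abstract describes placing a standardized probe at a controlled position inside a context of controlled length, then characterizing bias with statistical measures plus recognition of the shape of the accuracy-versus-position curve. The sketch below only illustrates that general idea and is not the authors' implementation. The function names (`place_probe`, `positional_profile`, `bias_stats`, `classify_shape`), the tolerance `tol`, and the shape labels are hypothetical. Neighbor-content bias is not modelled here.

```python
from statistics import mean, pstdev


def place_probe(context_clips, probe_clip, position):
    """Hypothetical helper: insert a probe clip at a relative position in [0, 1]."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    idx = int(position * len(context_clips))  # 0 -> head, 1 -> tail
    return context_clips[:idx] + [probe_clip] + context_clips[idx:]


def positional_profile(results):
    """results maps probe position -> list of 0/1 correctness scores."""
    positions = sorted(results)
    acc = [sum(results[p]) / len(results[p]) for p in positions]
    return positions, acc


def bias_stats(acc):
    """Simple statistical summaries of how accuracy varies with position."""
    return {"range": max(acc) - min(acc), "std": pstdev(acc)}


def classify_shape(acc, tol=0.05):
    """Assumed morphological labels for an accuracy-vs-position curve."""
    if len(acc) < 3:
        raise ValueError("need at least three probe positions")
    if max(acc) - min(acc) <= tol:
        return "flat"
    head, tail = acc[0], acc[-1]
    middle = mean(acc[1:-1])
    if head - middle > tol and tail - middle > tol:
        return "u-shaped"
    if head - middle > tol:
        return "head-biased"
    if tail - middle > tol:
        return "tail-biased"
    return "irregular"


if __name__ == "__main__":
    print(place_probe(["c0", "c1", "c2", "c3"], "probe", 0.5))
    positions, acc = positional_profile(
        {0.0: [1, 1, 1, 0], 0.5: [1, 0, 0, 1], 1.0: [1, 0, 1, 0]}
    )
    print(positions, acc, bias_stats(acc), classify_shape(acc))
```

With these made-up scores, accuracy is 0.75 at the head and 0.5 elsewhere, so the curve is labelled head-biased. A flat curve within `tol` would correspond to the position-robust behavior the paper reports for commercial models.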

Code & Models

Datasets

Cola-any/Video-LevelGauge (dataset, 265 downloads)