VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs

Yiming Yang; Yangyang Guo; Hui Lu; Yan Wang

arXiv:2502.16602·cs.CV·February 25, 2025

Open Access

TL;DR

This paper introduces a benchmark and a mitigation method for language bias in video-involved large vision-language models (LVLMs), that is, their tendency to prioritize language over video content. The benchmark exposes this limitation in current models, and the proposed decoding method reduces the bias without retraining.

Contribution

The paper presents a new Video Language Bias Evaluation Benchmark (VidLBEval) and a Multi-branch Contrastive Decoding (MCD) method that mitigates language bias in video-involved LVLMs without retraining.

Findings

01 Existing LVLMs are significantly biased towards language.

02 The proposed MCD method effectively reduces language bias.

03 MCD maintains model performance across various tasks.

Abstract

Recently, Large Vision-Language Models (LVLMs) have made significant strides across diverse multimodal tasks and benchmarks. This paper reveals a largely under-explored problem in existing video-involved LVLMs: language bias, where models tend to prioritize language over video and thus produce incorrect responses. To address this research gap, we first collect a Video Language Bias Evaluation Benchmark, which is specifically designed to assess language bias in video-involved LVLMs through two key tasks: ambiguous video contrast and interrogative question probing. Accordingly, we design accompanying evaluation metrics that penalize LVLMs for being biased by language. In addition, we propose Multi-branch Contrastive Decoding (MCD), which introduces two expert branches to simultaneously counteract the language bias potentially generated by the amateur text-only branch. Our…
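
The abstract names the decoding idea but does not give its formula, so the following is only a rough sketch of generic contrastive decoding with two expert branches and one text-only amateur branch. The function name `contrastive_next_token`, the averaging of the two experts, and the parameters `alpha` (contrast strength) and `beta` (plausibility cutoff) are all assumptions made for illustration and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(expert_a, expert_b, amateur, alpha=1.0, beta=0.1):
    """Pick the next token by contrasting expert branches with an amateur.

    expert_a, expert_b: next-token logits from two video-conditioned
        branches (hypothetical; the paper's branch design may differ).
    amateur: next-token logits from a text-only branch that only sees
        the question, so it reflects the language prior.
    alpha: assumed contrast strength; alpha=0 falls back to the experts.
    beta: assumed plausibility cutoff relative to the best expert token.
    """
    log_a = log_softmax(expert_a)
    log_b = log_softmax(expert_b)
    log_t = log_softmax(amateur)

    # Assumption: combine the two experts by averaging log-probabilities.
    expert = [(a + b) / 2 for a, b in zip(log_a, log_b)]

    # Keep only tokens the experts find plausible, so subtracting the
    # amateur cannot promote tokens the experts consider very unlikely.
    cutoff = max(expert) + math.log(beta)

    scores = [
        (1 + alpha) * e - alpha * t if e >= cutoff else float("-inf")
        for e, t in zip(expert, log_t)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens. The experts slightly prefer token 0,
# but the text-only branch prefers it strongly, which suggests that the
# preference comes mostly from the language prior.
expert_logits = [2.0, 1.8, -3.0]
amateur_logits = [3.0, 0.0, 0.0]

plain_choice = contrastive_next_token(
    expert_logits, expert_logits, amateur_logits, alpha=0.0
)
contrast_choice = contrastive_next_token(
    expert_logits, expert_logits, amateur_logits, alpha=1.0
)
```

In this toy case the uncontrasted choice is token 0, while the contrasted choice moves to token 1, the answer that the video-conditioned branches support but the language prior alone does not. Because the adjustment happens only at decoding time, no model weights change, which is consistent with the "without retraining" claim above.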

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Interpreting and Communication in Healthcare