VLind-Bench: Measuring Language Priors in Large Vision-Language Models

Kang-il Lee; Minbeom Kim; Seunghyun Yoon; Minsung Kim; Dongryeol Lee; Hyukhun Koh; Kyomin Jung

arXiv:2406.08702·cs.AI·February 11, 2025

TL;DR

This paper introduces VLind-Bench, a new benchmark designed to accurately measure language priors in large vision-language models, revealing widespread reliance on textual patterns over image content.

Contribution

The paper presents VLind-Bench, the first benchmark specifically targeting language priors in LVLMs, with comprehensive tests to disentangle priors from other factors.

Findings

1. Most LVLMs heavily rely on language priors.

2. Existing benchmarks inadequately measure language priors.

3. VLind-Bench effectively isolates language priors from other influences.

Abstract

Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with images that are out of training distribution. Despite its importance, current methods for accurately measuring language priors in LVLMs are poorly studied. Although existing benchmarks based on counterfactual or out-of-distribution images can partially be used to measure language priors, they fail to disentangle language priors from other confounding factors. To this end, we propose a new benchmark called VLind-Bench, which is the first benchmark specifically designed to measure the language…
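To make the idea of a language prior concrete, here is a minimal sketch of one way to score reliance on textual patterns over counterfactual images. It is not the VLind-Bench protocol, which the truncated abstract above does not spell out. The `Item` schema, its field names, and the `prior_reliance_rate` function are all hypothetical. The sketch counts an item as prior-driven when the model answers incorrectly with the image and gives the same answer it gives with no image at all. The optional `passed_prereqs` filter illustrates, under the same assumptions, one way to keep confounders such as perception failures out of the score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """One counterfactual test item (hypothetical schema, not from the paper)."""
    answer_with_image: bool  # model's answer when shown the counterfactual image
    answer_text_only: bool   # model's answer to the same question without an image
    gold: bool               # answer that is correct given the image
    passed_prereqs: bool     # model passed separate perception/knowledge checks


def prior_reliance_rate(
    items: list[Item], condition_on_prereqs: bool = True
) -> Optional[float]:
    """Fraction of items where the model is wrong with the image
    and simply repeats its text-only answer.

    Returns None when no items are eligible.
    """
    if condition_on_prereqs:
        # Drop items that the model may fail for reasons other than language priors.
        pool = [it for it in items if it.passed_prereqs]
    else:
        pool = list(items)
    if not pool:
        return None
    hits = sum(
        1
        for it in pool
        if it.answer_with_image != it.gold
        and it.answer_with_image == it.answer_text_only
    )
    return hits / len(pool)


# Toy data, invented for illustration only.
items = [
    Item(answer_with_image=False, answer_text_only=False, gold=True, passed_prereqs=True),
    Item(answer_with_image=True, answer_text_only=False, gold=True, passed_prereqs=True),
    Item(answer_with_image=False, answer_text_only=False, gold=True, passed_prereqs=False),
    Item(answer_with_image=False, answer_text_only=False, gold=True, passed_prereqs=True),
]
conditioned = prior_reliance_rate(items)
unconditioned = prior_reliance_rate(items, condition_on_prereqs=False)
```

On this toy data the unconditioned rate is higher than the conditioned one, because the third item's failure might stem from a perception error and not from a language prior. That gap is the kind of confounding the abstract says existing benchmarks do not separate out.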

Code & Models

Repositories

klee972/vlind-bench
PyTorch · Official

Datasets

klee972/VLind-Bench
dataset · 438 dl

Videos

VLind-Bench: Measuring Language Priors in Large Vision-Language Models

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling