Evaluating Attribute Comprehension in Large Vision-Language Models

Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, and Zhanyu Ma

arXiv:2408.13898·cs.CV·August 27, 2024

TL;DR

This paper assesses how well large vision-language models understand object attributes, focusing on recognition and hierarchy, revealing strengths and limitations in their fine-grained visual comprehension abilities.

Contribution

It introduces a comprehensive evaluation framework for attribute comprehension in vision-language models, highlighting the impact of fine-tuning data and interaction types on their understanding.

Findings

01  Models excel at attribute recognition but have limited hierarchical understanding.

02  Image-text matching outperforms visual question answering in attribute comprehension.

03  Caption attribute information significantly influences fine-tuning effectiveness.

Abstract

Large vision-language models have made promising progress on many downstream tasks. However, they still face many challenges in fine-grained visual understanding tasks, such as object attribute comprehension. Moreover, although efforts to evaluate large vision-language models are growing, in-depth studies of attribute comprehension and of the vision-language fine-tuning process are still lacking. In this paper, we propose to evaluate the attribute comprehension ability of large vision-language models from two perspectives: attribute recognition and attribute hierarchy understanding. We evaluate three vision-language interactions: visual question answering, image-text matching, and image-text cosine similarity. Furthermore, we explore the factors affecting attribute comprehension during fine-tuning. Through a series of quantitative and qualitative experiments,…
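
Of the three interactions named in the abstract, image-text cosine similarity is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes that an image and a set of attribute prompts have already been encoded into vectors of the same dimension by some vision-language model. The function names (`cosine_similarity`, `rank_attributes`) and the toy three-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_attributes(image_embedding, text_embeddings):
    """Sort candidate attributes by similarity to the image, highest first."""
    scores = {
        attribute: cosine_similarity(image_embedding, embedding)
        for attribute, embedding in text_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy embeddings (assumed values, standing in for real model outputs).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "red": [1.0, 0.0, 0.1],
    "blue": [0.0, 1.0, 0.1],
    "striped": [0.1, 0.2, 1.0],
}

ranking = rank_attributes(image_embedding, text_embeddings)
predicted = ranking[0][0]  # attribute whose prompt is closest to the image
```

In this setup the predicted attribute is simply the candidate whose text embedding scores highest against the image embedding; a real evaluation would replace the toy vectors with encoder outputs and compare predictions against ground-truth attribute labels.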

Code & Models

Repositories

zhwwwww/attribute-comprehension-of-vlms (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks