AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

Yuhang Wu; Wenmeng Yu; Yean Cheng; Yan Wang; Xiaohan Zhang; Jiazheng Xu; Ming Ding; Yuxiao Dong

arXiv:2406.09295 · cs.CL · June 5, 2025

TL;DR

AlignMMBench is a comprehensive benchmark designed to evaluate Chinese multimodal alignment in large vision-language models, focusing on nuanced, real-world scenarios and multi-turn dialogues to better assess model robustness and capabilities.

Contribution

This paper introduces AlignMMBench, the first benchmark designed specifically for Chinese visual contexts, together with a new evaluation pipeline and a quantitative alignment score for assessing robustness.

Findings

01 VLMs show varied performance across tasks.

02 The benchmark reveals limitations in current models' robustness.

03 AlignMMBench provides detailed insights into Chinese multimodal alignment.

Abstract

Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, which provides more nuanced evaluations of alignment capabilities and is the first benchmark specifically designed for Chinese visual contexts. This benchmark is meticulously curated from real-world scenarios and internet sources, encompassing thirteen specific tasks across three categories, and includes both single-turn and multi-turn dialogue scenarios. Incorporating a prompt rewrite strategy, AlignMMBench encompasses 1,054 images and 4,978 question-answer pairs. To facilitate the evaluation pipeline, we develop CritiqueVLM, a rule-calibrated…

Code & Models

Datasets

zai-org/AlignMMBench (dataset, 166 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Focus