Assessment of Multimodal Large Language Models in Alignment with Human Values

Zhelun Shi; Zhipin Wang; Hongxing Fan; Zaibin Zhang; Lijun Li; Yongting Zhang; Zhenfei Yin; Lu Sheng; Yu Qiao; Jing Shao

arXiv:2403.17830 · cs.CV · March 27, 2024 · 2 cites

TL;DR

This paper introduces Ch3Ef, a comprehensive dataset and evaluation strategy to assess how well Multimodal Large Language Models align with human values across diverse tasks and domains.

Contribution

The paper presents the Ch3Ef dataset and a unified evaluation strategy specifically designed for assessing MLLMs' alignment with human values in visual and multimodal contexts.

Findings

1. MLLMs show varied alignment with human values across domains.

2. Evaluation reveals specific strengths and limitations of current MLLMs.

3. Guidelines for future improvements in MLLM alignment are proposed.

Abstract

Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh). For Multimodal Large Language Models (MLLMs), however, alignment with human values remains largely unexplored despite their commendable performance in perception and reasoning tasks. Two obstacles stand in the way: the hhh dimensions are hard to define in the visual world, and it is difficult to collect data that accurately mirrors real-world situations. To address this gap, we introduce Ch3Ef, a Compreh3ensive Evaluation dataset and strategy for assessing alignment with human expectations. The Ch3Ef dataset contains 1002 human-annotated data samples, covering 12 domains and 46 tasks based on the hhh principle. We also present a unified evaluation strategy supporting assessment across various scenarios and different…

Code & Models

Repositories

openlamm/lamm (PyTorch)

Taxonomy

Topics: Speech and dialogue systems