Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

TL;DR
This paper introduces Ch3Ef, a comprehensive dataset and evaluation strategy to assess how well Multimodal Large Language Models align with human values across diverse tasks and domains.
Contribution
The paper presents the Ch3Ef dataset and a unified evaluation strategy specifically designed for assessing MLLMs' alignment with human values in visual and multimodal contexts.
Findings
MLLMs show varied alignment with human values across domains.
Evaluation reveals specific strengths and limitations of current MLLMs.
Guidelines for future improvements in MLLM alignment are proposed.
Abstract
Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh). However, in terms of Multimodal Large Language Models (MLLMs), despite their commendable performance in perception and reasoning tasks, their alignment with human values remains largely unexplored, given the complexity of defining hhh dimensions in the visual world and the difficulty in collecting relevant data that accurately mirrors real-world situations. To address this gap, we introduce Ch3Ef, a Compreh3ensive Evaluation dataset and strategy for assessing alignment with human expectations. Ch3Ef dataset contains 1002 human-annotated data samples, covering 12 domains and 46 tasks based on the hhh principle. We also present a unified evaluation strategy supporting assessment across various scenarios and different…
Taxonomy
Topics: Speech and dialogue systems
