Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
YiFan Zhang, Shanglin Lei, Runqi Qiao, Zhuoma GongQue, Xiaoshuai Song, Guanting Dong, Qiuna Tan, Zhe Wei, Peiqing Yang, Ye Tian, Yadong Xue, Xiaofei Wang, Honggang Zhang

TL;DR
The paper introduces the Multi-Dimensional Insights (MDI) benchmark to evaluate large multimodal models across diverse real-world scenarios, emphasizing understanding, reasoning, and personalization for different age groups.
Contribution
It presents a comprehensive, multi-faceted benchmark with over 500 images, stratified questions, and age-specific assessments to better evaluate LMMs' real-world alignment and personalization capabilities.
Findings
GPT-4o achieves 79% accuracy on age-related tasks.
Existing LMMs still have significant room for improvement.
The benchmark reveals gaps in models' ability to meet diverse human needs.
Abstract
The rapidly developing field of large multimodal models (LMMs) has led to the emergence of diverse models with remarkable capabilities. However, existing benchmarks fail to comprehensively, objectively and accurately evaluate whether LMMs align with the diverse needs of humans in real-world scenarios. To bridge this gap, we propose the Multi-Dimensional Insights (MDI) benchmark, which includes over 500 images covering six common scenarios of human life. Notably, the MDI-Benchmark offers two significant advantages over existing evaluations: (1) Each image is accompanied by two types of questions: simple questions to assess the model's understanding of the image, and complex questions to evaluate the model's ability to analyze and reason beyond basic content. (2) Recognizing that people of different age groups have varying needs and perspectives when faced with the same scenario, our…
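The two-axis design described in the abstract (question difficulty crossed with age group) suggests reporting accuracy per stratum rather than as one overall number. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the record fields (`age_group`, `level`, `predicted`, `answer`), the group labels, and the sample records are all hypothetical placeholders, and exact-match grading is an assumption.

```python
from collections import defaultdict


def stratified_accuracy(records):
    """Return accuracy for each (age_group, level) pair found in records.

    Each record is a dict with hypothetical keys: 'age_group', 'level',
    'predicted', and 'answer'. Grading is exact match (an assumption).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        key = (record["age_group"], record["level"])
        totals[key] += 1
        correct[key] += int(record["predicted"] == record["answer"])
    return {key: correct[key] / totals[key] for key in totals}


# Illustrative records only; these are not data from the paper.
records = [
    {"age_group": "age_group_a", "level": "simple", "predicted": "A", "answer": "A"},
    {"age_group": "age_group_a", "level": "simple", "predicted": "B", "answer": "C"},
    {"age_group": "age_group_a", "level": "complex", "predicted": "D", "answer": "D"},
    {"age_group": "age_group_b", "level": "simple", "predicted": "A", "answer": "B"},
]

scores = stratified_accuracy(records)
for (age_group, level), accuracy in sorted(scores.items()):
    print(f"{age_group:12s} {level:8s} {accuracy:.2f}")
```

Keeping the strata separate makes it visible when a model handles simple questions well but falls behind on complex ones, or serves one age group better than another.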
Taxonomy
Topics: Speech and dialogue systems · Multimedia Communication and Technology
Methods: ALIGN
