UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
Yi Li, Haonan Wang, Qixiang Zhang, Boyu Xiao, Chenchang Hu, Hualiang Wang, Xiaomeng Li

TL;DR
UniEval is a unified evaluation framework for unified multimodal models. It enables simplified, diverse, and human-aligned assessment of both understanding and generation without extra models, images, or annotations.
Contribution
The paper presents UniEval, the first unified evaluation framework for unified multimodal models. It comprises the UniBench benchmark and the UniScore metric, and assesses models holistically without additional models or annotations.
Findings
UniBench is more challenging than existing benchmarks.
UniScore closely aligns with human evaluations.
Extensive evaluation reveals new insights into state-of-the-art models.
Abstract
The emergence of unified multimodal understanding and generation models is rapidly attracting attention because of their ability to enhance instruction-following capabilities while minimizing model redundancy. However, there is a lack of a unified evaluation framework for these models, which would enable an elegant, simplified, and overall evaluation. Current models conduct evaluations on multiple task-specific benchmarks, but there are significant limitations, such as the lack of overall results, errors from extra evaluation models, reliance on extensive labeled images, benchmarks that lack diversity, and metrics with limited capacity for instruction-following evaluation. To tackle these challenges, we introduce UniEval, the first evaluation framework designed for unified multimodal models without extra models, images, or annotations. This facilitates a simplified and unified…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Softmax · Attention Is All You Need
