A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Zitnik, Liang Lin

TL;DR
This paper introduces a causality-aware evaluation framework called LoTbench for assessing the creativity of multimodal large language models, emphasizing the importance of understanding creative thought processes and aligning evaluations with human cognition.
Contribution
It proposes LoTbench, an interactive, causality-aware evaluation framework, and demonstrates its effectiveness in measuring LLM creativity beyond traditional metrics.
Findings
LoTbench better quantifies LLM creativity and visualizes creative thought processes.
Most LLMs show constrained creativity, but the gap with humans is not large.
LoTbench correlates strongly with multimodal cognition benchmarks, unlike traditional metrics.
Abstract
Recently, numerous benchmarks have been developed to evaluate the logical reasoning abilities of large language models (LLMs). However, assessing the equally important creative capabilities of LLMs is challenging due to the subjective, diverse, and data-scarce nature of creativity, especially in multimodal scenarios. In this paper, we consider the comprehensive pipeline for evaluating the creativity of multimodal LLMs, with a focus on suitable evaluation platforms and methodologies. First, we find the Oogiri game, a creativity-driven task requiring humor, associative thinking, and the ability to produce unexpected responses to text, images, or both. This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity. Next, beyond…
Taxonomy
Topics: Speech and dialogue systems
Methods: Focus
