A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models

Zhongzhan Huang; Shanshan Zhong; Pan Zhou; Shanghua Gao; Marinka Zitnik; Liang Lin

arXiv:2501.15147 · cs.AI · February 25, 2025

Open Access

TL;DR

This paper introduces a causality-aware evaluation framework called LoTbench for assessing the creativity of multimodal large language models, emphasizing the importance of understanding creative thought processes and aligning evaluations with human cognition.

Contribution

It proposes LoTbench, an interactive, causality-aware evaluation framework, and demonstrates its effectiveness in measuring LLM creativity beyond traditional metrics.

Findings

1. LoTbench better quantifies LLM creativity and visualizes creative thought processes.

2. Most LLMs show constrained creativity, though their gap with humans is not large.

3. LoTbench correlates strongly with multimodal cognition benchmarks, unlike traditional metrics.
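The summary does not say which correlation statistic the authors use to compare LoTbench with multimodal cognition benchmarks. As an illustration only, one common way to check whether two benchmarks rank the same set of models consistently is Spearman's rank correlation. The sketch below assumes you have one score per model from each benchmark; the score lists are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model scores for five models (placeholders, not paper data).
creativity_scores = [0.42, 0.35, 0.51, 0.28, 0.47]  # e.g. a creativity benchmark
cognition_scores = [61.0, 55.0, 70.0, 50.0, 66.0]   # e.g. a cognition benchmark
other_metric = [0.9, 0.1, 0.2, 0.8, 0.5]            # e.g. some other metric

print(round(spearman(creativity_scores, cognition_scores), 3))
print(round(spearman(cognition_scores, other_metric), 3))
```

With these placeholder lists, the first pair orders the models identically (rho of 1.0), while the second pair barely agrees (rho of -0.2). Rank correlation is used here because it only asks whether benchmarks order models the same way, not whether their raw scales match.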

Abstract

Recently, numerous benchmarks have been developed to evaluate the logical reasoning abilities of large language models (LLMs). However, assessing the equally important creative capabilities of LLMs is challenging due to the subjective, diverse, and data-scarce nature of creativity, especially in multimodal scenarios. In this paper, we consider a comprehensive pipeline for evaluating the creativity of multimodal LLMs, with a focus on suitable evaluation platforms and methodologies. First, we identify the Oogiri game, a creativity-driven task that requires humor, associative thinking, and the ability to produce unexpected responses to text, images, or both. This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity. Next, beyond…

Taxonomy

Topics: Speech and dialogue systems

Methods: Focus