Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations

Li-Chun Lu; Miri Liu; Pin-Chun Lu; Yufei Tian; Shao-Hua Sun; Nanyun Peng

arXiv:2508.05470·cs.CL·January 29, 2026

TL;DR

This paper critically analyzes four existing creativity evaluation metrics across diverse domains, revealing their limitations and inconsistencies, and emphasizes the need for more reliable, human-aligned assessment methods.

Contribution

The paper compares four representative creativity metrics across multiple domains, highlights their shortcomings, and argues that more reliable evaluation frameworks are needed.

Findings

01. Metrics often fail to generalize across domains.

02. Different metrics frequently produce conflicting judgments.

03. Perplexity reflects fluency, not creativity.

Abstract

We examine, analyze, and compare four representative creativity measures--perplexity, LLM-as-a-Judge, the Creativity Index (CI; measuring n-gram overlap with web corpora), and syntactic templates (detecting repetition of common part-of-speech patterns)--across diverse creative domains such as creative writing, unconventional problem-solving, and research ideation. For each domain, we compile datasets with human-aligned creative and uncreative examples and evaluate each metric's ability to discriminate between the two sets. Our analyses reveal limited consistency both across domains and across metrics: metrics that distinguish creativity in one domain fail in others (e.g., CI correctly distinguishes in creative writing but fails in problem-solving), and different metrics often disagree on the same data points (e.g., CI suggests one set to be more creative, while perplexity indicates the…
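The evaluation setup described in the abstract (score each example with a metric, then check whether the metric separates the creative set from the uncreative set) can be illustrated with a small sketch. This is not the paper's Creativity Index or its evaluation code. The function names (`build_reference`, `ngram_novelty`, `separation_rate`), the whitespace tokenizer, the trigram setting, and the toy data are all assumptions made for illustration, and the tiny in-memory reference list stands in for a web-scale corpus.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_reference(corpus, n=3):
    """Collect every n-gram seen in a (hypothetical) reference corpus."""
    reference = set()
    for doc in corpus:
        reference.update(ngrams(doc.lower().split(), n))
    return reference


def ngram_novelty(text, reference, n=3):
    """Fraction of the text's n-grams that do NOT appear in the reference.

    A simplified stand-in for an n-gram-overlap creativity score:
    higher means less overlap with the reference corpus.
    """
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0  # text too short to contain any n-gram
    unseen = sum(1 for g in grams if g not in reference)
    return unseen / len(grams)


def separation_rate(creative_scores, uncreative_scores):
    """Share of (creative, uncreative) pairs the metric ranks correctly.

    Ties count as half. A value of 1.0 means perfect separation;
    0.5 is what an uninformative metric would give on average.
    """
    wins = 0.0
    for c in creative_scores:
        for u in uncreative_scores:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(creative_scores) * len(uncreative_scores))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    reference = build_reference(["the cat sat on the mat"])
    creative = ["the moon folded itself into my pocket"]
    uncreative = ["the cat sat on the mat"]

    creative_scores = [ngram_novelty(t, reference) for t in creative]
    uncreative_scores = [ngram_novelty(t, reference) for t in uncreative]
    print(separation_rate(creative_scores, uncreative_scores))
```

Running the same `separation_rate` check per domain and per metric is one way to surface the two kinds of inconsistency the abstract reports: a metric whose rate is high in one domain and near 0.5 in another, or two metrics that rank the same pair of sets in opposite directions.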

