The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality
Umberto Domanti, Moritz Mock, Sergio Agnoli, Antonella De Angeli

TL;DR
This study examines how Large Language Models assess idea originality in creative tasks, revealing a bias towards artificial responses that diminishes when controlling for idea elaboration.
Contribution
It provides empirical evidence of self-preference bias in LLMs and highlights the importance of idea elaboration in aligning machine and human originality assessments.
Findings
LLMs show a bias towards artificial responses in originality assessment.
Controlling for idea elaboration reduces the bias in LLM assessments.
Human raters and LLMs differ in their evaluation of responses.
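As a rough illustration of what "controlling for idea elaboration" can mean in practice, the sketch below compares the gap in mean originality scores between AI and human responses before and after removing the part of each score that is linearly predicted by elaboration. It is not the authors' analysis: using word count as the elaboration measure, residualising on a pooled regression, the function names, and the toy scores are all assumptions made for illustration.

```python
from statistics import mean

def word_count(text: str) -> int:
    """Assumed elaboration proxy: number of whitespace-separated words."""
    return len(text.split())

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def source_gap(scores, sources):
    """Mean score of AI responses minus mean score of human responses."""
    ai = [s for s, src in zip(scores, sources) if src == "ai"]
    human = [s for s, src in zip(scores, sources) if src == "human"]
    return mean(ai) - mean(human)

def adjusted_gap(scores, elaboration, sources):
    """Gap after subtracting the score predicted from elaboration alone."""
    slope, intercept = fit_line(elaboration, scores)
    residuals = [s - (intercept + slope * e)
                 for s, e in zip(scores, elaboration)]
    return source_gap(residuals, sources)

# Synthetic toy data (not from the paper): scores depend only on length.
elaboration = [1, 2, 3, 6, 7, 8]            # hypothetical word counts
sources = ["human"] * 3 + ["ai"] * 3
scores = [1.0 + 0.5 * e for e in elaboration]

raw = source_gap(scores, sources)
adjusted = adjusted_gap(scores, elaboration, sources)
print(f"raw gap: {raw:.2f}, adjusted gap: {adjusted:.2f}")
```

In this toy setup the rater rewards only length, so the apparent preference for AI responses disappears once elaboration is accounted for. With real ratings, a regression that includes both source and elaboration as predictors would usually be preferable to this two-step residual approach.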
Abstract
Automatic systems are increasingly used to assess the originality of responses in creative tasks. They offer a potential solution to key limitations of human assessment (cost, fatigue, and subjectivity), but there is preliminary evidence of a self-preference bias: automatic systems tend to prefer responses that are closer to their own style than to the human one. In this paper, we investigated how Large Language Models (LLMs) align with human raters in assessing the originality of responses in a divergent thinking task. We analysed 4,813 responses to the Alternate Uses Task (AUT) produced by ChatGPT-4o and by humans with higher and lower creativity. Human raters were two university students who underwent intensive training. Machine raters were two specialised systems fine-tuned on AUT responses and corresponding human ratings (OCSAI and CLAUS) and ChatGPT-4o, which was…
