TL;DR
MuseScorer is an automated system that assesses idea originality by retrieving similar ideas and using LLMs to determine if an idea is novel, matching human judgment and enabling scalable creativity evaluation.
Contribution
It introduces a fully automated, retrieval-based originality scoring system using LLMs, reducing manual effort and scaling creativity assessment.
Findings
Matches human clustering structure (AMI=0.59)
High participant-level scoring correlation (r=0.89)
Validated across five datasets (1,143 participants; 16,294 ideas)
Abstract
An objective, face-valid method for scoring idea originality is to measure each idea's statistical infrequency within a population -- an approach long used in creativity research. Yet, computing these frequencies requires manually bucketing idea rephrasings, a process that is subjective, labor-intensive, error-prone, and brittle at scale. We introduce MuseScorer, a fully automated, psychometrically validated system for frequency-based originality scoring. MuseScorer integrates a Large Language Model (LLM) with externally orchestrated retrieval: given a new idea, it retrieves semantically similar prior idea-buckets and zero-shot prompts the LLM to judge whether the idea fits an existing bucket or forms a new one. These buckets enable frequency-based originality scoring without human annotation. Across five datasets (N_{participants}=1143, n_{ideas}=16,294), MuseScorer matches human…
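The retrieve-then-judge loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `jaccard`, `retrieve`, `toy_judge`, and `score_originality` are hypothetical. Token overlap stands in for semantic retrieval, a fixed similarity threshold stands in for the zero-shot LLM judgment, and the scoring rule (one minus the relative frequency of an idea's bucket) is an assumed form of frequency-based originality.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Candidates = List[Tuple[int, float]]  # (bucket_id, similarity) pairs


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve(idea: str, buckets: List[str], k: int = 3) -> Candidates:
    """Return the k most similar existing buckets, best first."""
    scored = [(i, jaccard(idea, label)) for i, label in enumerate(buckets)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def toy_judge(idea: str, candidates: Candidates) -> Optional[int]:
    """Placeholder for the LLM call (assumption): accept the best
    candidate if it is similar enough, else return None for 'new bucket'."""
    if candidates and candidates[0][1] >= 0.5:
        return candidates[0][0]
    return None


def score_originality(
    ideas: List[str],
    judge: Callable[[str, Candidates], Optional[int]] = toy_judge,
    k: int = 3,
) -> Tuple[List[int], List[float]]:
    buckets: List[str] = []     # one representative label per bucket
    assignment: List[int] = []  # bucket id for each idea, in input order
    for idea in ideas:
        choice = judge(idea, retrieve(idea, buckets, k))
        if choice is None:      # judge says no existing bucket fits
            buckets.append(idea)
            choice = len(buckets) - 1
        assignment.append(choice)
    counts = Counter(assignment)
    n = len(ideas)
    # Assumed scoring rule: rarer buckets receive higher originality.
    return assignment, [1 - counts[b] / n for b in assignment]


ideas = [
    "hold a door open",
    "hold the door open",
    "paperweight for documents",
    "hold a door open",
    "grind into red pigment",
]
assignment, scores = score_originality(ideas)
for idea, bucket, score in zip(ideas, assignment, scores):
    print(f"{score:.2f}  bucket {bucket}  {idea}")
```

In this toy run the three door-holding rephrasings share one bucket and score lowest, while the two ideas that appear only once score highest. Replacing `toy_judge` with a function that prompts an LLM, and `jaccard` with an embedding-based similarity, would bring the sketch closer to the pipeline the abstract describes.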
