MuseScorer: Idea Originality Scoring At Scale

Ali Sarosh Bangash; Krish Veera; Ishfat Abrar Islam; Raiyan Abdul Baten

arXiv:2505.16232·cs.CL·September 22, 2025

TL;DR

MuseScorer is an automated system that assesses idea originality by retrieving similar ideas and using LLMs to determine if an idea is novel, matching human judgment and enabling scalable creativity evaluation.

Contribution

The paper introduces a fully automated, retrieval-based originality scoring system built on LLMs, which removes the manual bucketing step and lets creativity assessment scale.

Findings

01

Matches human clustering structure (AMI=0.59)

02

High participant-level scoring correlation (r=0.89)

03

Validated across five datasets

Abstract

An objective, face-valid method for scoring idea originality is to measure each idea's statistical infrequency within a population -- an approach long used in creativity research. Yet, computing these frequencies requires manually bucketing idea rephrasings, a process that is subjective, labor-intensive, error-prone, and brittle at scale. We introduce MuseScorer, a fully automated, psychometrically validated system for frequency-based originality scoring. MuseScorer integrates a Large Language Model (LLM) with externally orchestrated retrieval: given a new idea, it retrieves semantically similar prior idea-buckets and zero-shot prompts the LLM to judge whether the idea fits an existing bucket or forms a new one. These buckets enable frequency-based originality scoring without human annotation. Across five datasets (N_{participants}=1143, n_{ideas}=16,294), MuseScorer matches human…
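The retrieve-then-judge loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`retrieve`, `toy_judge`, `bucket_ideas`, `originality_scores`) are hypothetical, word-overlap (Jaccard) similarity stands in for semantic retrieval, a simple threshold rule stands in for the zero-shot LLM call, and the score `1 - relative frequency` is one assumed way to turn bucket counts into originality.

```python
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> set:
    """Lowercased word set; a toy stand-in for a semantic embedding."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(idea: str, buckets: List[List[str]], k: int = 3) -> List[int]:
    """Indices of the k buckets whose representative is most similar to idea."""
    ranked = sorted(
        range(len(buckets)),
        key=lambda i: jaccard(tokenize(idea), tokenize(buckets[i][0])),
        reverse=True,
    )
    return ranked[:k]


def toy_judge(idea: str, candidates: List[str]) -> Optional[int]:
    """Assumed stand-in for the LLM judgment: return the position of the
    first candidate the idea 'fits', or None if it forms a new bucket."""
    for pos, representative in enumerate(candidates):
        if jaccard(tokenize(idea), tokenize(representative)) >= 0.5:
            return pos
    return None


def bucket_ideas(
    ideas: List[str],
    judge: Callable[[str, List[str]], Optional[int]] = toy_judge,
    k: int = 3,
) -> List[int]:
    """Assign each idea to an existing bucket or open a new one."""
    buckets: List[List[str]] = []  # first element of each bucket is its representative
    assignment: List[int] = []
    for idea in ideas:
        candidate_ids = retrieve(idea, buckets, k)
        choice = judge(idea, [buckets[i][0] for i in candidate_ids])
        if choice is None:
            buckets.append([idea])
            assignment.append(len(buckets) - 1)
        else:
            bucket_id = candidate_ids[choice]
            buckets[bucket_id].append(idea)
            assignment.append(bucket_id)
    return assignment


def originality_scores(assignment: List[int]) -> List[float]:
    """Assumed scoring rule: 1 minus the idea's bucket share of all ideas."""
    counts = Counter(assignment)
    total = len(assignment)
    return [1 - counts[b] / total for b in assignment]
```

For example, given the ideas "use brick as doorstop", "use a brick as doorstop", "grind brick into pigment", and "use brick as a doorstop", the three doorstop rephrasings land in one bucket and the pigment idea opens its own, so the rarer idea receives the higher score. In a real system the `judge` argument would wrap an LLM call and `retrieve` would query an embedding index.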

Code & Models

Repositories

cssai-research/muserag (PyTorch, official)

Videos

MuseScorer: Idea Originality Scoring At Scale