Evaluating LLMs' Divergent Thinking Capabilities for Scientific Idea Generation with Minimal Context

Kai Ruan; Xuan Wang; Jixiang Hong; Peng Wang; Yang Liu; Hao Sun

arXiv:2412.17596 · cs.CL · February 24, 2026

Open Access 1 Repo 2 Models 2 Datasets

TL;DR

This paper introduces LiveIdeaBench, a benchmark for evaluating LLMs' scientific idea generation from minimal prompts, revealing that current models' creativity is not well predicted by general intelligence metrics.

Contribution

The paper presents a novel benchmark for assessing LLMs' divergent thinking in scientific idea generation using single-keyword prompts, highlighting gaps in current evaluation methods.

Findings

01. Models like QwQ-32B-preview perform comparably to top models in idea generation.

02. Standard intelligence metrics poorly predict creative capabilities in scientific idea generation.

03. Enhancing idea generation may require different training strategies than those for general problem-solving.

Abstract

While Large Language Models (LLMs) demonstrate remarkable capabilities in scientific tasks such as literature analysis and experimental design (e.g., accurately extracting key findings from papers or generating coherent experimental procedures), existing evaluation benchmarks primarily assess performance using rich contextual inputs. We introduce LiveIdeaBench, a comprehensive benchmark evaluating LLMs' scientific idea generation by assessing divergent thinking capabilities using single-keyword prompts. Drawing from Guilford's creativity theory, our benchmark employs a dynamic panel of state-of-the-art LLMs to assess generated ideas across five key dimensions: originality, feasibility, fluency, flexibility, and clarity. Through extensive experimentation with over 40 leading models across 1,180 keywords spanning 22 scientific domains, we reveal that the scientific idea generation…
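The panel-based scoring described in the abstract can be illustrated with a minimal sketch. Only the five dimension names come from the abstract. The function `aggregate_panel_scores`, the numeric scale, the sample scores, and the use of a plain mean are assumptions made for illustration and are not taken from the paper or its repository.

```python
from statistics import mean

# The five dimensions named in the abstract.
DIMENSIONS = ("originality", "feasibility", "fluency", "flexibility", "clarity")


def aggregate_panel_scores(judge_scores):
    """Average each dimension over all judges that scored one idea.

    `judge_scores` is a list of dicts, one per judge model, mapping each
    dimension name to a numeric score. Plain averaging is an assumption
    for this sketch, not the benchmark's documented aggregation rule.
    """
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    return {dim: mean(scores[dim] for scores in judge_scores) for dim in DIMENSIONS}


# Hypothetical scores from a three-judge panel for one generated idea.
panel = [
    {"originality": 7, "feasibility": 5, "fluency": 6, "flexibility": 6, "clarity": 8},
    {"originality": 8, "feasibility": 6, "fluency": 6, "flexibility": 7, "clarity": 9},
    {"originality": 9, "feasibility": 4, "fluency": 6, "flexibility": 5, "clarity": 7},
]

per_dimension = aggregate_panel_scores(panel)
overall = mean(per_dimension.values())  # unweighted mean across dimensions
print(per_dimension)
print(overall)
```

Keeping the per-dimension means separate from the overall mean makes it possible to compare models on a single axis, such as originality, without the other dimensions masking the difference.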

Code & Models

Repositories

x66ccff/liveideabench (official)

Taxonomy

Topics: Research Data Management Practices · Scientific Computing and Data Management · Scientometrics and Bibliometrics Research