Evaluating LLMs' Divergent Thinking Capabilities for Scientific Idea Generation with Minimal Context
Kai Ruan, Xuan Wang, Jixiang Hong, Peng Wang, Yang Liu, Hao Sun

TL;DR
This paper introduces LiveIdeaBench, a benchmark for evaluating LLMs' scientific idea generation from minimal prompts, revealing that current models' creativity is not well predicted by general intelligence metrics.
Contribution
The paper presents a novel benchmark for assessing LLMs' divergent thinking in scientific idea generation using single-keyword prompts, highlighting gaps in current evaluation methods.
Findings
Models like QwQ-32B-preview perform comparably to top models in idea generation.
Standard intelligence metrics poorly predict creative capabilities in scientific idea generation.
Enhancing idea generation may require different training strategies than those for general problem-solving.
Abstract
While Large Language Models (LLMs) demonstrate remarkable capabilities in scientific tasks such as literature analysis and experimental design (e.g., accurately extracting key findings from papers or generating coherent experimental procedures), existing evaluation benchmarks primarily assess performance using rich contextual inputs. We introduce LiveIdeaBench, a comprehensive benchmark evaluating LLMs' scientific idea generation by assessing divergent thinking capabilities using single-keyword prompts. Drawing from Guilford's creativity theory, our benchmark employs a dynamic panel of state-of-the-art LLMs to assess generated ideas across five key dimensions: originality, feasibility, fluency, flexibility, and clarity. Through extensive experimentation with over 40 leading models across 1,180 keywords spanning 22 scientific domains, we reveal that the scientific idea generation…
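As a rough illustration of the panel-based scoring the abstract describes, the sketch below averages each of the five dimensions across several judge models. It is only a minimal sketch under stated assumptions: the function name `aggregate_panel_scores`, the judge names, the numeric scale, and the choice of a plain mean are all hypothetical and are not taken from the paper, which may aggregate scores differently.

```python
from statistics import mean

# The five dimensions named in the abstract.
DIMENSIONS = ("originality", "feasibility", "fluency", "flexibility", "clarity")


def aggregate_panel_scores(panel_scores):
    """Average each dimension across a panel of judges.

    panel_scores maps a judge name to a dict of dimension -> numeric score.
    Using a plain mean is an assumption made for illustration only.
    """
    if not panel_scores:
        raise ValueError("panel_scores must contain at least one judge")
    result = {}
    for dim in DIMENSIONS:
        # Collect this dimension's score from every judge that reported it.
        values = [scores[dim] for scores in panel_scores.values() if dim in scores]
        if values:
            result[dim] = mean(values)
    return result


# Hypothetical placeholder scores, not results from the paper.
example = {
    "judge_a": {"originality": 8, "feasibility": 6, "fluency": 7, "flexibility": 5, "clarity": 9},
    "judge_b": {"originality": 6, "feasibility": 8, "fluency": 7, "flexibility": 7, "clarity": 7},
}
summary = aggregate_panel_scores(example)
```

Averaging over several judges is one simple way to dampen the bias of any single evaluator; a median or a weighted scheme would be equally plausible choices.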
Taxonomy
Topics: Research Data Management Practices · Scientific Computing and Data Management · Scientometrics and Bibliometrics Research
