LLM Jaggedness Unlocks Scientific Creativity
Shray Mathur, J. Anibal Boscoboinik, Esther H. R. Tsai, and Kevin G. Yager

TL;DR
This paper investigates the uneven progress of large language models in scientific idea generation, introduces a benchmark for measuring scientific creativity, and demonstrates how leveraging model jaggedness can enhance scientific innovation.
Contribution
The work introduces SciAidanBench, a benchmark for scientific creativity, and shows how understanding model jaggedness can be used to improve AI-driven scientific idea generation.
Findings
Jaggedness manifests across models, tasks, and domains.
Stronger models show high variability in scientific creativity.
Combining models via inference-time strategies outperforms individual models.
Abstract
As artificial intelligence advances, models are not improving uniformly. Instead, progress unfolds in a jagged fashion, with capabilities growing unevenly across tasks, domains, and model scales. In this work, we examine this jaggedness through the lens of scientific idea generation. We introduce SciAidanBench, a benchmark of open-ended scientific questions designed to measure the scientific creativity of large language models (LLMs). Given a scientific question, models are asked to generate as many unique and coherent ideas as possible, and the total number of valid responses serves as a proxy for creative potential. Evaluating 19 base models across 8 providers (30 variants in total, including reasoning versions), we find that jaggedness manifests both across models and within models. First, in a cross-task comparison between general and scientific creativity, improvements in…
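The scoring idea in the abstract (count the responses that are both coherent and unique) can be illustrated with a small sketch. This is not the SciAidanBench implementation. The names `count_valid_ideas`, `jaccard`, the `is_coherent` callback, and the `max_similarity` threshold are all hypothetical. Token-set Jaccard similarity stands in for whatever uniqueness measure the benchmark actually uses, and a simple word-count rule stands in for a coherence judgment.

```python
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    # Hypothetical normalization: lowercase, whitespace-split token set.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard similarity, a stand-in for a real novelty measure.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def count_valid_ideas(
    responses: List[str],
    is_coherent: Callable[[str], bool],
    similarity: Callable[[str, str], float] = jaccard,
    max_similarity: float = 0.5,  # assumed threshold, not from the paper
) -> int:
    """Count responses that pass a coherence check and are not
    near-duplicates of any previously accepted response."""
    accepted: List[str] = []
    for response in responses:
        if not is_coherent(response):
            continue  # skip incoherent answers
        if any(similarity(response, prev) > max_similarity for prev in accepted):
            continue  # skip near-duplicates of earlier ideas
        accepted.append(response)
    return len(accepted)


ideas = [
    "use graphene membranes for desalination",
    "use graphene membranes for water desalination",  # near-duplicate
    "apply machine learning to predict catalyst activity",
    "",  # incoherent (empty)
]
print(count_valid_ideas(ideas, lambda s: len(s.split()) >= 3))  # prints 2
```

In this sketch, invalid responses are skipped rather than ending the run. Whether the benchmark instead stops at the first invalid idea is not stated here and would change the counts.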
