LLM Jaggedness Unlocks Scientific Creativity

Shray Mathur, J. Anibal Boscoboinik, Esther H. R. Tsai, and Kevin G. Yager

arXiv:2605.10574 · cs.AI · May 21, 2026

TL;DR

This paper examines the uneven ("jagged") progress of large language models in scientific idea generation, introduces SciAidanBench, a benchmark of open-ended scientific questions for measuring scientific creativity, and shows that combining models at inference time to exploit this jaggedness outperforms individual models.

Contribution

The work introduces SciAidanBench, a benchmark for scientific creativity, and shows how understanding model jaggedness can be used to improve AI-driven scientific idea generation.

Findings

01. Jaggedness manifests across models, tasks, and domains.

02. Stronger models show high variability in scientific creativity.

03. Combining models via inference-time strategies outperforms individual models.

Abstract

As artificial intelligence advances, models are not improving uniformly. Instead, progress unfolds in a jagged fashion, with capabilities growing unevenly across tasks, domains, and model scales. In this work, we examine this dynamic jaggedness through the lens of scientific idea generation. We introduce SciAidanBench, a benchmark of open-ended scientific questions designed to measure the scientific creativity of large language models (LLMs). Given a scientific question, models are asked to generate as many unique and coherent ideas as possible, with the total number of valid responses serving as a proxy for creative potential. Evaluating 19 base models across 8 providers (30 total variants including reasoning versions), we find that jaggedness manifests both across models and within models. First, in a cross-task comparison between general and scientific creativity, improvements in…
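The scoring idea in the abstract (count the unique and coherent ideas a model produces for one question) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the token-overlap similarity, the `max_similarity` threshold, and the choice to skip invalid ideas rather than stop at the first one are all assumptions made for illustration; the abstract states only that the number of valid responses serves as a proxy for creative potential.

```python
import re
from typing import Callable, Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased word set. A crude stand-in for a real semantic embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1] (hypothetical uniqueness measure)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def count_valid_ideas(
    ideas: Iterable[str],
    is_coherent: Callable[[str], bool],
    max_similarity: float = 0.6,  # assumed threshold, not from the paper
) -> int:
    """Count ideas that are coherent and not near-duplicates of earlier ones."""
    accepted: List[str] = []
    for idea in ideas:
        if not is_coherent(idea):
            continue  # assumption: incoherent ideas are skipped, not fatal
        if any(jaccard(idea, prev) > max_similarity for prev in accepted):
            continue  # too similar to an already accepted idea
        accepted.append(idea)
    return len(accepted)


if __name__ == "__main__":
    sample = [
        "Use graphene membranes for desalination",
        "Use graphene membranes for desalination plants",  # near-duplicate
        "Engineer bacteria to precipitate salt",
        "ok",  # rejected by the toy coherence check below
    ]
    # Toy coherence check: at least three words (placeholder for a judge model).
    print(count_valid_ideas(sample, lambda s: len(s.split()) >= 3))
```

In practice the coherence check and the similarity measure would likely be a judge model and an embedding distance; the token-overlap version here only keeps the sketch self-contained.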
