MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search

Zonglin Yang; Wanhao Liu; Ben Gao; Yujie Liu; Wei Li; Tong Xie; Lidong Bing; Wanli Ouyang; Erik Cambria; Dongzhan Zhou

arXiv:2505.19209 · cs.CL · October 28, 2025

TL;DR

This paper introduces a hierarchical search method for fine-grained scientific hypothesis discovery with LLMs. It reports that the method improves hypothesis quality and that ensembles of diverse LLMs yield better results.

Contribution

The paper formalizes the task of fine-grained hypothesis discovery, proposes a hierarchical search approach, and evaluates ensemble strategies to enhance LLM-based hypothesis generation.

Findings

01. Hierarchical search improves hypothesis quality.

02. An ensemble of diverse LLMs yields better results.

03. The method outperforms strong baselines on a new benchmark.

Abstract

Large language models (LLMs) have shown promise in automating scientific hypothesis generation, yet existing approaches primarily yield coarse-grained hypotheses lacking critical methodological and experimental details. We introduce and formally define the new task of fine-grained scientific hypothesis discovery, which entails generating detailed, experimentally actionable hypotheses from coarse initial research directions. We frame this as a combinatorial optimization problem and investigate the upper limits of LLMs' capacity to solve it when maximally leveraged. Specifically, we explore four foundational questions: (1) how to best harness an LLM's internal heuristics to formulate the fine-grained hypothesis it itself would judge as the most promising among all the possible hypotheses it might generate, based on its own internal scoring, thus defining a latent reward landscape over the…
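To make the combinatorial-optimization framing concrete, the following is a minimal Python sketch of a coarse-to-fine search over a scored hypothesis space. It is an illustration under stated assumptions, not the paper's algorithm, which this excerpt does not specify. The names `hierarchical_search`, `toy_score`, and `LEVELS` are hypothetical, the chemistry options are invented placeholders, and the hand-written scoring table merely stands in for the LLM's own internal scoring mentioned in the abstract.

```python
from typing import Callable, List, Tuple

# A hypothesis is a tuple of chosen details, one per level of granularity.
Hypothesis = Tuple[str, ...]


def hierarchical_search(
    levels: List[List[str]],
    score: Callable[[Hypothesis], float],
) -> Hypothesis:
    """Greedy coarse-to-fine search: fix one level of detail at a time.

    ASSUMPTION: a greedy strategy chosen for illustration only.
    """
    chosen: Hypothesis = ()
    for options in levels:
        # Extend the partial hypothesis with each option and keep the best.
        chosen = max((chosen + (opt,) for opt in options), key=score)
    return chosen


def toy_score(h: Hypothesis) -> float:
    """Stand-in for an LLM-based scorer (hypothetical values)."""
    preferred = {
        "catalyst: Pd": 2.0,
        "solvent: water": 1.0,
        "temperature: 60 C": 0.5,
    }
    return sum(preferred.get(detail, 0.0) for detail in h)


# Hypothetical levels, ordered from coarse decisions to fine details.
LEVELS = [
    ["catalyst: Pd", "catalyst: Ni"],
    ["solvent: water", "solvent: toluene"],
    ["temperature: 25 C", "temperature: 60 C"],
]

best = hierarchical_search(LEVELS, toy_score)
print(best)
```

Searching level by level evaluates only the sum of the options per level (six candidates here) rather than their product (eight full combinations), which is the usual motivation for a hierarchical decomposition. Whether a greedy pass finds the global optimum depends on the shape of the reward landscape.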
