MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Zonglin Yang; Wanhao Liu; Ben Gao; Tong Xie; Yuqiang Li; Wanli Ouyang; Soujanya Poria; Erik Cambria; Dongzhan Zhou

arXiv:2410.07076 · cs.CL · October 28, 2025 · 3 cites

TL;DR

This paper introduces MOOSE-Chem, an LLM-based framework for autonomous discovery of novel, high-quality chemistry hypotheses by decomposing the process into retrieval, composition, and ranking tasks, validated on a new benchmark.

Contribution

The work presents a formal mathematical decomposition of hypothesis discovery and implements it through the MOOSE-Chem framework, demonstrating LLMs' ability to rediscover hypotheses without data contamination.

Findings

01. LLMs can effectively retrieve inspirations for hypotheses.

02. MOOSE-Chem successfully rediscovered core hypotheses from recent high-impact papers.

03. LLMs may encode latent scientific knowledge not yet recognized by humans.

Abstract

Scientific discovery plays a pivotal role in advancing human society, and recent progress in large language models (LLMs) suggests their potential to accelerate this process. However, it remains unclear whether LLMs can autonomously generate novel and valid hypotheses in chemistry. In this work, we investigate whether LLMs can discover high-quality chemistry hypotheses given only a research background (comprising a question and/or a survey), without restriction on the domain of the question. We begin with the observation that hypothesis discovery is a seemingly intractable task. To address this, we propose a formal mathematical decomposition grounded in a fundamental assumption: that most chemistry hypotheses can be composed from a research background and a set of inspirations. This decomposition leads to three practical subtasks: retrieving inspirations, composing hypotheses with…
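The three subtasks named in the abstract (retrieving inspirations, composing hypotheses, ranking them) can be pictured as a simple pipeline. The sketch below is illustrative only and is not taken from the MOOSE-Chem repository: every function name is hypothetical, and a toy word-overlap score stands in for the LLM calls that a real system would make at each stage.

```python
from typing import Callable, List, Tuple


def overlap_score(a: str, b: str) -> float:
    """Toy stand-in for an LLM judgment: Jaccard overlap of the two word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def retrieve_inspirations(background: str, corpus: List[str], k: int = 2) -> List[str]:
    """Subtask 1 (assumed form): keep the k corpus entries most related to the background."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(background, doc), reverse=True)
    return ranked[:k]


def compose_hypothesis(background: str, inspiration: str) -> str:
    """Subtask 2 (assumed form): merge the background with one inspiration."""
    return f"To address {background}, use {inspiration}."


def rank_hypotheses(
    hypotheses: List[str], score: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Subtask 3 (assumed form): order candidates from highest to lowest score."""
    scored = [(h, score(h)) for h in hypotheses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def discover(background: str, corpus: List[str], k: int = 2) -> List[Tuple[str, float]]:
    """Chain the three subtasks: retrieve, compose, then rank."""
    inspirations = retrieve_inspirations(background, corpus, k)
    candidates = [compose_hypothesis(background, insp) for insp in inspirations]
    return rank_hypotheses(candidates, lambda h: overlap_score(background, h))


if __name__ == "__main__":
    background = "improve catalyst selectivity for nitrogen reduction"
    corpus = [
        "copper catalyst for nitrogen reduction",
        "history of alchemy",
        "improving catalyst selectivity with ligands",
    ]
    for hypothesis, score in discover(background, corpus):
        print(f"{score:.2f}  {hypothesis}")
```

Keeping each stage behind its own function means the toy scorer could be replaced by a model call without touching the surrounding control flow, which is the practical appeal of decomposing the task.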

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Originality: First, while LLMs have been used for scientific discovery in social science and NLP, this paper is the first to investigate their potential in chemistry. In addition, the MOOSE-Chem framework employs a three-step approach to retrieve inspiration papers, infer valid knowledge, identify hypotheses, and rank them, which hasn’t been used in previous research. Moreover, the use of the evolutionary algorithm to foster a broader diversity in hypothesis generation is also an innovati

Weaknesses

Firstly, using the same large language model to evaluate its own generated results may introduce bias. It is recommended to try using different LLMs to evaluate the results so as to guarantee the reliability of the results. For example, consider using models like LLaMa[1], Claude[2], Gemini[3], or other recent LLMs to compare outputs. If using the same LLM is necessary, you could collect hypotheses generated by humans and also have both experts and GPT-4 evaluate them. Then, compare their Hard/S

Reviewer 02 · Rating 5 · Confidence 4

Strengths

1. Generating research hypotheses is a complicated task, and the authors heuristically decomposed hypothesis generation into two steps: (1) inspiration retrieval and (2) hypothesis refinement. In the hypothesis refinement step, the authors propose a novel “mutate and recombine” trick to help generate good hypotheses. 2. The experiments to verify each of the research questions are well designed and of good quality.
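For readers unfamiliar with the idea, the “mutate and recombine” step mentioned in this review follows the general shape of an evolutionary search. The sketch below shows that generic shape only, under loud assumptions: a hypothesis is modeled as a fixed-length list of string components, `mutate`, `recombine`, and `evolve` are hypothetical names, and the fitness function is a toy. It does not reproduce the paper's procedure, where these operations would be carried out by an LLM.

```python
import random
from typing import Callable, List

Hypothesis = List[str]  # assumption: a hypothesis is a fixed-length list of components


def mutate(hypothesis: Hypothesis, vocabulary: List[str], rng: random.Random) -> Hypothesis:
    """Return a copy with one randomly chosen component replaced from the vocabulary."""
    child = list(hypothesis)
    child[rng.randrange(len(child))] = rng.choice(vocabulary)
    return child


def recombine(a: Hypothesis, b: Hypothesis, rng: random.Random) -> Hypothesis:
    """One-point crossover: a prefix of `a` joined to the matching suffix of `b`.

    Assumes both parents have the same length, which must be at least 2.
    """
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(
    population: List[Hypothesis],
    vocabulary: List[str],
    fitness: Callable[[Hypothesis], float],
    generations: int = 5,
    seed: int = 0,
) -> List[Hypothesis]:
    """Repeatedly mutate and recombine, keeping the highest-scoring candidates."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        mutants = [mutate(h, vocabulary, rng) for h in population]
        crosses = [
            recombine(rng.choice(population), rng.choice(population), rng)
            for _ in range(size)
        ]
        # Parents compete with their offspring, so the best score never decreases.
        pool = population + mutants + crosses
        population = sorted(pool, key=fitness, reverse=True)[:size]
    return population


if __name__ == "__main__":
    target = {"catalyst", "ligand"}
    best = evolve(
        population=[["heat", "heat"], ["solvent", "heat"]],
        vocabulary=["ligand", "solvent", "catalyst", "heat"],
        fitness=lambda h: len(set(h) & target),
    )
    print(best[0])
```

Mutation explores variations of a single candidate, while recombination mixes parts of two candidates; letting parents stay in the selection pool is one simple way to avoid losing a good hypothesis between generations.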

Weaknesses

1. The introduction section could be better written and clearer. (a) It would be great if the authors could provide a summary of the major contributions of this work at the end of the introduction section. What is really the contribution to the field? (b) It would be great if the authors could briefly discuss why the decomposition of the major question is necessary, what’s the difference or connection between the proposed inspiration identification (the first step of the three) and Retrieval

Reviewer 03 · Rating 8 · Confidence 4

Strengths

The paper is generally well-written, and in good English. The text is clear and the authors did a good job guiding the reader through the motivation, the derivation of the method and motivating each of the proposed steps and experiments. The topic of the paper is very relevant and the results are positive. Related work is well covered, and experiments are included that compare the proposed method with previous work. Every claim made on the performance of the method is generally backed up with e

Weaknesses

My main concerns with the paper are regarding the reproducibility, clarity and discussion of the approach: - Reproducibility: we note that the authors introduce a scientific benchmark, along with a novel framework for hypothesis generation. However, the authors do not provide access to the newly introduced benchmark, which hampers the ability to really discriminate the difficulty of the tasks at hand. Additionally, this impedes the ability to reproduce the results or for future work to compare

Code & Models

Repositories

ZonglinY/MOOSE-Chem (Official)

Taxonomy

Topics: Machine Learning in Materials Science · Advanced Text Analysis Techniques

Methods: Sparse Evolutionary Training