MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

Wanhao Liu; Zonglin Yang; Jue Wang; Lidong Bing; Di Zhang; Dongzhan Zhou; Yuqiang Li; Houqiang Li; Erik Cambria; Wanli Ouyang

arXiv:2505.17873·cs.CL·October 28, 2025



TL;DR

This paper introduces an experiment-guided hypothesis ranking method that combines a domain-specific simulator with in-context reinforcement learning (ICRL), and reports significant improvements over pre-experiment ranking baselines on scientific discovery tasks.

Contribution

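The paper presents a framework that combines simulated experimental feedback with in-context reinforcement learning (ICRL) to improve hypothesis ranking in scientific discovery. It addresses a limitation of prior pre-experiment methods, which rank hypotheses by language model reasoning alone, without empirical feedback.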

Findings

01 The simulator approximates real experimental results with consistent trend alignment, validated against 124 hypotheses with experimentally reported outcomes.

02 The ICRL-based policy significantly outperforms pre-experiment ranking baselines.

03 The toolkit enables systematic research on experiment-guided hypothesis ranking.

Abstract

Hypothesis ranking is vital for automated scientific discovery, especially in cost-intensive, throughput-limited natural science domains. Current methods focus on pre-experiment ranking, relying solely on language model reasoning without empirical feedback. We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. Due to the impracticality of real experiments, we propose a simulator grounded in domain-specific concepts that models hypothesis performance as a function of similarity to a hidden ground truth, perturbed by noise. Validated against 124 hypotheses with experimentally reported outcomes, the simulator approximates real results with consistent trend alignment. Although deviations exist, they mimic wet-lab noise, promoting more robust ranking strategies. We frame experiment-guided ranking as a sequential decision-making problem and…
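The idea in the abstract (a simulated outcome driven by similarity to a hidden ground truth plus noise, then used as feedback to choose the next hypothesis to test) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's simulator or policy: hypotheses are reduced to tuples of concept labels, similarity is Jaccard overlap, noise is Gaussian, and the selection rule is a simple greedy heuristic rather than ICRL. The names `jaccard`, `simulate_experiment`, and `experiment_guided_order` are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard overlap between two collections of concept labels (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def simulate_experiment(hypothesis, ground_truth, noise_std=0.05, rng=None):
    """Toy simulated outcome: similarity to the hidden ground truth plus Gaussian noise.

    The functional form and the noise model are assumptions for illustration only.
    """
    rng = rng or random
    score = jaccard(hypothesis, ground_truth) + rng.gauss(0.0, noise_std)
    return min(1.0, max(0.0, score))  # clip to [0, 1]


def experiment_guided_order(candidates, ground_truth, first, noise_std=0.05, seed=0):
    """Greedy stand-in for a feedback-guided policy (not ICRL).

    After each simulated test, the next hypothesis is the untested candidate
    most similar to the best-scoring hypothesis observed so far.
    Returns the list of (hypothesis, observed_score) in testing order.
    """
    rng = random.Random(seed)
    untested = list(candidates)
    tested = []
    current = first
    while True:
        untested.remove(current)
        observed = simulate_experiment(current, ground_truth, noise_std, rng)
        tested.append((current, observed))
        if not untested:
            return tested
        best_so_far = max(tested, key=lambda pair: pair[1])[0]
        current = max(untested, key=lambda h: jaccard(h, best_so_far))


# Hypothetical example data: concept labels are placeholders.
ground_truth = ("a", "b", "c")
candidates = [("x", "y"), ("a", "x"), ("a", "b"), ("a", "b", "c")]
history = experiment_guided_order(candidates, ground_truth, first=("x", "y"), noise_std=0.0)
```

With `noise_std=0.0` the testing order is deterministic; raising it shows how noisy feedback can mislead a greedy rule, which is the kind of robustness concern the abstract raises.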


Code & Models

Repositories

wanhaoliu/chemsimx (official)

