MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Zonglin Yang; Lidong Bing

arXiv:2603.03756·cs.LG·May 12, 2026

1 Repo 3 Models 2 Datasets

TL;DR

MOOSE-Star is a framework for modeling scientific discovery that, in the best case, reduces the complexity of retrieving and composing inspirations from exponential ($O(N^k)$) to logarithmic ($O(\log N)$), making training and inference tractable over large knowledge bases.

Contribution

It makes training $P(h \mid b)$ tractable and scalable through task decomposition, hierarchical search, and bounded composition, supported by a large dataset.

Findings

01. MOOSE-Star scales with training data and inference budget.

02. Direct brute-force sampling faces a complexity wall.

03. The framework achieves logarithmic complexity in retrieval.

Abstract

While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis} \mid \text{background})$ ($P(h \mid b)$), unexplored. We demonstrate that directly training $P(h \mid b)$ is mathematically intractable due to the combinatorial complexity ($O(N^k)$) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework that enables tractable and scalable training of $P(h \mid b)$, while supporting more scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic ($O(\log N)$) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable…
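
The gap between $O(N^k)$ and $O(\log N)$ can be made concrete with a toy count. The sketch below is not the MOOSE-Star implementation. It assumes, purely for illustration, that a hypothesis composes `k` inspirations, each drawn from a knowledge base of `n` items, and that the knowledge base is organised as a balanced tree with a fixed `branching` factor that is descended once per inspiration. The function names and the tree layout are hypothetical.

```python
def brute_force_candidates(n: int, k: int) -> int:
    """Ordered k-step inspiration sequences drawn from n items: n**k."""
    return n ** k


def tree_depth(n: int, branching: int) -> int:
    """Levels a balanced tree with this branching factor needs to hold n leaves."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    depth, capacity = 0, 1
    # Integer arithmetic avoids floating-point error in log computations.
    while capacity < n:
        capacity *= branching
        depth += 1
    return depth


def hierarchical_comparisons(n: int, k: int, branching: int) -> int:
    """Best-case cost (assumed): one root-to-leaf descent per inspiration,
    scoring `branching` children at every level."""
    return k * branching * tree_depth(n, branching)


if __name__ == "__main__":
    for n in (10**3, 10**6):
        print(n, brute_force_candidates(n, 3), hierarchical_comparisons(n, 3, 10))
```

Under these assumptions, growing the knowledge base from $10^3$ to $10^6$ items with `k = 3` raises the brute-force count from $10^9$ to $10^{18}$, while the hierarchical count only doubles, from 90 to 180 comparisons. This describes a best case: a greedy descent that picks the wrong branch would need backtracking, which this count ignores.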

Code & Models

Repositories

zongliny/MOOSE-Star (GitHub)
