MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
Zonglin Yang, Lidong Bing

TL;DR
MOOSE-Star introduces a scalable framework for scientific discovery modeling that reduces complexity from exponential to logarithmic, enabling efficient training and inference in large knowledge bases.
Contribution
It presents a novel approach that makes training $P(h|b)$ tractable and scalable by decomposing tasks, hierarchical search, and bounded composition, supported by a large dataset.
Findings
MOOSE-Star scales with training data and inference budget.
Direct brute-force sampling faces a complexity wall.
The framework achieves logarithmic complexity in retrieval.
Abstract
While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, (), unexplored. We demonstrate that directly training is mathematically intractable due to the combinatorial complexity () inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework that enables tractable and scalable training of , while supporting more scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic () by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
