MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs
Guojiang Zhao, Zixiang Lu, Yutang Ge, Sihang Li, Zheng Cheng, Haitao Lin, Lirong Wu, Hanchen Xia, Hengxing Cai, Wentao Guo, Hongshuai Wang, Mingjun Xu, Siyu Zhu, Guolin Ke, Linfeng Zhang, Zhifeng Gao

TL;DR
MolReasoner is a two-stage framework that enhances the reasoning capabilities of Large Language Models in molecular tasks by combining knowledge-enhanced training and a novel reward system, leading to more accurate and interpretable molecular reasoning.
Contribution
The paper introduces MolReasoner, a novel two-stage framework that improves molecular reasoning in LLMs through knowledge-enhanced training and task-adaptive refinement, addressing hallucinations and interpretability.
Findings
Outperforms baseline models in molecule generation and captioning
Produces more interpretable and accurate molecular reasoning outputs
Effectively reduces hallucinations in LLM-based molecular tasks
Abstract
Large Language Models (LLMs) have shown impressive performance across various domains, but their ability to perform molecular reasoning remains underexplored. Existing methods mostly rely on general-purpose prompting, which lacks domain-specific molecular semantics, or fine-tuning, which faces challenges in interpretability and reasoning depth, often leading to structural and textual hallucinations. To address these issues, we introduce MolReasoner, a two-stage framework that transitions LLMs from memorization to high-fidelity chemical reasoning. In the Mol-SFT stage, knowledge-enhanced Chain-of-Thought (CoT) data provides a strong foundation, while the Mol-RL stage refines reasoning using a novel, task-adaptive reward system to mitigate hallucinations. Extensive evaluations demonstrate that MolReasoner significantly outperforms a wide range of strong baselines in both molecule…
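The Mol-RL stage is described only at a high level here. The reviews on this page identify the RL algorithm as GRPO, whose core step is normalizing each sampled response's reward against the other responses drawn for the same prompt. A minimal Python sketch of that normalization follows, assuming scalar rewards and a population standard deviation; the function name and the `eps` value are illustrative and not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    the group's standard deviation, so responses are scored relative to
    their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers for one prompt, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group mean receive positive advantages and the rest negative ones; when every response in a group gets the same reward, all advantages are zero and the group contributes no learning signal.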
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The authors propose a simple and intuitive framework for molecular reasoning.
- The figure makes the paper easier to understand.
- **Wrong prior work**: The authors’ discussion of prior work [2] (line 157) appears inaccurate. Although the paper title seems relevant to *structural information reasoning*, the cited work actually focuses on *structured reasoning for chemical equations*, not molecular structure-based reasoning. In addition, a minor but important issue is that the authors list **MSR [1]** and another **Jang et al.** paper (also mentioned at line 157) as distinct works, whereas they are in fact the *same paper*
The paper addresses a genuine limitation in molecular LLMs—the tendency toward memorization rather than structured reasoning. The distinction between prompt-based methods (lacking domain adaptation), fine-tuning without explicit reasoning (lacking interpretability), and the proposed reasoning-enhanced approach is clearly articulated (Figure 1). The two-stage pipeline (Mol-SFT then Mol-RL) follows a logical progression: warm-up with synthetic CoT data establishes reasoning format, then RL refines
The paper's core contribution is applying existing techniques (CoT distillation + RLHF/GRPO) to molecular tasks; this is incremental engineering on a downstream task rather than methodological innovation. CoT distillation from teacher models (GPT-4o) has been extensively explored in the reasoning literature, and GRPO is an established RL algorithm. The comparison with Mol-Instructions (Table 1) is misleading because Mol-Instructions uses final-answer-only supervision, while MolReasoner uses 42,000
1. It is practically meaningful to explore molecular reasoning in LLMs.
2. The authors propose novel metrics (e.g., Frag-J, FG-Match), which can capture structural hallucinations beyond validity.
3. The performance is much better than the selected baselines.
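The review names Frag-J and FG-Match without defining them. As a rough illustration only, assuming Frag-J is a Jaccard index over the sets of fragments found in the predicted and reference molecules (the paper's actual definition is not reproduced on this page), such a score could be sketched in Python as follows. The function name and the fragment strings are hypothetical, and fragment extraction itself is left out.

```python
def fragment_jaccard(pred_fragments, ref_fragments):
    """Jaccard similarity between two collections of fragment identifiers.

    Assumption: fragments are already extracted and given as strings
    (for example, fragment SMILES). Returns |A & B| / |A | B|.
    """
    pred, ref = set(pred_fragments), set(ref_fragments)
    if not pred and not ref:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(pred & ref) / len(pred | ref)


# Illustrative fragment sets: the prediction misses one reference fragment.
score = fragment_jaccard(
    ["c1ccccc1", "C(=O)O"],
    ["c1ccccc1", "C(=O)O", "N"],
)
```

A set-overlap score of this kind penalizes a generated molecule that is syntactically valid but lacks, or adds, substructures relative to the reference, which is the sort of structural hallucination that a validity check alone would miss.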
1. The framework is not novel enough. It seems to be an application of the R1 framework to molecular reasoning.
2. The proposed multi-level molecular rewards actually reward any molecules, which may result in poor generation performance.
3. Poor baselines. More reasoning LLMs could be included. For example, QwQ and the original version of DeepSeek-R1 should be compared.
4. The ablation study is not convincing. The authors could discuss more about the benefits of the RL stage.
5. The entire framewor
- **Important problem**: Interpretable chemical reasoning is crucial for trustworthy molecular AI applications
- **Well-written**: Clear motivation and comprehensive related work section
- **Reasonable approach**: Two-stage SFT→RL pipeline aligns with recent successful paradigms
- **Domain adaptation**: Incorporating structural features and functional groups into CoT generation is appropriate
- **Expert involvement**: Chemistry experts used in evaluation (Section 3.4), though details are limited
**Limited novelty**: Training LLMs to reason on chemistry has been shown by Ether0 using the same SFT→RL strategy [6]. This approach (CoT distillation + GRPO) has been applied by DeepSeek and others. The contribution is primarily dataset engineering and reward design, not methodological innovation.

**Flawed motivation (L46-51)**: Claims that general LLMs can't handle molecules lack empirical support and contradict evidence from recent benchmarks (ChemBench [1], ChemEval [2], ChemIQ [3], GPQA ch
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
