Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning
Tianwen Lyu, Xiang Zhuang, Keyan Ding, Xinzhe Cao, Lei Liang, Wei Zhao, Qiang Zhang, Huajun Chen

TL;DR
This paper introduces a knowledge-augmented reasoning framework that enhances large language models' ability to perform complex, multi-step biomolecular reasoning by integrating knowledge graphs and a new comprehensive benchmark.
Contribution
It proposes a novel framework combining LLMs with knowledge graph-based multi-hop reasoning and introduces PrimeKGQA, a new benchmark for biomolecular question answering.
Findings
Outperforms existing methods on multi-hop biomolecular reasoning tasks.
Demonstrates improved factual grounding and reasoning consistency.
Achieves state-of-the-art results on PrimeKGQA and related datasets.
Abstract
Understanding complex biomolecular mechanisms requires multi-step reasoning across molecular interactions, signaling cascades, and metabolic pathways. While large language models (LLMs) show promise in such tasks, their application to biomolecular problems is hindered by logical inconsistencies and the lack of grounding in domain knowledge. Existing approaches often exacerbate these issues: reasoning steps may deviate from biological facts or fail to capture long mechanistic dependencies. To address these challenges, we propose a Knowledge-Augmented Long-CoT Reasoning framework that integrates LLMs with knowledge graph-based multi-hop reasoning chains. The framework constructs mechanistic chains via guided multi-hop traversal and pruning on the knowledge graph; these chains are then incorporated into supervised fine-tuning to improve factual grounding and further refined with…
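The chain-construction step mentioned in the abstract (guided multi-hop traversal with pruning) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy triples, the relation names, and the helpers `build_graph`, `find_chains`, and `verbalize` are all hypothetical, and the pruning shown here (a relation whitelist, no revisited nodes, a hop limit) is only one simple way such pruning could be done.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples. Illustrative only, NOT taken from PrimeKG.
TRIPLES = [
    ("drug_A", "targets", "protein_X"),
    ("protein_X", "participates_in", "pathway_P"),
    ("pathway_P", "associated_with", "disease_D"),
    ("protein_X", "interacts_with", "protein_Y"),
    ("protein_Y", "associated_with", "disease_D"),
    ("drug_A", "has_side_effect", "effect_E"),
]


def build_graph(triples):
    """Index triples as an adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def find_chains(graph, source, target, max_hops=3, allowed_relations=None):
    """Depth-limited DFS that enumerates simple paths from source to target.

    Assumed pruning rules (hypothetical, for illustration):
      - skip edges whose relation is not in allowed_relations (if given)
      - never revisit a node already on the current path
      - stop expanding once the path has max_hops edges
    """
    chains = []

    def dfs(node, path, visited):
        if node == target and path:
            chains.append(list(path))
            return
        if len(path) >= max_hops:
            return
        for relation, nxt in graph.get(node, []):
            if allowed_relations is not None and relation not in allowed_relations:
                continue
            if nxt in visited:
                continue
            visited.add(nxt)
            path.append((node, relation, nxt))
            dfs(nxt, path, visited)
            path.pop()
            visited.remove(nxt)

    dfs(source, [], {source})
    return chains


def verbalize(chain):
    """Render a chain of triples as a readable string for use in a prompt."""
    return " -> ".join([chain[0][0]] + [f"[{rel}] {tail}" for _, rel, tail in chain])


GRAPH = build_graph(TRIPLES)
MECHANISTIC = {"targets", "participates_in", "interacts_with", "associated_with"}
CHAINS = find_chains(GRAPH, "drug_A", "disease_D", max_hops=3,
                     allowed_relations=MECHANISTIC)

for chain in CHAINS:
    print(verbalize(chain))
```

On the toy graph this yields two three-hop chains from `drug_A` to `disease_D`; the `has_side_effect` edge is pruned by the relation whitelist. In a pipeline like the one described, verbalized chains of this kind would serve as grounding for the generated reasoning steps.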
Peer Reviews
Decision · Submitted to ICLR 2026
1. Problem formulation and motivation are strong - The authors clearly identify the gap in existing CoT reasoning methods for biomedical QA tasks, namely the lack of factual grounding, and demonstrate how Bio-KCoT can function as a bridge. 2. The paper presents a thorough and well-structured pipeline - entity extraction, KG-based path retrieval, CoT generation, pruning, and RL fine-tuning. 3. Novel dataset contribution - The introduction of PrimeKGQA is a meaningful resource for the communit
1. Possible data leakage concerns - The dataset is derived from publicly available biomedical sources (PrimeKG). It remains unclear whether this data may have been used in the base model's pretraining, which might bias evaluation. 2. Limited empirical improvement over large models - Despite the novel training pipeline, Bio-KCoT’s absolute performance remains very limited compared with much larger closed-source models (e.g., GPT-4o) on average accuracy. 3. Underexplored multimodal integration - The frame
Clear, well-motivated integration of structured knowledge with long-CoT: KG-guided path instantiation and pruning directly address factual grounding and logical consistency issues common in biomolecular LLM reasoning. Methodological novelty in combining: (i) multi-structure path templates (linear/divergent/convergent) capturing non-local evidence, (ii) prompt-based CoT construction aligned to KG paths, (iii) pruning to remove spurious steps, and (iv) GRPO with a simple, effective composite rewa
Insufficient description of dataset construction. The paper does not detail how question and answer entities in PrimeKGQA are extracted and disambiguated, which templates and constraints are used to instantiate paths from the knowledge graph, or the concrete criteria for subsequent filtering and cleaning. Fallback mechanisms and quality-control standards to ensure the validity and correctness of constructed questions are also unspecified. Dataset quality and potential leakage are not systematic
1. Important Problem Domain: Biomolecular reasoning is a critical application area where factual accuracy and structured knowledge integration are essential. The motivation to address hallucinations and logical inconsistencies in this high-stakes domain is well-founded. 2. Comprehensive Framework Design: The three-stage approach (KG path extraction → CoT generation → pruning) is systematic and well-motivated. The integration of structured knowledge with long-form reasoning addresses real limita
1. My main concern is about benchmark construction: PrimeKGQA is constructed from PrimeKG, which integrates 20 high-quality biomedical resources to describe 17,080 diseases with 4,050,249 relationships, but the benchmark creation is template-based, with questions automatically generated from head-relation-tail triples. This raises several concerns: - Artificial Task Design: The problems may not truly require complex KG reasoning or long CoT. Many could potentially be answered with parame
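To make the reviewer's description of template-based generation concrete, here is a minimal sketch of how questions could be produced automatically from head-relation-tail triples. The templates, relation names, and the `generate_qa` helper are hypothetical and are not PrimeKGQA's actual templates or construction code.

```python
# Hypothetical question templates keyed by relation name (NOT from PrimeKGQA).
TEMPLATES = {
    "targets": "Which protein does {head} target?",
    "associated_with": "Which disease is {head} associated with?",
}

# Toy triples, illustrative only.
TRIPLES = [
    ("drug_A", "targets", "protein_X"),
    ("protein_Y", "associated_with", "disease_D"),
    ("drug_A", "has_side_effect", "effect_E"),  # no template, so it is skipped
]


def generate_qa(triples, templates):
    """Turn each (head, relation, tail) triple into a question/answer pair.

    The tail entity becomes the answer; triples whose relation has no
    template are dropped.
    """
    qa_pairs = []
    for head, relation, tail in triples:
        template = templates.get(relation)
        if template is None:
            continue
        qa_pairs.append({
            "question": template.format(head=head),
            "answer": tail,
            "relation": relation,
        })
    return qa_pairs


QA_PAIRS = generate_qa(TRIPLES, TEMPLATES)
for pair in QA_PAIRS:
    print(pair["question"], "->", pair["answer"])
```

Each question produced this way is answerable from a single triple, that is, a one-hop lookup, which is the kind of task design the reviewer questions.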
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning in Bioinformatics · Topic Modeling
