Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning

Tianwen Lyu; Xiang Zhuang; Keyan Ding; Xinzhe Cao; Lei Liang; Wei Zhao; Qiang Zhang; Huajun Chen

arXiv:2511.08024·cs.AI·November 12, 2025

TL;DR

This paper introduces a knowledge-augmented reasoning framework that enhances large language models' ability to perform complex, multi-step biomolecular reasoning by integrating knowledge graphs and a new comprehensive benchmark.

Contribution

It proposes a novel framework combining LLMs with knowledge graph-based multi-hop reasoning and introduces PrimeKGQA, a new benchmark for biomolecular question answering.

Findings

1. Outperforms existing methods on multi-hop biomolecular reasoning tasks.
2. Demonstrates improved factual grounding and reasoning consistency.
3. Achieves state-of-the-art results on PrimeKGQA and related datasets.

Abstract

Understanding complex biomolecular mechanisms requires multi-step reasoning across molecular interactions, signaling cascades, and metabolic pathways. While large language models (LLMs) show promise in such tasks, their application to biomolecular problems is hindered by logical inconsistencies and the lack of grounding in domain knowledge. Existing approaches often exacerbate these issues: reasoning steps may deviate from biological facts or fail to capture long mechanistic dependencies. To address these challenges, we propose a Knowledge-Augmented Long-CoT Reasoning framework that integrates LLMs with knowledge graph-based multi-hop reasoning chains. The framework constructs mechanistic chains via guided multi-hop traversal and pruning on the knowledge graph; these chains are then incorporated into supervised fine-tuning to improve factual grounding and further refined with…
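
The traversal-and-pruning step mentioned in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the toy triples, the relation names, the `allowed_relations` filter used as the pruning rule, and the function names `build_adjacency` and `enumerate_chains` are all assumptions introduced here.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples. Entities and
# relations are hypothetical placeholders, not taken from PrimeKG.
TRIPLES = [
    ("drug_A", "targets", "protein_X"),
    ("protein_X", "participates_in", "pathway_P"),
    ("pathway_P", "associated_with", "disease_D"),
    ("drug_A", "side_effect", "effect_E"),
    ("protein_X", "interacts_with", "protein_Y"),
    ("protein_Y", "associated_with", "disease_D"),
]


def build_adjacency(triples):
    """Index outgoing edges by head entity."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


def enumerate_chains(adjacency, source, target, max_hops, allowed_relations):
    """Depth-first search for chains from source to target.

    Pruning here is deliberately simple (an assumption of this sketch):
    edges whose relation is not in allowed_relations are skipped,
    entities are never revisited, and chains stop at max_hops edges.
    """
    chains = []

    def dfs(node, path, visited):
        if node == target and path:
            chains.append(list(path))
            return
        if len(path) == max_hops:
            return
        for relation, neighbor in adjacency.get(node, []):
            if relation not in allowed_relations or neighbor in visited:
                continue
            path.append((node, relation, neighbor))
            dfs(neighbor, path, visited | {neighbor})
            path.pop()

    dfs(source, [], {source})
    return chains


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    allowed = {"targets", "participates_in", "associated_with", "interacts_with"}
    for chain in enumerate_chains(adjacency, "drug_A", "disease_D", 3, allowed):
        print(" -> ".join(f"{h} [{r}] {t}" for h, r, t in chain))
```

On this toy graph the search finds two three-hop chains from `drug_A` to `disease_D`, and the `side_effect` edge is dropped by the relation filter. A real system would presumably need a more selective pruning criterion than a fixed relation whitelist.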

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. Problem formulation and motivation are strong - The authors clearly identify the gap in existing CoT reasoning methods for biomedical QA tasks, namely the lack of factual grounding, and demonstrate how Bio-KCoT can function as a bridge.
2. The paper presents a thorough and well-structured pipeline - entity extraction, KG-based path retrieval, CoT generation, pruning, and RL fine-tuning.
3. Novel dataset contribution - The introduction of PrimeKGQA is a meaningful resource for the communit

Weaknesses

1. Possible data leakage concerns - The dataset is derived from publicly available biomedical sources (PrimeKG). It remains unclear whether this data may have been used in the base model's pretraining, which might bias evaluation.
2. Limited empirical improvement over large models - Despite the novel training pipeline, Bio-KCoT’s absolute performance remains very limited compared with much larger closed-source models (e.g., GPT-4o) on average accuracy.
3. Underexplored multimodal integration - The frame

Reviewer 02 · Rating 4 · Confidence 3

Strengths

Clear, well-motivated integration of structured knowledge with long-CoT: KG-guided path instantiation and pruning directly address factual grounding and logical consistency issues common in biomolecular LLM reasoning.
Methodological novelty in combining: (i) multi-structure path templates (linear/divergent/convergent) capturing non-local evidence, (ii) prompt-based CoT construction aligned to KG paths, (iii) pruning to remove spurious steps, and (iv) GRPO with a simple, effective composite rewa

Weaknesses

Insufficient description of dataset construction. The paper does not detail how question and answer entities in PrimeKGQA are extracted and disambiguated, which templates and constraints are used to instantiate paths from the knowledge graph, or the concrete criteria for subsequent filtering and cleaning. Fallback mechanisms and quality-control standards to ensure the validity and correctness of constructed questions are also unspecified.
Dataset quality and potential leakage are not systematic

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. Important Problem Domain: Biomolecular reasoning is a critical application area where factual accuracy and structured knowledge integration are essential. The motivation to address hallucinations and logical inconsistencies in this high-stakes domain is well-founded.
2. Comprehensive Framework Design: The three-stage approach (KG path extraction → CoT generation → pruning) is systematic and well-motivated. The integration of structured knowledge with long-form reasoning addresses real limita

Weaknesses

1. My main concern is about benchmark construction: PrimeKGQA is constructed from PrimeKG, which integrates 20 high-quality biomedical resources to describe 17,080 diseases with 4,050,249 relationships, but the benchmark creation is template-based, with questions automatically generated from head-relation-tail triples. This raises several concerns:
   - Artificial Task Design: The problems may not truly require complex KG reasoning or long CoT. Many could potentially be answered with paramete
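
To make concrete what "template-based" generation from head-relation-tail triples can look like, here is a minimal sketch. It is not the PrimeKGQA construction procedure, which the reviewers note is not fully described; the templates, the relation names, and the function `triple_to_question` are hypothetical.

```python
# Hypothetical question templates keyed by relation name. Both the
# relation names and the wording are assumptions made for illustration.
TEMPLATES = {
    "targets": "Which protein does {head} target?",
    "associated_with": "Which disease is {head} associated with?",
}


def triple_to_question(triple, templates):
    """Turn one (head, relation, tail) triple into a QA pair.

    The tail entity becomes the answer. Returns None when no template
    exists for the relation.
    """
    head, relation, tail = triple
    template = templates.get(relation)
    if template is None:
        return None
    return {"question": template.format(head=head), "answer": tail}


if __name__ == "__main__":
    print(triple_to_question(("drug_A", "targets", "protein_X"), TEMPLATES))
```

A question produced this way depends on a single triple, which is the kind of case the reviewer suggests may not require multi-hop reasoning.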

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning in Bioinformatics · Topic Modeling