KALE: Enhancing Knowledge Manipulation in Large Language Models via Knowledge-aware Learning
Qitan Lv, Tianyu Liu, Qiaosheng Zhang, Xingcheng Xu, Chaochao Lu

TL;DR
KALE is a post-training framework that uses knowledge graphs to generate rationales and improve large language models' ability to recall, reason, and transfer knowledge, addressing the known&incorrect phenomenon.
Contribution
KALE introduces a novel knowledge graph-based rationale generation and a rationale-guided fine-tuning method to enhance LLMs' knowledge manipulation capabilities.
Findings
Achieves up to 11.72% accuracy improvement on benchmarks.
Effectively internalizes rationales to improve reasoning.
Demonstrates general applicability across multiple LLMs.
Abstract
Despite the impressive performance of large language models (LLMs) pretrained on vast knowledge corpora, advancing their knowledge manipulation (the ability to effectively recall, reason, and transfer relevant knowledge) remains challenging. Existing methods mainly leverage Supervised Fine-Tuning (SFT) on labeled datasets to enhance LLMs' knowledge manipulation ability. However, we observe that SFT models still exhibit the known&incorrect phenomenon, where they explicitly possess relevant knowledge for a given question but fail to leverage it for correct answers. To address this challenge, we propose KALE (Knowledge-Aware LEarning), a post-training framework that leverages knowledge graphs (KGs) to generate high-quality rationales and enhance LLMs' knowledge manipulation ability. Specifically, KALE first introduces a Knowledge-Induced (KI) data synthesis method that efficiently extracts…
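As a rough illustration of what extracting multi-hop reasoning paths from a knowledge graph can look like, the following Python sketch runs a bounded breadth-first search between a question entity and an answer entity. It is not the paper's extraction procedure, whose details are cut off in the abstract above; the toy graph, the names `find_paths` and `verbalize`, and the `max_hops` limit are all hypothetical.

```python
from collections import deque

# Toy knowledge graph: head entity -> list of (relation, tail entity).
# These triples are illustrative only and do not come from the paper.
TOY_KG = {
    "Marie Curie": [("born_in", "Warsaw"), ("field", "Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("continent", "Europe")],
}


def find_paths(kg, start, goal, max_hops=3):
    """Return every cycle-free path from start to goal using at most max_hops edges.

    Each path is a list of (head, relation, tail) triples.
    """
    paths = []
    queue = deque([(start, [])])  # (current entity, triples walked so far)
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue  # hop budget exhausted on this branch
        visited = {start} | {tail for _, _, tail in path}
        for relation, tail in kg.get(node, []):
            if tail not in visited:  # avoid revisiting entities on this path
                queue.append((tail, path + [(node, relation, tail)]))
    return paths


def verbalize(path):
    """Render a path as a plain-text chain that a rationale could be written from."""
    return "; ".join(f"{h} --{r}--> {t}" for h, r, t in path)


if __name__ == "__main__":
    for p in find_paths(TOY_KG, "Marie Curie", "Poland"):
        print(verbalize(p))
```

In a pipeline of this shape, the verbalized chain would be handed to a language model that turns it into a natural-language rationale; that step is omitted here.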
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The proposed knowledge-aware learning (KALE) framework, which integrates knowledge-induced data synthesis (KI) and knowledge-aware fine-tuning (KA), is novel and interesting. The rationale extraction process is well designed to enhance efficiency through the use of anchor entities and a three-step BFS strategy.
2. The experimental results demonstrate that the proposed KALE framework achieves notable performance improvements, and the ablation studies clearly illustrate the individual effects o

1. The extracted rationales are regarded as a form of Chain-of-Thought (CoT), but it remains unclear why KA is formulated based on the KL divergence between the generative distributions with and without rationales. What is the motivation for using KL divergence, compared to more conventional fine-tuning approaches that jointly generate both rationales and answers under an autoregressive loss?
2. From a data augmentation perspective, it is unclear why the generated dataset is relatively large com

1. The method elegantly integrates external structured knowledge with rationale-based learning, addressing the “known-but-incorrect” problem in LLMs.
2. The proposed model consistently improves across model scales (7B–32B), showing stable generality.

1. No evaluation on open-ended generation or general abilities. The paper focuses solely on accuracy in knowledge tasks and does not verify whether KALE harms general fluency or creativity after fine-tuning.
2. No comparison with modern reasoning or “thinking-style” models. Baselines (ToG, StructGPT, GraphRAG) are early models and do not include current SOTA models like DeepSeek-R1, Qwen2.5-Think, or Llama3 thinking model. Hence, the claimed “SOTA” results may be overstated.

1. The paper's problem definition is clear and significant. The "known&incorrect" phenomenon is a key pain point in LLM research. The authors clearly articulate this problem with illustrative cases, providing a strong motivation.
2. The paper proposes a novel data generation framework (KI) that combines KG paths with LLM generation. This offers a systematic method for creating high-quality reasoning data with a clear logical basis.
3. The authors conducted extensive experiments on 8 benchmarks

1. The attribution of efficacy for the KI synthesis stage, a core contribution in this paper, is severely confounded. The process is critically dependent on a powerful, SOTA proprietary model (GPT-4o) to "translate" KG paths into "high-quality" rationales. This makes it difficult to discern if the performance gains stem from the KALE framework's superiority or simply from distilling a stronger "teacher" model. The authors' own results in Appendix Q (Table 18) amplify this concern: using a weaker

1. This paper formalizes the “known & incorrect” gap and empirically shows this failure mode remains common after SFT, which shows a clear problem framing and strong motivation.
2. The paper uses external KGs to extract multi-hop reasoning paths → generate textual rationales (KI), then minimize KL divergence between outputs with/without rationales for knowledge-aware fine-tuning (KA), so the model can retrieve relevant knowledge even when no rationale is provided at inference. The pipeline is co

1. All experiments fine-tune on each benchmark’s training set separately. This setup resembles “task-specific adaptation” rather than evaluating cross-task generalization.
2. While the authors emphasize no extra inference-time cost, training includes: (1) path extraction from large KGs (still requires full preprocessing, though faster than BFS), (2) GPT-4o calls for rationale generation (API cost and reproducibility issues), and (3) KL-based consistency training (dual forward passes). Combined,
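The reviews above describe knowledge-aware fine-tuning as minimizing a KL divergence between the model's outputs with and without a rationale, which requires two forward passes per example. The Python sketch below computes that quantity for a single answer position over a four-option vocabulary. It only illustrates the general idea: the logits are made up, the names `softmax` and `kl_divergence` are hypothetical, and the choice of KL(with rationale || without rationale) as the direction is an assumption, since the source does not state the exact loss.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Made-up logits over answer options A-D for one question.
# "With rationale": the prompt includes the KG-derived rationale (first forward pass).
# "Without rationale": the prompt contains the question only (second forward pass).
logits_with_rationale = [4.0, 1.0, 0.5, 0.2]
logits_without_rationale = [1.0, 1.2, 0.9, 1.1]

p_with = softmax(logits_with_rationale)
p_without = softmax(logits_without_rationale)

# Assumed training signal: push the rationale-free distribution toward the
# rationale-conditioned one by minimizing this divergence.
consistency_loss = kl_divergence(p_with, p_without)

if __name__ == "__main__":
    print(f"KL(with || without) = {consistency_loss:.4f}")
```

When the two distributions already agree the loss is zero, so under this assumed formulation the gradient only acts on examples where seeing the rationale changes the model's answer distribution.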
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
