Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
Yuval Kansal, Niraj K. Jha

TL;DR
This paper introduces a training paradigm in which knowledge graphs serve as implicit reward models. Language models are grounded in structured domain facts and rewarded with signals derived from knowledge-graph paths, which improves their multi-hop reasoning, particularly in scientific domains.
Contribution
The authors propose a bottom-up learning approach that combines supervised fine-tuning and reinforcement learning, using knowledge-graph-derived rewards to strengthen compositional reasoning in language models.
Findings
The 14B model outperforms larger models, such as GPT-5.2 and Gemini 3 Pro, on complex reasoning tasks.
Path-derived rewards improve zero-shot generalization to multi-hop queries.
The approach remains robust under adversarial option-shuffling tests.
Abstract
Large language models have achieved near-expert performance in structured reasoning domains like mathematics and programming, yet their ability to perform compositional multi-hop reasoning in specialized scientific fields remains limited. We propose a bottom-up learning paradigm in which models are grounded in axiomatic domain facts and compose them to solve complex, unseen tasks. To this end, we present a post-training pipeline, based on a combination of supervised fine-tuning and reinforcement learning (RL), in which knowledge graphs act as implicit reward models. By deriving novel reward signals from knowledge graph paths, we provide verifiable, scalable, and grounded supervision that encourages models to compose intermediate axioms rather than optimize only final answers during RL. We validate this approach in the medical domain, training a 14B model on short-hop reasoning paths…
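The abstract is truncated here and does not give the exact reward formula, so the following is only a hypothetical sketch of what a "path-derived" signal could look like, not the authors' method. It assumes a toy medical knowledge graph of (head, relation, tail) triples, enumerates short-hop paths from a start entity, and scores a reasoning trace by blending final-answer correctness with the fraction of path triples whose head and tail entities both appear in the trace. The names `k_hop_paths`, `path_reward`, `answer_weight`, and `TOY_KG`, the 50/50 weighting, and the substring-based entity matching are all illustrative assumptions.

```python
# Hypothetical sketch: a path-derived reward computed from knowledge-graph triples.
# None of these names or weights come from the paper; they are illustrative only.

Triple = tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy medical KG, used only for illustration.
TOY_KG: list[Triple] = [
    ("metformin", "treats", "type 2 diabetes"),
    ("type 2 diabetes", "risk_factor_for", "diabetic neuropathy"),
    ("diabetic neuropathy", "treated_by", "pregabalin"),
]


def k_hop_paths(triples: list[Triple], start: str, max_hops: int) -> list[list[Triple]]:
    """Enumerate simple (cycle-free) paths of 1..max_hops hops starting at `start`."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for head, rel, tail in triples:
        adjacency.setdefault(head, []).append((rel, tail))

    paths: list[list[Triple]] = []

    def dfs(node: str, path: list[Triple], visited: set[str]) -> None:
        if path:
            paths.append(list(path))  # record every non-empty prefix as a path
        if len(path) == max_hops:
            return
        for rel, tail in adjacency.get(node, []):
            if tail in visited:
                continue  # skip cycles
            path.append((node, rel, tail))
            visited.add(tail)
            dfs(tail, path, visited)
            visited.remove(tail)
            path.pop()

    dfs(start, [], {start})
    return paths


def path_reward(trace: str, path: list[Triple], final_answer: str,
                answer_weight: float = 0.5) -> float:
    """Blend answer correctness with coverage of the path's intermediate facts.

    A triple counts as covered if both its head and tail entity strings occur
    in the trace (a crude substring check, assumed here for simplicity).
    """
    text = trace.lower()
    covered = sum(1 for head, _, tail in path
                  if head.lower() in text and tail.lower() in text)
    coverage = covered / len(path)
    correct = 1.0 if path[-1][2].lower() in final_answer.lower() else 0.0
    return answer_weight * correct + (1.0 - answer_weight) * coverage


if __name__ == "__main__":
    two_hop = k_hop_paths(TOY_KG, "metformin", max_hops=2)[-1]
    grounded = "Metformin treats type 2 diabetes, a risk factor for diabetic neuropathy."
    shortcut = "The answer is diabetic neuropathy."
    print(path_reward(grounded, two_hop, "diabetic neuropathy"))  # → 1.0
    print(path_reward(shortcut, two_hop, "diabetic neuropathy"))  # → 0.5
```

Under these assumptions, a trace that reaches the correct answer without composing the intermediate facts earns only the answer component of the reward, which mirrors the abstract's stated goal of rewarding the composition of intermediate axioms rather than only the final answer.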
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling
