SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
Jiujiu Chen, Yazheng Liu, Sihong Xie, Hui Xiong

TL;DR
This paper introduces SCPRM, a schema-aware process reward model for knowledge graph reasoning that scores reasoning paths by conditioning on the reasoning prefix and on the schema distance to the implicit target, improving multi-hop question answering.
Contribution
The paper proposes SCPRM, a schema-aware cumulative process reward model that addresses the risk compensation effect in existing process reward models, where later correct steps offset earlier incorrect ones, in knowledge graph reasoning tasks.
Findings
SCPRM improves Hits@k by 1.18% on average over baselines.
Incorporating schema distance enhances reasoning path evaluation.
SCPRM-MCTS achieves more accurate, risk-sensitive reasoning in KGQA.
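For readers unfamiliar with the metric, the following is a minimal sketch of how Hits@k is commonly computed. The paper's exact evaluation protocol is not given in this summary, so the function names (`hits_at_k`, `mean_hits_at_k`) and the toy data are assumptions for illustration only.

```python
from typing import Sequence, Set


def hits_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Return 1.0 if any of the top-k ranked candidates is a gold answer, else 0.0."""
    return 1.0 if any(candidate in gold for candidate in ranked[:k]) else 0.0


def mean_hits_at_k(
    predictions: Sequence[Sequence[str]], golds: Sequence[Set[str]], k: int
) -> float:
    """Average Hits@k over a set of questions (0.0 for an empty set)."""
    scores = [hits_at_k(p, g, k) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (hypothetical entity names): two questions, ranked candidates each.
predictions = [["a", "b", "c"], ["x", "y", "z"]]
golds = [{"b"}, {"q"}]

hits_1 = mean_hits_at_k(predictions, golds, k=1)  # no gold answer at rank 1
hits_2 = mean_hits_at_k(predictions, golds, k=2)  # first question hits at rank 2
```

Under this definition a question counts as a hit as soon as one gold answer appears in the top k, so the metric rewards ranking quality rather than exhaustive recall.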
Abstract
Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, in which incorrect steps are offset by later correct ones, so that flawed reasoning paths receive high rewards. This issue is exacerbated in knowledge graph (KG) reasoning, where multiple paths may exist between the start and end entities and a single risky step can make the entire reasoning path flawed. These limitations are problematic in risk-sensitive tasks such as medical and legal KG reasoning. To address these issues, we propose a Schema-aware Cumulative Process Reward Model (SCPRM) that evaluates reasoning paths by conditioning on the reasoning prefix and incorporating the schema distance between the current reasoning step and the implicit target parsed from the…
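The risk compensation effect described in the abstract can be illustrated with a toy sketch. This is not SCPRM's scoring rule, which is not specified in this summary. The step scores are made up, and the multiplicative aggregation in `cumulative_reward` is an assumed stand-in for "cumulative" scoring, chosen only to show why independent step-wise averaging can favor a flawed path.

```python
import math
from typing import List


def mean_reward(step_scores: List[float]) -> float:
    """Step-wise aggregation: average the per-step scores independently."""
    return sum(step_scores) / len(step_scores)


def cumulative_reward(step_scores: List[float]) -> float:
    """Assumed cumulative aggregation: multiply per-step scores, so one
    low-scoring step drags down the score of the whole path."""
    return math.prod(step_scores)


# Hypothetical per-step scores in [0, 1]; higher means the step looks more correct.
flawed_path = [0.9, 0.1, 0.95, 0.95]  # one risky step, followed by strong steps
sound_path = [0.7, 0.7, 0.7, 0.7]     # no step is clearly wrong

# Averaging lets the later strong steps compensate for the risky one.
mean_prefers_flawed = mean_reward(flawed_path) > mean_reward(sound_path)

# The multiplicative score keeps the risky step visible in the final score.
cumulative_prefers_sound = cumulative_reward(sound_path) > cumulative_reward(flawed_path)
```

In this toy setting the average ranks the flawed path above the sound one, while the product ranks them the other way around, which is the kind of behavior a risk-sensitive evaluator would want.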
