Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models

Runxuan Liu; Xianhao Ou; Xinyan Ma; Jiyuan Wang; Jiafeng Liang; Jiaqi Li; Tao He; Zheng Chu; Rongchuan Mu; Zekun Wang; Baoxin Wang; Dayong Wu; Ming Liu; Shijin Wang; Guoping Hu; Bing Qin

arXiv:2601.12995·cs.CL·January 21, 2026

TL;DR

This paper introduces the Graph Reasoning Paradigm (GRP), a structured, symbolic reasoning framework that represents reasoning as graphs and pairs it with reinforcement learning to improve reasoning in large language models. It targets limitations of current methods such as coarse-grained supervision and reward hacking.

Contribution

The paper proposes GRP, a novel structured reasoning approach with step-level labels and graph-based evaluation, enhancing reasoning accuracy and efficiency in LLMs.

Findings

01 Significant improvements in mathematical reasoning tasks

02 Enhanced code generation performance

03 Reduced reward hacking and training costs

Abstract

Long Chain-of-Thought (LCoT), achieved by Reinforcement Learning with Verifiable Rewards (RLVR), has proven effective in enhancing the reasoning capabilities of Large Language Models (LLMs). However, reasoning in current LLMs is primarily generated as plain text, where performing semantic evaluation on such unstructured data creates a computational bottleneck during training. Despite RLVR-based optimization, existing methods still suffer from coarse-grained supervision, reward hacking, high training costs, and poor generalization. To address these issues, we propose the Graph Reasoning Paradigm (GRP), which realizes structured and symbolic reasoning, implemented via graph-structured representations with step-level cognitive labels. Building upon GRP, we further design Process-Aware Stratified Clipping Group Relative Policy Optimization (PASC-GRPO), which leverages structured evaluation…
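
To make "graph-structured representations with step-level cognitive labels" concrete, the sketch below shows one way a reasoning trace could be stored as a labeled dependency graph and checked structurally, without any semantic evaluation of the text. It is an illustration only, not the paper's format: the `Step` class, its field names, the label strings, and `check_topology` are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Step:
    """One reasoning step, i.e. one node of the graph (hypothetical schema)."""
    step_id: str
    label: str   # step-level cognitive label; the label vocabulary is assumed
    text: str
    depends_on: list[str] = field(default_factory=list)  # ids of prerequisite steps


def check_topology(steps: list[Step]) -> list[str]:
    """Return step ids in a valid dependency order, or raise ValueError.

    This is a purely structural check: ids are unique, every dependency
    points at an existing step, and the dependency graph has no cycle.
    """
    ids = {s.step_id for s in steps}
    if len(ids) != len(steps):
        raise ValueError("duplicate step id")
    for s in steps:
        missing = [d for d in s.depends_on if d not in ids]
        if missing:
            raise ValueError(f"step {s.step_id} depends on unknown steps {missing}")
    sorter = TopologicalSorter({s.step_id: s.depends_on for s in steps})
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("reasoning graph contains a cycle") from exc


# Usage: a three-step trace in which the final step depends on both earlier ones.
trace = [
    Step("s1", "understand", "Restate the problem."),
    Step("s2", "deduce", "Derive an intermediate result.", ["s1"]),
    Step("s3", "verify", "Check the result against the problem.", ["s1", "s2"]),
]
order = check_topology(trace)
```

A check of this kind runs in time linear in the size of the graph, which is the sort of inexpensive structural signal a symbolic representation makes available. How GRP and PASC-GRPO actually score graphs is not specified in the abstract above.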

Taxonomy

Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling