Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning

Haoran Luo; Haihong E; Guanting Chen; Qika Lin; Yikai Guo; Fangzhi Xu; Zemin Kuang; Meina Song; Xiaobao Wu; Yifan Zhu; Luu Anh Tuan

arXiv:2507.21892 · cs.CL · July 30, 2025

TL;DR

Graph-R1 introduces an end-to-end reinforcement learning framework for knowledge retrieval in RAG systems. By modeling retrieval as a multi-turn agent-environment interaction, it improves reasoning accuracy, retrieval efficiency, and generation quality.

Contribution

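It presents a novel agentic GraphRAG framework that uses reinforcement learning to optimize knowledge retrieval and reasoning in RAG models. This addresses limitations of prior GraphRAG methods: high graph-construction cost, fixed one-time retrieval, and reliance on long-context reasoning and prompt design.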

Findings

1. Outperforms traditional GraphRAG and RL-enhanced RAG methods in reasoning accuracy.
2. Improves retrieval efficiency and generation quality.
3. Demonstrates effectiveness on standard RAG datasets.

Abstract

Retrieval-Augmented Generation (RAG) mitigates hallucination in LLMs by incorporating external knowledge, but relies on chunk-based retrieval that lacks structural semantics. GraphRAG methods improve RAG by modeling knowledge as entity-relation graphs, but still face challenges in high construction cost, fixed one-time retrieval, and reliance on long-context reasoning and prompt design. To address these challenges, we propose Graph-R1, an agentic GraphRAG framework via end-to-end reinforcement learning (RL). It introduces lightweight knowledge hypergraph construction, models retrieval as a multi-turn agent-environment interaction, and optimizes the agent process via an end-to-end reward mechanism. Experiments on standard RAG datasets show that Graph-R1 outperforms traditional GraphRAG and RL-enhanced RAG methods in reasoning accuracy, retrieval efficiency, and generation quality.
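The abstract describes retrieval as a multi-turn agent-environment interaction over a knowledge hypergraph. The Python sketch below illustrates that loop under loudly stated assumptions. The `Hypergraph` class, its entity-overlap `retrieve` scoring, `run_episode`, and `scripted_policy` (a fixed stand-in for the RL-trained LLM agent) are all hypothetical. None of them is the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Hypothetical toy knowledge hypergraph: each hyperedge is one fact
    linking an arbitrary number of entities (not the paper's construction)."""
    edges: list = field(default_factory=list)

    def add(self, fact: str, entities: set) -> None:
        self.edges.append((fact, frozenset(e.lower() for e in entities)))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Assumed scoring: count how many of a hyperedge's entities occur in the query.
        q = query.lower()
        scored = [(sum(e in q for e in ents), fact) for fact, ents in self.edges]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [fact for _, fact in hits[:k]]


def run_episode(question, graph, policy, max_turns=4):
    """Alternate agent actions and environment feedback until an answer or the turn limit."""
    history = [("question", question)]
    for _ in range(max_turns):
        action, content = policy(history)
        if action == "answer":
            return content, history
        # action == "query": the environment returns retrieved hyperedges.
        history.append(("query", content))
        history.append(("knowledge", graph.retrieve(content)))
    return None, history


def scripted_policy(history):
    """Hypothetical stand-in for the trained LLM agent: a fixed two-hop plan."""
    facts = [f for kind, payload in history if kind == "knowledge" for f in payload]
    if not facts:
        return "query", "Who directed Inception?"
    if not any("born" in f for f in facts):
        return "query", "Where was Christopher Nolan born?"
    return "answer", "London"


g = Hypergraph()
g.add("Inception (2010) was directed by Christopher Nolan.", {"Inception", "Christopher Nolan"})
g.add("Christopher Nolan was born in London in 1970.", {"Christopher Nolan", "London", "1970"})
answer, trace = run_episode("Where was the director of Inception born?", g, scripted_policy)
print(answer)  # London
```

In an RL setting, the scripted policy would be replaced by a model that decides when to query and when to answer. The whole trace would then be scored by a reward.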

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. RL post-training with a multi-turn query and retrieval design makes sense.
2. The empirical studies are comprehensive.
3. The paper is well written and nicely illustrated.

Weaknesses

1. The technical contributions seem limited: multi-turn action loop + GRPO + downstream task accuracy reward + format correctness reward are not particularly new. The authors could further clarify the core contributions.
2. The propositions appear decorative rather than informative.
3. I am curious about how the knowledge hypergraphs are built. It is not entirely clear to me from Sec. 2.1. Could the authors further clarify how this step is performed?
4. I am also interested in the generalization
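Reviewer 01's first point names the training ingredients: GRPO plus an answer-accuracy reward and a format-correctness reward. The Python sketch below shows how such a reward and GRPO's group-relative advantage fit together. Several details are assumptions, not the paper's exact reward: the `<answer>` tag format, the token-level F1 answer score, and counting the answer score only when the format check passes.

```python
import re
import statistics
from collections import Counter


def format_reward(response: str) -> float:
    # Assumed format rule: the response must end with an <answer>...</answer> span.
    return 1.0 if re.fullmatch(r"(?s).*<answer>[^<]+</answer>\s*", response) else 0.0


def extract_answer(response: str) -> str:
    m = re.search(r"<answer>([^<]+)</answer>", response)
    return m.group(1).strip() if m else ""


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1, a common QA answer metric."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def reward(response: str, gold: str) -> float:
    # Assumption: the answer F1 only counts when the format check passes.
    fmt = format_reward(response)
    return fmt + (token_f1(extract_answer(response), gold) if fmt else 0.0)


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


group = [
    "<think>two hops</think><answer>London</answer>",
    "<think>guess</think><answer>Paris</answer>",
    "London",  # correct text but no answer tags
    "<think>close</think><answer>central London</answer>",
]
rewards = [reward(r, "London") for r in group]
advantages = group_advantages(rewards)
```

Because each advantage is measured relative to the other samples for the same question, no learned value model is needed. This is the main design point of GRPO.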

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The proposed Graph-R1 leverages an LLM as an agent that interacts with the graph, which solves the issue of a fixed retrieval process with only one-time interaction.
2. The experiments are conducted on multiple datasets and include recent baselines.
3. The authors also conduct detailed ablation studies and generalization comparisons.

Weaknesses

1. There is another work, GraphRAG-R1 [1], which also uses RL to address GraphRAG issues. The authors may want to compare the differences between the two methods.
2. What is the motivation for using a knowledge hypergraph instead of a KG?

[1] Yu, Chuanyue, et al. "GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning." arXiv preprint arXiv:2507.23581 (2025).
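Reviewer 02's second question asks why a knowledge hypergraph is used instead of a binary KG. The sketch below illustrates one commonly cited argument for n-ary (hyperedge) representations; it is not necessarily the authors' stated motivation. The argument is that naively flattening a multi-entity fact into binary triples can lose which arguments belong together. The two facts are real. The role names and the naive flattening in the hypothetical `to_triples` helper are illustrative assumptions.

```python
# Each n-ary fact is one hyperedge mapping roles to entities (role names are hypothetical).
hyperedges = [
    {"laureate": "Marie Curie", "prize": "Nobel Prize in Physics", "year": "1903"},
    {"laureate": "Marie Curie", "prize": "Nobel Prize in Chemistry", "year": "1911"},
]


def to_triples(edges):
    """Naively flatten each n-ary fact into binary triples anchored on the laureate."""
    triples = set()
    for e in edges:
        triples.add((e["laureate"], "won", e["prize"]))
        triples.add((e["laureate"], "won_in_year", e["year"]))
    return triples


def years_from_hyperedges(edges, prize):
    # The prize and the year stay bound inside the same hyperedge.
    return {e["year"] for e in edges if e["prize"] == prize}


def years_from_triples(triples, person, prize):
    # Binary triples only link years to the person, not to a specific prize.
    if (person, "won", prize) not in triples:
        return set()
    return {o for s, r, o in triples if s == person and r == "won_in_year"}


triples = to_triples(hyperedges)
print(years_from_hyperedges(hyperedges, "Nobel Prize in Physics"))  # {'1903'}
print(years_from_triples(triples, "Marie Curie", "Nobel Prize in Physics"))  # contains both '1903' and '1911'
```

A binary KG can recover the binding through reification, that is, an intermediate event node per award. The trade-off is therefore representational convenience and retrieval granularity rather than raw expressiveness.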

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The paper is well written.
- The paper conducts detailed empirical experiments across multiple datasets and also provides insightful case studies.

Weaknesses

- The research novelty is quite limited. This work applies GRPO to the GraphRAG scenario, and most of the techniques it uses (GRPO, GraphRAG, etc.) have already been extensively studied. Basically, this paper only proves that GRPO can work in the GraphRAG scenario.
- The proposed OOD setting is not sufficient to evaluate the generalization ability of the fine-tuned LLMs. Many of the datasets, such as 2Wiki and HotpotQA, are Wikipedia-based. The paper needs to evaluate on data from other domains to show the ge
