TL;DR
Graph-R1 introduces an end-to-end reinforcement learning framework for knowledge retrieval in RAG systems, improving reasoning, efficiency, and generation quality by modeling retrieval as a multi-turn agent-environment interaction.
Contribution
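It presents an agentic GraphRAG framework that uses end-to-end reinforcement learning to optimize knowledge retrieval and reasoning, addressing three limitations of earlier GraphRAG methods: high graph construction cost, fixed one-time retrieval, and reliance on long-context reasoning and prompt design.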
Findings
Outperforms traditional GraphRAG in reasoning accuracy.
Enhances retrieval efficiency and generation quality.
Demonstrates effectiveness on standard RAG datasets.
Abstract
Retrieval-Augmented Generation (RAG) mitigates hallucination in LLMs by incorporating external knowledge, but relies on chunk-based retrieval that lacks structural semantics. GraphRAG methods improve RAG by modeling knowledge as entity-relation graphs, but still face challenges in high construction cost, fixed one-time retrieval, and reliance on long-context reasoning and prompt design. To address these challenges, we propose Graph-R1, an agentic GraphRAG framework via end-to-end reinforcement learning (RL). It introduces lightweight knowledge hypergraph construction, models retrieval as a multi-turn agent-environment interaction, and optimizes the agent process via an end-to-end reward mechanism. Experiments on standard RAG datasets show that Graph-R1 outperforms traditional GraphRAG and RL-enhanced RAG methods in reasoning accuracy, retrieval efficiency, and generation quality.
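To make the "multi-turn agent-environment interaction" concrete, here is a minimal sketch of such a loop, not the authors' implementation. Everything in it is an assumption made for illustration: the toy hypergraph, the entity-overlap retriever, the scripted policy standing in for the LLM agent, and the reward weights. The reward combines a format check with answer F1 only because the reviews below mention format and accuracy rewards; the paper's exact formula is not given here.

```python
# Toy knowledge hypergraph: each hyperedge links several entities and carries one fact.
# Contents are illustrative only and do not come from the paper.
HYPEREDGES = {
    "e1": ({"Marie Curie", "Nobel Prize in Physics", "1903"},
           "Marie Curie won the Nobel Prize in Physics in 1903."),
    "e2": ({"Marie Curie", "Warsaw"}, "Marie Curie was born in Warsaw."),
    "e3": ({"Warsaw", "Poland"}, "Warsaw is the capital of Poland."),
}


def retrieve(query: str, k: int = 2) -> list[str]:
    """Environment step: return facts of hyperedges whose entities occur in the query."""
    q = query.lower()
    scored = []
    for entities, fact in HYPEREDGES.values():
        hits = sum(1 for entity in entities if entity.lower() in q)
        if hits:
            scored.append((hits, fact))
    # Most entity matches first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [fact for _, fact in scored[:k]]


def scripted_policy(question: str, memory: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for the LLM agent: emits ("query", text) or ("answer", text)."""
    if not memory:
        return ("query", "Marie Curie")
    if not any("Poland" in fact for fact in memory):
        return ("query", "Warsaw")
    return ("answer", "Poland")


def run_episode(question, policy, max_turns: int = 3):
    """Alternate agent actions and retrieval until the agent answers or turns run out."""
    memory, trace = [], []
    for _ in range(max_turns):
        action, text = policy(question, memory)
        trace.append((action, text))
        if action == "answer":
            return text, trace
        for fact in retrieve(text):
            if fact not in memory:
                memory.append(fact)
    return None, trace


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between predicted and gold answers."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(tok), g.count(tok)) for tok in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def episode_reward(answer, trace, gold: str) -> float:
    """Single end-of-episode reward; the 0.5/0.5 weighting is an arbitrary assumption."""
    format_ok = bool(trace) and trace[-1][0] == "answer"
    if not format_ok:
        return 0.0
    return 0.5 + 0.5 * token_f1(answer, gold)


question = "In which country was Marie Curie born?"
answer, trace = run_episode(question, scripted_policy)
print(answer, trace, episode_reward(answer, trace, gold="Poland"))
```

In this sketch the first query surfaces the birthplace fact, the second follows the shared entity to a neighboring hyperedge, and the reward is assigned once for the whole trajectory rather than per retrieval step. In an RL setup, that scalar would be the signal used to update the policy.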
Peer Reviews
Decision·Submitted to ICLR 2026
1. RL post-training with a multi-turn query and retrieval design makes sense. 2. The empirical studies are comprehensive. 3. The paper is well written and nicely illustrated.
1. The technical contributions seem limited: multi-turn action loop + GRPO + downstream task accuracy reward + format correctness reward are not particularly new. The authors could further clarify the core contributions. 2. The propositions appear decorative rather than informative. 3. I am curious about how the knowledge hypergraphs are built. It is not entirely clear to me from Sec. 2.1. Could the authors further clarify how this step is performed? 4. I am also interested in the generalization
1. The proposed Graph-R1 leverages LLMs as agents to interact with the graph, which solves the issue of a fixed retrieval process with only one-time interaction. 2. The experiments are conducted on multiple datasets and include recent baselines. 3. The authors also conduct detailed ablation studies and generalization comparisons.
1. There is another work, GraphRAG-R1 [1], which also uses RL to address GraphRAG issues. The authors could discuss the differences between the two methods. 2. What is the motivation for using a knowledge hypergraph instead of a KG? [1] Yu, Chuanyue, et al. "GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning." arXiv preprint arXiv:2507.23581 (2025).
- The paper is well written. - The paper conducts detailed empirical experiments across multiple datasets and also provides insightful case studies.
- The research novelty is quite limited. This work applies GRPO to the GraphRAG scenario. Most of the techniques used in this work (GRPO, GraphRAG, etc.) have already been extensively studied. Basically, this paper only shows that GRPO can work in the GraphRAG scenario. - The proposed OOD setting is not enough to evaluate the generalization ability of the fine-tuned LLMs. Many datasets, such as 2wiki and hotpotqa, are all wiki-based. The paper needs to evaluate on data from other domains to show the ge
