Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation

Deyu Zou; Yongqiang Chen; Mufei Li; Siqi Miao; Chenxi Liu; Bo Han; James Cheng; Pan Li

arXiv:2506.22518·cs.CL·July 1, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces ReG, a method that aligns weak graph retrievers with large language models to improve graph-based retrieval-augmented generation, reducing hallucinations and enhancing performance across benchmarks.

Contribution

ReG leverages LLM feedback and a structure-aware module to refine weak retrievers, significantly improving retrieval quality and model performance with less training data.

Findings

01. ReG improves performance by up to 10% on benchmarks.

02. ReG enables matching state-of-the-art with 5% training data.

03. ReG reduces reasoning token cost by up to 30%.

Abstract

Graph-based retrieval-augmented generation (RAG) enables large language models (LLMs) to ground responses with structured external knowledge from up-to-date knowledge graphs (KGs) and reduce hallucinations. However, LLMs often rely on a weak retriever in graph-based RAG: I) Due to the lack of ground truth, the retriever is often trained on weak supervision, which often introduces spurious signals to the LLMs. II) Due to the abstraction of graph data, the retrieved knowledge is often presented in unorganized forms. To mitigate the issue, we present Refined Graph-based RAG (ReG) to align weak retrievers to LLMs for graph-based RAG. Specifically, ReG incorporates LLM feedback to get rid of spurious signals and improve the quality of the supervision. Meanwhile, ReG introduces a structure-aware reorganization module to refactor the retrieval results into logically coherent evidence chains.…
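
The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the page does not show ReG's actual algorithm, so the breadth-first chain expansion below is only one plausible reading of "structure-aware reorganization", and `judge` is a hypothetical stand-in for an LLM call that accepts or rejects a candidate path. All function names, parameters, and the toy triples are assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def refine_supervision(
    candidates: List[List[Triple]],
    judge: Callable[[List[Triple]], bool],
) -> List[List[Triple]]:
    """Keep only the candidate paths that the judge accepts.

    `judge` is a hypothetical placeholder for LLM feedback; here it is
    any callable that returns True for a path worth keeping.
    """
    return [path for path in candidates if judge(path)]


def build_chains(
    triples: List[Triple], start: str, max_hops: int = 3
) -> List[List[Triple]]:
    """Reorganize unordered triples into chains by BFS from `start`."""
    out_edges: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        out_edges[triple[0]].append(triple)

    chains: List[List[Triple]] = []
    queue = deque([(start, [])])
    while queue:
        node, chain = queue.popleft()
        # Entities already on this chain; skipping them avoids cycles.
        visited = {start} | {t[2] for t in chain}
        extensions = [t for t in out_edges[node] if t[2] not in visited]
        if len(chain) == max_hops or not extensions:
            if chain:  # emit a finished, non-empty chain
                chains.append(chain)
            continue
        for triple in extensions:
            queue.append((triple[2], chain + [triple]))
    return chains


def format_chain(chain: List[Triple]) -> str:
    """Render a chain as one readable line of evidence."""
    text = chain[0][0]
    for _, relation, tail in chain:
        text += f" -[{relation}]-> {tail}"
    return text


if __name__ == "__main__":
    # Toy retrieval result, deliberately out of order.
    retrieved = [
        ("Paris", "capital_of", "France"),
        ("Alice", "born_in", "Paris"),
        ("France", "member_of", "EU"),
        ("Alice", "works_at", "Acme"),
    ]
    for chain in build_chains(retrieved, start="Alice"):
        print(format_chain(chain))
```

In this sketch the retriever's output is an unordered bag of triples, and the reorganization step turns it into paths that start at the question entity, which is the form a downstream LLM prompt would consume. Replacing `judge` with a real LLM call is where the per-candidate cost raised in the reviews would arise.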

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 5

Strengths

1. Tackles an important and timely problem.
2. Thorough experimental evaluation, including an analysis of the proposed “overthinking” problem.
3. Maintains high performance even when trained on just 5% of the data.

Weaknesses

1. Some parts of the text are hard to follow and would benefit from rewriting for clarity.
2. Key methodological details are omitted and should be added.
3. Using an LLM to evaluate each candidate P is expensive.

Reviewer 02 · Rating 2 · Confidence 5

Strengths

- It solves a practical problem since heuristic-based supervision like shortest paths for graph retrievers is noisy and misaligned with LLM reasoning.
- The results are promising. The method demonstrates strong performance gains on traditional KGQA datasets (WebQSP, CWQ) and shows impressive data efficiency, matching SOTA performance with only 5% of training data.

Weaknesses

- The title and claims are misleading. This is in fact an LLM-based KGQA paper, not GraphRAG. The community refers to GraphRAG as a complete pipeline, similar to but more than RAG, that involves constructing a graph from raw documents and then retrieving from it. This work operates purely on existing KGs in a traditional LLM-based KGQA setting and does not compare against real GraphRAG baselines, so the paper addresses a much narrower problem than it claims.
- It is a very important prerequi

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper clearly articulates the limitations of current graph-based RAG approaches. The method also achieves state-of-the-art performance across benchmarks.

Weaknesses

1. Limited novelty in core techniques: While the combination is effective, the individual components are relatively standard. Using LLM feedback to filter/refine candidates is not new (acknowledged in related work). BFS-based chain expansion is a straightforward graph traversal technique. The main contribution appears to be the specific application to graph-based RAG rather than methodological innovation.
2. No analysis of retrieval quality independent of QA performance (e.g., precision/recall of

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications