Graph-Guided Concept Selection for Efficient Retrieval-Augmented Generation

Ziyu Liu; Yijing Liu; Jianfei Yuan; Minzhi Yan; Le Yue; Honghui Xiong; Yi Yang

arXiv:2510.24120·cs.LG·October 29, 2025


3 Reviews

TL;DR

This paper introduces G2ConS, a novel method that reduces retrieval-augmented generation costs by selecting important concepts and chunks using a graph-guided approach, improving efficiency and answer quality.

Contribution

G2ConS is the first approach to combine concept selection with an LLM-independent graph to enhance retrieval efficiency in RAG systems.

Findings

01. G2ConS reduces knowledge graph construction costs significantly.
02. G2ConS improves retrieval effectiveness over baselines.
03. G2ConS enhances answer quality in real-world datasets.

Abstract

Graph-based RAG constructs a knowledge graph (KG) from text chunks to enhance retrieval in Large Language Model (LLM)-based question answering. It is especially beneficial in domains such as biomedicine, law, and political science, where effective retrieval often involves multi-hop reasoning over proprietary documents. However, these methods demand numerous LLM calls to extract entities and relations from text chunks, incurring prohibitive costs at scale. Through a carefully designed ablation study, we observe that certain words (termed concepts) and their associated documents are more important. Based on this insight, we propose Graph-Guided Concept Selection (G2ConS). Its core comprises a chunk selection method and an LLM-independent concept graph. The former selects salient document chunks to reduce KG construction costs; the latter closes knowledge gaps introduced by chunk selection…
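The two components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the stopword-based concept extraction, the weighted-degree scoring rule, and the names `extract_concepts`, `build_concept_graph`, and `select_chunks` are all assumptions made for illustration. The sketch only shows how a co-occurrence graph can be built without any LLM call and then used to rank chunks under a budget.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stopword list; a real system would use a proper concept extractor.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with"}


def extract_concepts(chunk):
    """Return lowercased word tokens minus stopwords (a stand-in for concept extraction)."""
    return {w for w in re.findall(r"[a-z]+", chunk.lower()) if w not in STOPWORDS}


def build_concept_graph(chunks):
    """Build an LLM-free concept graph from co-occurrence within chunks.

    Returns a mapping concept -> set of chunk ids, and a Counter of
    (concept_a, concept_b) -> number of chunks in which both appear.
    """
    concept_to_chunks = defaultdict(set)
    edges = Counter()
    for idx, chunk in enumerate(chunks):
        concepts = extract_concepts(chunk)
        for concept in concepts:
            concept_to_chunks[concept].add(idx)
        # Sorting makes each undirected edge key canonical.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return concept_to_chunks, edges


def select_chunks(chunks, budget):
    """Keep the `budget` chunks whose concepts have the highest weighted degree.

    Assumption: concept importance is approximated by weighted degree in the
    co-occurrence graph; the paper may use a different criterion.
    """
    _, edges = build_concept_graph(chunks)
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    scores = [sum(degree[c] for c in extract_concepts(chunk)) for chunk in chunks]
    # Ties are broken by original chunk order.
    ranked = sorted(range(len(chunks)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:budget])
```

In this sketch only the chunks returned by `select_chunks` would be passed to the costly LLM-based KG extraction step, while the concept graph still covers every chunk and could serve as a second retrieval path for content that selection left out.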

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 4

Strengths

1. Graph-Guided Concept Selection (G2ConS) will ensure an efficient and economical process for constructing a graph.
2. The idea of building a Concept Graph that is independent of LLM and uses its structure for linking text segments is somewhat novel. The conceptualization of the Dual-Path Retrieval mechanism is insightful.
3. Extensive experiments on several QA benchmarks demonstrate the effectiveness of the proposed method.

Weaknesses

1. The overall motivation and implementation of this work are highly similar to KET-RAG, which also employs a KG skeleton concept and a dual-path retrieval mechanism to achieve low-cost graph construction. Therefore, this paper appears to be an incremental extension of KET-RAG. I think its novelty seems insufficient.
2. The main experiments are conducted only on the GPT-4o-mini model, without evaluation on open-source models such as the LLaMA or Qwen series. This raises concerns about whether t

Reviewer 02·Rating 2·Confidence 5

Strengths

- The authors had a simple but clever idea: not all information is equally important. They ran experiments where they selectively deleted concepts and their related text chunks from the knowledge base.
- The concept graph itself is arguably a contribution, but unfortunately the authors implement it in the wrong way.

Weaknesses

- The premise that some concepts are more important than others is intuitive and not a novel finding. The proposed method merely packages traditional NLP methods into an optimization "plugin" for existing systems. This work feels more like an engineering improvement than a fundamental algorithmic or theoretical contribution that pushes the boundaries of the field.
- Unfortunately, the paper goes about this in the wrong way: it combines a core-kg from MS-GraphRAG as another retrieval path separately

Reviewer 03·Rating 4·Confidence 4

Strengths

- The motivation for reducing reliance on LLM calls in graph construction is strong and addresses a clear inefficiency in existing GraphRAG approaches.
- The idea of constructing concept graphs based on word and chunk co-occurrences, rather than using costly LLM-based chunk selection or summarization, is novel and interesting.

Weaknesses

- The generalization of the core component, Core Chunk Selection, is questionable. This component constructs relations between words and text chunks and then filters information simply based on ranking. This may lead to information loss, and there is no quantitative metric for how much information needs to be removed. The process lacks a convincing discussion of its generalizability.
- The construction and effectiveness of the concept graph in G2ConS may be sensitive to parameter choices (e.g.,
