Core-based Hierarchies for Efficient GraphRAG
Jakir Hossain, Ahmet Erdem Sarıyüce

TL;DR
This paper introduces a deterministic, density-aware hierarchy for GraphRAG built with k-core decomposition. It targets global sensemaking in retrieval-augmented generation by making community detection reproducible and reducing LLM token costs.
Contribution
It replaces Leiden clustering with k-core decomposition for more reproducible, efficient community detection in knowledge graphs, and introduces heuristics for better retrieval and summarization.
Findings
Improves answer comprehensiveness and diversity.
Reduces token usage in LLM-based answers.
Enhances reproducibility of community detection.
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the…
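To illustrate the k-core idea the abstract describes, here is a minimal sketch of core-number computation by bucket-based peeling, followed by the nested cores it induces. It is not the authors' implementation. The function names (`build_adjacency`, `core_numbers`, `nested_cores`) and the toy edge list are hypothetical, and the sketch assumes an undirected simple graph with no self-loops. The paper's retrieval and summarization heuristics are not shown.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, dropping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def core_numbers(adj):
    """Return {node: core number} by peeling nodes in order of current degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # buckets[d] holds the not-yet-peeled nodes whose current degree is d.
    buckets = defaultdict(set)
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    max_degree = max(degree.values(), default=0)
    for k in range(max_degree + 1):
        while buckets[k]:
            v = buckets[k].pop()
            core[v] = k  # v is peeled at level k
            for u in adj[v]:
                # Only unpeeled neighbours above level k lose a degree;
                # their degree never drops below k.
                if u not in core and degree[u] > k:
                    buckets[degree[u]].discard(u)
                    degree[u] -= 1
                    buckets[degree[u]].add(u)
    return core


def nested_cores(core):
    """Return {k: sorted nodes with core number >= k}, a nested hierarchy."""
    top = max(core.values(), default=0)
    return {k: sorted(v for v, c in core.items() if c >= k)
            for k in range(1, top + 1)}


# Toy graph (hypothetical): a 4-clique a-b-c-d with a tail d-e-f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
core = core_numbers(build_adjacency(edges))
hierarchy = nested_cores(core)
print(core)
print(hierarchy)
```

Core numbers are a property of the graph itself, so the result does not depend on the order in which edges are listed or nodes are peeled. This is the sense in which the hierarchy is deterministic, in contrast to modularity-based clustering. Each node and edge is touched a bounded number of times, so the running time is roughly linear in the number of nodes plus edges.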
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Information Retrieval and Search Behavior
