Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG
Nicholas Alonso, Beren Millidge

TL;DR
This paper introduces MixPR, a sparse, PageRank-based retrieval algorithm for long-context tasks. It outperforms existing methods while significantly reducing compute costs, and it runs in real time on CPU.
Contribution
The paper presents MixPR, a novel, sparse, PageRank-based retrieval method that improves efficiency and performance for long-context tasks compared to prior RAG approaches.
Findings
MixPR achieves state-of-the-art results on long-context benchmarks.
MixPR is highly compute-efficient, capable of embedding millions of tokens in seconds.
MixPR runs entirely on CPU, enabling practical real-time applications.
Abstract
Recent advances have extended the context window of frontier LLMs dramatically, from a few thousand tokens up to millions, enabling entire books and codebases to fit into context. However, the compute costs of running inference on long-context LLMs are massive and often prohibitive in practice. RAG offers an efficient and effective alternative: retrieve and process only the subset of the context most important for the current task. Although promising, recent work applying RAG to long-context tasks has two core limitations: 1) there has been little focus on making the RAG pipeline compute-efficient, and 2) such works only test on simple QA tasks, and their performance on more challenging tasks is unclear. To address these limitations, we develop an algorithm based on PageRank, a graph-based retrieval algorithm, which we call mixture-of-PageRanks (MixPR). MixPR uses a mixture of PageRank-based graph-retrieval…
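The abstract is cut off before it describes how MixPR combines its PageRank variants, so the sketch below does not attempt to reproduce the authors' method. It only illustrates the general idea the abstract names: retrieving context chunks with a query-personalized PageRank over a chunk-similarity graph. Everything in it is an assumption made for illustration, including the bag-of-words "embedding", the cosine-similarity graph, the damping factor `alpha=0.85`, and the helper names `bag_of_words`, `personalized_pagerank`, and `retrieve`.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy sparse embedding: lowercase word counts (assumed, not MixPR's embedder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalized_pagerank(adj, personalization, alpha=0.85, iters=50):
    """Power iteration for PageRank with a query-dependent restart vector.

    adj[j][i] is the (non-negative) edge weight from node j to node i.
    """
    n = len(adj)
    total = sum(personalization)
    # Fall back to a uniform restart vector if the query matches nothing.
    p = [x / total for x in personalization] if total else [1.0 / n] * n
    row_sums = [sum(row) for row in adj]
    rank = p[:]
    for _ in range(iters):
        new = [(1 - alpha) * p[i] for i in range(n)]
        for j in range(n):
            if row_sums[j] == 0:
                # Dangling node: send its mass back to the restart vector.
                for i in range(n):
                    new[i] += alpha * rank[j] * p[i]
            else:
                for i in range(n):
                    if adj[j][i]:
                        new[i] += alpha * rank[j] * adj[j][i] / row_sums[j]
        rank = new
    return rank


def retrieve(chunks, query, k=2):
    """Return indices of the k highest-ranked chunks, in document order."""
    vecs = [bag_of_words(c) for c in chunks]
    q = bag_of_words(query)
    n = len(chunks)
    # Chunk-to-chunk similarity graph with no self-loops.
    adj = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # The restart distribution is biased toward chunks similar to the query.
    personalization = [cosine(q, v) for v in vecs]
    scores = personalized_pagerank(adj, personalization)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


chunks = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat chased the mouse",
]
selected = retrieve(chunks, "cat", k=2)
```

In this toy run the second chunk shares no words with the query or with the other chunks, so it receives no rank and the two cat-related chunks are selected. Only the selected chunks would then be passed to the LLM, which is the cost saving the abstract attributes to RAG in general.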
Taxonomy
Topics: Image Retrieval and Classification Techniques · Data Management and Algorithms · Web Data Mining and Analysis
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Byte Pair Encoding · Residual Connection · Multi-Head Attention · Weight Decay · WordPiece · Softmax
