Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG

Nicholas Alonso; Beren Millidge

arXiv:2412.06078 · cs.IR · December 10, 2024

TL;DR

This paper introduces MixPR, a sparse, PageRank-based retrieval algorithm for long-context tasks. It outperforms existing methods while substantially reducing compute costs, and it is efficient enough to run in real time on a CPU.

Contribution

The paper presents MixPR, a novel, sparse, PageRank-based retrieval method that improves efficiency and performance for long-context tasks compared to prior RAG approaches.

Findings

1. MixPR achieves state-of-the-art results on long-context benchmarks.
2. MixPR is highly compute-efficient, embedding millions of tokens in seconds.
3. MixPR runs entirely on CPU, enabling practical real-time applications.

Abstract

Recent advances have extended the context window of frontier LLMs dramatically, from a few thousand tokens up to millions, enabling entire books and codebases to fit into context. However, the compute costs of running inference on long-context LLMs are massive and often prohibitive in practice. RAG offers an efficient and effective alternative: retrieve and process only the subset of the context most important for the current task. Although promising, recent work applying RAG to long-context tasks has two core limitations: 1) there has been little focus on making the RAG pipeline compute efficient, and 2) these works evaluate only on simple QA tasks, so their performance on more challenging tasks is unclear. To address this, we develop an algorithm based on PageRank, a graph-based retrieval algorithm, which we call mixture-of-PageRanks (MixPR). MixPR uses a mixture of PageRank-based graph-retrieval…
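To make "PageRank-based graph retrieval" concrete, here is a minimal sketch of the general idea: personalized PageRank over a sparse graph of context chunks, with plain TF-IDF vectors standing in for sparse embeddings. This is not the authors' MixPR implementation. The tokenizer, TF-IDF weighting, edge rule (any positive cosine similarity), damping factor `alpha=0.85`, and all function names (`tfidf_vectors`, `embed_query`, `build_graph`, `personalized_pagerank`, `retrieve`) are illustrative assumptions, and the "mixture" step the abstract refers to is not shown.

```python
# Illustrative sketch only, NOT the MixPR code: personalized PageRank over a
# sparse chunk-similarity graph built from TF-IDF vectors (an assumed stand-in
# for the paper's sparse embeddings).
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def _normalize(v: dict[str, float]) -> dict[str, float]:
    norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
    return {t: w / norm for t, w in v.items()}


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return L2-normalized sparse TF-IDF vectors (term -> weight) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [_normalize({t: c * idf[t] for t, c in Counter(toks).items()}) for toks in tokenized]
    return vecs, idf


def embed_query(query: str, idf: dict[str, float]) -> dict[str, float]:
    # Terms never seen in the chunks are ignored.
    tf = Counter(t for t in tokenize(query) if t in idf)
    return _normalize({t: c * idf[t] for t, c in tf.items()})


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    # Iterate over the smaller sparse vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_graph(vecs: list[dict[str, float]]) -> list[dict[int, float]]:
    """Undirected weighted adjacency lists; an edge exists where similarity > 0."""
    graph: list[dict[int, float]] = [{} for _ in vecs]
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            w = dot(vecs[i], vecs[j])
            if w > 0:
                graph[i][j] = w
                graph[j][i] = w
    return graph


def personalized_pagerank(graph: list[dict[int, float]], personalization: list[float],
                          alpha: float = 0.85, iters: int = 50) -> list[float]:
    """Power iteration for r = alpha * P^T r + (1 - alpha) * p."""
    n = len(graph)
    total = sum(personalization)
    p = [x / total for x in personalization] if total > 0 else [1.0 / n] * n
    out_w = [sum(nbrs.values()) for nbrs in graph]
    r = p[:]
    for _ in range(iters):
        nxt = [(1 - alpha) * p[i] for i in range(n)]
        dangling = 0.0
        for i, nbrs in enumerate(graph):
            if out_w[i] == 0:
                dangling += r[i]  # nodes without edges teleport back to p
                continue
            for j, w in nbrs.items():
                nxt[j] += alpha * r[i] * w / out_w[i]
        for i in range(n):
            nxt[i] += alpha * dangling * p[i]
        r = nxt
    return r


def retrieve(chunks: list[str], query: str, k: int = 2, alpha: float = 0.85) -> list[int]:
    """Indices of the top-k chunks by personalized PageRank score."""
    vecs, idf = tfidf_vectors(chunks)
    q = embed_query(query, idf)
    personalization = [dot(q, v) for v in vecs]  # restart distribution from query similarity
    scores = personalized_pagerank(build_graph(vecs), personalization, alpha)
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    chunks = ["The cat sat on the mat.", "The mat was red and the cat was black.",
              "Stock prices rose sharply today.", "Investors cheered as stock markets rallied."]
    print(retrieve(chunks, "Where did the cat sit?", k=2))
```

Because score flows along edges, a chunk with no direct term overlap with the query can still rank highly if it is well connected to chunks that do match; this graph propagation is what distinguishes PageRank-style retrieval from ranking by query similarity alone. The dense O(n²) graph construction here is for readability; a system handling millions of tokens would need a sparser edge rule.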

Taxonomy

Topics: Image Retrieval and Classification Techniques · Data Management and Algorithms · Web Data Mining and Analysis

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Byte Pair Encoding · Residual Connection · Multi-Head Attention · Weight Decay · WordPiece · Softmax