From Static to Dynamic: A Streaming RAG Approach to Real-time Knowledge Base
Yuzhou Zhu

TL;DR
This paper introduces Streaming RAG, a real-time knowledge base system that efficiently updates and retrieves information from streaming data sources, significantly improving speed and accuracy over static methods.
Contribution
The paper presents a novel Streaming RAG pipeline combining multi-vector screening, clustering, and filtering, with theoretical guarantees and practical efficiency for real-time knowledge retrieval.
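The screening and clustering stages named above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each document carries several embedding vectors and is kept when any of them is close to any reference vector, and it uses a simple sequential variant of the mini-batch k-means update (per-centroid learning rate 1/count). The names `passes_screen`, `minibatch_update`, `topic_vectors`, and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def passes_screen(doc_vectors: Sequence[Vector],
                  topic_vectors: Sequence[Vector],
                  threshold: float) -> bool:
    """Assumed screening rule: keep the document if ANY of its vectors
    reaches the cosine threshold against ANY reference vector."""
    return any(cosine(d, t) >= threshold
               for d in doc_vectors for t in topic_vectors)


def minibatch_update(centroids: List[List[float]],
                     counts: List[int],
                     batch: Sequence[Vector]) -> None:
    """Update centroids in place from one mini-batch.

    Each point is assigned to its nearest centroid (squared Euclidean
    distance), then that centroid moves toward the point with step
    size 1 / (number of points it has absorbed so far).
    """
    for x in batch:
        j = min(range(len(centroids)),
                key=lambda i: sum((c - xi) ** 2
                                  for c, xi in zip(centroids[i], x)))
        counts[j] += 1
        eta = 1.0 / counts[j]
        centroids[j] = [(1.0 - eta) * c + eta * xi
                        for c, xi in zip(centroids[j], x)]


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    counts = [1, 1]
    minibatch_update(centroids, counts, [[2.0, 0.0], [10.0, 12.0]])
    print(centroids)
```

Because the centroids are updated in place from small batches, the prototype set can follow the stream without reclustering everything seen so far, which is the property the pipeline description relies on.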
Findings
Up to 3-point increase in Recall@10
Latency below 15 ms per query
Throughput exceeding 900 documents/sec
Abstract
Dynamic streams from news feeds, social media, sensor networks, and financial markets challenge static RAG frameworks. Full-scale indices incur high memory costs; periodic rebuilds introduce latency that undermines data freshness; naive sampling sacrifices semantic coverage. We present Streaming RAG, a unified pipeline that combines multi-vector cosine screening, mini-batch clustering, and a counter-based heavy-hitter filter to maintain a compact prototype set. We further prove an approximation bound $E[R(K_t)] \ge R^* - L \Delta$ linking retrieval quality to clustering variance. An incremental index upsert mechanism refreshes prototypes without interrupting queries. Experiments on eight real-time streams show statistically significant gains in Recall@10 (up to 3 points, p < 0.01), end-to-end latency below 15 ms, and throughput above 900 documents per second under a 150 MB budget.…
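To make the "counter-based heavy-hitter filter" concrete, here is a minimal sketch using the classic Misra-Gries summary. The abstract does not say which counter-based algorithm the paper uses, so Misra-Gries is an assumption chosen for illustration, as are the names `MisraGries` and `heavy_clusters` and the idea of feeding it cluster IDs.

```python
from typing import Dict, Hashable, Iterable


class MisraGries:
    """Heavy-hitter summary that keeps at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to still hold a counter at the end.
    """

    def __init__(self, k: int) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.counters: Dict[Hashable, int] = {}

    def update(self, item: Hashable) -> None:
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item is not stored.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def candidates(self) -> Dict[Hashable, int]:
        """Surviving items with their (under-)estimated counts."""
        return dict(self.counters)


def heavy_clusters(cluster_ids: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Hypothetical helper: summarize a stream of cluster IDs."""
    summary = MisraGries(k)
    for cid in cluster_ids:
        summary.update(cid)
    return summary.candidates()


if __name__ == "__main__":
    stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
    print(heavy_clusters(stream, k=3))
```

The memory footprint is fixed by `k` regardless of stream length, which is why a filter of this kind suits a bounded-memory setting such as the 150 MB budget mentioned in the abstract. The reported counts are lower bounds on the true frequencies, so they are best used to rank or threshold candidates, not as exact statistics.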
Taxonomy
Topics: AI-based Problem Solving and Planning
