LiveVectorLake: A Real-Time Versioned Knowledge Base Architecture for Streaming Vector Updates and Temporal Retrieval

Tarun Prajapati

arXiv:2601.05270·cs.IR·January 12, 2026

Open Access

TL;DR

LiveVectorLake is a dual-tier architecture that enables real-time semantic search and complete version history for knowledge bases, balancing query speed, update efficiency, and compliance needs.

Contribution

LiveVectorLake introduces an architecture that combines content-addressable synchronization, dual-tier storage, and temporal query routing for efficient, versioned knowledge retrieval.

Findings

01. 10-15% re-processing during updates

02. Sub-100ms retrieval latency for current knowledge

03. Sub-2s latency for temporal queries

Abstract

Modern Retrieval-Augmented Generation (RAG) systems struggle with a fundamental architectural tension: vector indices are optimized for query latency but poorly handle continuous knowledge updates, while data lakes excel at versioning but introduce query latency penalties. We introduce LiveVectorLake, a dual-tier temporal knowledge base architecture that enables real-time semantic search on current knowledge while maintaining complete version history for compliance, auditability, and point-in-time retrieval. The system introduces three core architectural contributions: (1) Content-addressable chunk-level synchronization using SHA-256 hashing for deterministic change detection without external state tracking; (2) Dual-tier storage separating hot-tier vector indices (Milvus with HNSW) from cold-tier columnar versioning (Delta Lake with Parquet), optimizing query latency and storage cost…
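The first contribution, content-addressable change detection, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `chunk_hash` and `diff_chunks`, and the representation of chunks as plain strings, are assumptions made for illustration. The idea shown is that a chunk's SHA-256 digest serves as its identity, so comparing digests between two versions of a document is enough to decide which chunks need re-embedding.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a chunk; the digest acts as its ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_chunks(old_chunks: list[str], new_chunks: list[str]):
    """Compare two versions of a document by chunk hash (hypothetical helper).

    Returns:
      to_embed  - chunks in the new version whose hash was not seen before
      to_retire - hashes present in the old version but absent from the new one
    """
    old_hashes = {chunk_hash(c) for c in old_chunks}
    new_hashes = {chunk_hash(c) for c in new_chunks}

    # Only chunks with previously unseen content need to be re-embedded.
    to_embed = [c for c in new_chunks if chunk_hash(c) not in old_hashes]

    # Hashes that disappeared mark chunks to close out in the version history.
    to_retire = sorted(old_hashes - new_hashes)
    return to_embed, to_retire


if __name__ == "__main__":
    old = ["alpha", "beta", "gamma"]
    new = ["alpha", "beta revised", "gamma"]
    to_embed, to_retire = diff_chunks(old, new)
    print(len(to_embed), "chunk(s) to re-embed;", len(to_retire), "hash(es) to retire")
```

Because the comparison depends only on chunk content, unchanged chunks are skipped however the document around them was edited. How the system stores and looks up the previous hashes is not specified in the text above, so this sketch simply recomputes them from the old chunks.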

Taxonomy

Topics: Advanced Database Systems and Queries · Information Retrieval and Search Behavior · Semantic Web and Ontologies