QVCache: A Query-Aware Vector Cache

Anıl Eren Göçer; Ioanna Tsakalidou; Hamish Nicholson; Kyoungmin Kim; Anastasia Ailamaki

arXiv:2602.02057 · cs.DB · February 3, 2026

TL;DR

QVCache is a query-aware caching layer for vector databases that cuts query latency by up to 1000x while keeping recall comparable to the underlying approximate nearest neighbor (ANN) system, at a megabyte-scale memory cost.

Contribution

QVCache is the first query-level caching system for ANN search. It learns similarity thresholds dynamically and operates as a backend-agnostic layer with bounded memory and latency.
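
To make the idea of query-level, similarity-aware caching concrete, here is a minimal Python sketch. It is not the authors' implementation: the class `SimilarityCache`, the helper `brute_force_backend`, the linear scan over cached queries, the LRU eviction policy, and above all the fixed `threshold` are assumptions made for illustration. QVCache learns its thresholds dynamically, which this toy does not attempt.

```python
import math
from collections import OrderedDict


class SimilarityCache:
    """Toy query-level cache (hypothetical, for illustration only).

    A cached result is reused when a new query lies within `threshold`
    (Euclidean distance) of a previously answered query with the same k.
    The threshold is fixed here; the paper describes learned thresholds.
    """

    def __init__(self, backend_search, capacity=1024, threshold=0.1):
        self.backend_search = backend_search  # any callable (query, k) -> ids
        self.capacity = capacity              # bound on the number of cached queries
        self.threshold = threshold
        self.entries = OrderedDict()          # (query, k) -> result ids, in LRU order
        self.hits = 0
        self.misses = 0

    def search(self, query, k):
        query = tuple(query)
        # Find the closest cached query with the same k (linear scan).
        best_key, best_dist = None, math.inf
        for cached_query, cached_k in self.entries:
            if cached_k != k:
                continue
            dist = math.dist(query, cached_query)
            if dist < best_dist:
                best_key, best_dist = (cached_query, cached_k), dist

        if best_key is not None and best_dist <= self.threshold:
            self.entries.move_to_end(best_key)  # mark as most recently used
            self.hits += 1
            return self.entries[best_key]

        # Miss: fall through to the backend and remember the answer.
        self.misses += 1
        result = self.backend_search(query, k)
        self.entries[(query, k)] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return result


def brute_force_backend(points):
    """Stand-in backend: exact k-nearest-neighbor search over a small list."""
    def search(query, k):
        order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
        return order[:k]
    return search


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
cache = SimilarityCache(brute_force_backend(points), capacity=2, threshold=0.05)

first = cache.search((0.9, 0.1), k=2)    # miss: answered by the backend
second = cache.search((0.91, 0.1), k=2)  # hit: within 0.05 of the cached query
print(first, second, cache.hits, cache.misses)
```

Because the cache only needs a `(query, k) -> ids` callable, the backend can be swapped without touching the cache, and `capacity` caps the number of stored queries. A fixed threshold trades recall for hit rate in a way that depends on the data, which is the kind of tuning that learned thresholds are meant to avoid.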

Findings

01. Reduces query latency by up to 1000x

02. Maintains high recall comparable to underlying ANN systems

03. Operates with a megabyte-scale memory footprint

Abstract

Vector databases have become a cornerstone of modern information retrieval, powering applications in recommendation, search, and retrieval-augmented generation (RAG) pipelines. However, scaling approximate nearest neighbor (ANN) search to high recall under strict latency SLOs remains fundamentally constrained by memory capacity and I/O bandwidth. Disk-based vector search systems suffer severe latency degradation at high accuracy, while fully in-memory solutions incur prohibitive memory costs at billion-scale. Despite the central role of caching in traditional databases, vector search lacks a general query-level caching layer capable of amortizing repeated query work. We present QVCache, the first backend-agnostic, query-level caching system for ANN search with bounded memory footprint. QVCache exploits semantic query repetition by performing similarity-aware caching rather than…

Taxonomy

Topics: Information Retrieval and Search Behavior · Caching and Content Delivery · Advanced Image and Video Retrieval Techniques