FaTRQ: Tiered Residual Quantization for LLM Vector Search in Far-Memory-Aware ANNS Systems

Tianqi Zhang; Flavio Ponzina; Tajana Rosing

arXiv:2601.09985 · cs.LG · January 16, 2026

TL;DR

FaTRQ is a system that speeds up large-scale vector search by eliminating costly full-vector fetches from storage. It combines tiered residual quantization with a custom accelerator to reduce both query latency and storage requirements.

Contribution

FaTRQ introduces tiered residual quantization and a progressive distance estimator to enable far-memory-aware refinement without full vector fetches in ANNS systems.
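
This summary does not describe FaTRQ's own quantizer design. As a rough illustration of the general idea of residual quantization, the sketch below encodes a vector as a stack of tiers, where each tier quantizes whatever the previous tiers left over. Uniform scalar quantizers with shrinking step sizes stand in for learned codebooks. The function names (`encode_tiers`, `reconstruct`), the step sizes, and the idea of keeping tier 0 in fast memory and later tiers in far memory are all assumptions made for illustration, not the paper's method.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], step: float) -> List[int]:
    """Uniform scalar quantizer: map each value to the nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(codes: Sequence[int], step: float) -> List[float]:
    return [c * step for c in codes]


def encode_tiers(x: Sequence[float], steps: Sequence[float]) -> List[List[int]]:
    """Hypothetical tiered encoder. Tier t quantizes the residual left by tiers
    0..t-1, so after tier t every residual coordinate is at most steps[t] / 2."""
    residual = list(x)
    tiers = []
    for step in steps:
        codes = quantize(residual, step)
        approx = dequantize(codes, step)
        residual = [r - a for r, a in zip(residual, approx)]
        tiers.append(codes)
    return tiers


def reconstruct(tiers: Sequence[Sequence[int]], steps: Sequence[float], upto: int) -> List[float]:
    """Sum the first `upto` tiers; each extra tier gives a finer approximation."""
    out = [0.0] * len(tiers[0])
    for codes, step in zip(tiers[:upto], steps[:upto]):
        out = [o + a for o, a in zip(out, dequantize(codes, step))]
    return out


# Assumed placement: tier 0 (coarse) in fast memory, tiers 1+ in far memory.
x = [0.83, -0.41, 0.27, 0.95]
steps = [0.5, 0.125, 0.03125]
tiers = encode_tiers(x, steps)
for t in range(1, len(steps) + 1):
    err = max(abs(a - b) for a, b in zip(x, reconstruct(tiers, steps, t)))
    print(f"tiers used: {t}, max abs error: {err:.4f}")
```

In this toy scheme, each added tier caps the per-coordinate error at half of that tier's step size. That bounded, shrinking error is what would let a coarse tier answer for most candidates while finer tiers are read only when needed.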

Findings

01. Storage efficiency improved by 2.4×
02. Throughput increased by up to 9×
03. Eliminates second-pass refinement latency

Abstract

Approximate Nearest-Neighbor Search (ANNS) is a key technique in retrieval-augmented generation (RAG), enabling rapid identification of the most relevant high-dimensional embeddings from massive vector databases. Modern ANNS engines accelerate this process using prebuilt indexes and store compressed vector-quantized representations in fast memory. However, they still rely on a costly second-pass refinement stage that reads full-precision vectors from slower storage like SSDs. For modern text and multimodal embeddings, these reads now dominate the latency of the entire query. We propose FaTRQ, a far-memory-aware refinement system using tiered memory that eliminates the need to fetch full vectors from storage. It introduces a progressive distance estimator that refines coarse scores using compact residuals streamed from far memory. Refinement stops early once a candidate is provably…
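
The abstract is cut off before it states FaTRQ's exact stopping rule. The sketch below shows one generic way a progressive estimator can stop early. It assumes each stored tier comes with a single scalar bound on the remaining reconstruction error. By the triangle inequality, the true distance lies within that bound of the distance to the current approximation. A candidate whose lower bound exceeds the k-th smallest upper bound therefore cannot reach the top-k and is dropped without reading more tiers. Everything here (`make_candidate`, `progressive_topk`, the per-tier error norms, and the exact final remainder tier) is hypothetical and is not the paper's estimator.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Candidate = Tuple[List[Vec], List[float]]  # (tiers, per-tier error bounds)


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def make_candidate(x: Vec, steps: Sequence[float]) -> Candidate:
    """Hypothetical layout: quantized layers with shrinking steps, then an exact
    remainder. bounds[t] = ||x - (tiers[0] + ... + tiers[t])||, one scalar per tier."""
    tiers: List[Vec] = []
    bounds: List[float] = []
    approx = [0.0] * len(x)
    for step in steps:
        layer = [round((v - a) / step) * step for v, a in zip(x, approx)]
        tiers.append(layer)
        approx = [a + d for a, d in zip(approx, layer)]
        bounds.append(l2(x, approx))
    remainder = [v - a for v, a in zip(x, approx)]  # stands in for full precision
    tiers.append(remainder)
    approx = [a + r for a, r in zip(approx, remainder)]
    bounds.append(l2(x, approx))  # ~0 up to float rounding
    return tiers, bounds


def progressive_topk(query: Vec, cands: Dict[str, Candidate], k: int) -> Tuple[List[str], int]:
    """Return the exact top-k id set (ordered by current estimate) and the
    number of extra tiers read beyond tier 0."""
    assert 0 < k <= len(cands)
    state = {cid: (0, list(t[0])) for cid, (t, _) in cands.items()}  # id -> (level, approx)
    reads = 0
    while True:
        iv = {}
        for cid, (lvl, approx) in state.items():
            d, e = l2(query, approx), cands[cid][1][lvl]
            iv[cid] = (max(0.0, d - e), d + e, d)  # triangle-inequality interval
        kth_upper = sorted(up for _, up, _ in iv.values())[k - 1]
        # At least k candidates are provably within kth_upper, so any candidate
        # whose lower bound exceeds it cannot be in the top-k: drop it unread.
        state = {c: s for c, s in state.items() if iv[c][0] <= kth_upper}
        refinable = [c for c, (lvl, _) in state.items() if lvl < len(cands[c][0]) - 1]
        if len(state) == k or not refinable:
            return sorted(state, key=lambda c: iv[c][2])[:k], reads
        # Stream one more tier for the survivor with the widest interval.
        cid = max(refinable, key=lambda c: iv[c][1] - iv[c][0])
        lvl, approx = state[cid]
        nxt = cands[cid][0][lvl + 1]
        state[cid] = (lvl + 1, [a + n for a, n in zip(approx, nxt)])
        reads += 1


rng = random.Random(0)
data = {f"v{i}": [rng.uniform(-1, 1) for _ in range(8)] for i in range(50)}
cands = {cid: make_candidate(x, steps=[0.5, 0.125]) for cid, x in data.items()}
query = [rng.uniform(-1, 1) for _ in range(8)]
top, reads = progressive_topk(query, cands, k=5)
print(top, "extra tier reads:", reads, "of", 2 * len(data))
```

Candidates that are clearly far from the query are discarded using only their coarse tier and one stored scalar, so their finer tiers are never read. How many reads this saves depends entirely on the data and the step sizes.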

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Data Management and Algorithms · Graph Theory and Algorithms