Write-Read Decoupling in Modern Large-Scale Search Engines: Architectures, Techniques, and Emerging Approaches
Xin Liang, Qing Yang, Wenru Qiu, Wenjie Mao, Tianyu Ma, Minghui Zhu, Nan Wang

TL;DR
This survey analyzes architectural solutions for decoupling write and read operations in large-scale search engines to reduce latency and resource contention, highlighting emerging hybrid approaches like ScaleSearch.
Contribution
It systematically reviews existing architectures, introduces the ScaleSearch synthesis, and discusses open challenges in hybrid retrieval, AI integration, and serverless deployment.
Findings
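Identifies five principal patterns for write-read decoupling: node-level read-write separation, compute-storage separation, full in-memory indexing, log-structured write paths, and in-place partial updates.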
Highlights the ScaleSearch architecture combining multiple techniques.
Discusses open challenges in hybrid retrieval, AI integration, and serverless deployment.
Abstract
Large-scale search engines face a fundamental tension: the index must be updated frequently to maintain freshness, yet updates create resource contention that inflates query latency. In the dominant Lucene-based architecture, segment merges triggered by writes compete with concurrent queries for CPU cycles, disk I/O bandwidth, and operating-system page cache, a problem we term "write-read contention". This survey systematically examines the architectural solutions that industry and academia have developed to decouple write pressure from read latency. We identify five principal patterns: (i) node-level read-write separation; (ii) compute-storage separation; (iii) full in-memory indexing; (iv) log-structured write paths; and (v) in-place partial updates. We survey representative systems including Elasticsearch, LinkedIn Galene, Uber Sia, Quickwit, Alibaba Havenask, Algolia, Milvus,…
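To make the segment-based write path concrete, here is a minimal sketch, assuming a toy single-process in-memory inverted index. The class and method names (Segment, DecoupledIndex, write, refresh, merge, search) are hypothetical and are not taken from the survey or from any surveyed system. Writes land in a buffer that readers cannot see. refresh() seals the buffer into an immutable segment. merge() compacts segments off the query path and then publishes a new snapshot with a single reference swap. Readers never take a lock; each query reads whichever snapshot was current when it started. Updates and deletes of an existing document id are deliberately not modeled.

```python
import threading
from collections import defaultdict


class Segment:
    """An immutable inverted index over one batch of documents (toy model)."""

    def __init__(self, docs):
        self.docs = dict(docs)  # kept so that merge() can rebuild postings
        postings = defaultdict(set)
        for doc_id, text in self.docs.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        # Freeze postings so that readers can share the segment without locks.
        self.postings = {t: frozenset(ids) for t, ids in postings.items()}

    def lookup(self, term):
        return self.postings.get(term, frozenset())


class DecoupledIndex:
    """Hypothetical index: writers buffer, readers see only published snapshots."""

    def __init__(self):
        self._write_lock = threading.Lock()  # serializes buffer and snapshot changes
        self._merge_lock = threading.Lock()  # assumes one merge at a time
        self._buffer = {}                    # pending writes, invisible to readers
        self._snapshot = ()                  # tuple of immutable segments

    def write(self, doc_id, text):
        with self._write_lock:
            self._buffer[doc_id] = text

    def refresh(self):
        """Seal the buffer into a new segment and publish a new snapshot."""
        with self._write_lock:
            if not self._buffer:
                return
            segment = Segment(self._buffer)
            self._buffer = {}
            # One reference assignment: readers see either the old or new tuple.
            self._snapshot = self._snapshot + (segment,)

    def merge(self):
        """Compact all current segments into one, without blocking readers."""
        with self._merge_lock:
            old = self._snapshot
            if len(old) < 2:
                return
            combined = {}
            for seg in old:
                combined.update(seg.docs)  # later segments win on duplicate ids
            merged = Segment(combined)     # expensive work done outside _write_lock
            with self._write_lock:
                # Keep any segments that refresh() appended while we were merging.
                newer = self._snapshot[len(old):]
                self._snapshot = (merged,) + newer

    def search(self, term):
        snapshot = self._snapshot  # stable reference; no lock on the read path
        hits = set()
        for seg in snapshot:
            hits |= seg.lookup(term.lower())
        return sorted(hits)
```

A short usage run: after idx.write(1, "fresh index"), idx.search("index") returns an empty list until idx.refresh() publishes the segment. In this toy, readers are isolated from writers logically, but a merge still competes with queries for the same CPU and memory. The five patterns listed above aim to go further, for example by moving merge work onto separate nodes or compute tiers, or by avoiding large merges altogether.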
