RStore: A Distributed Multi-version Document Store
Souvik Bhattacherjee, Amol Deshpande

TL;DR
RStore is a distributed multi-version document storage system that efficiently manages large collections of document versions. It optimizes storage and retrieval through novel layouts and algorithms and shows better performance than traditional delta-based storage methods.
Contribution
The paper introduces RStore, a novel distributed system architecture with new storage layout algorithms and online version handling, improving efficiency and scalability for multi-version document storage.
Findings
Operates efficiently at large scale in practical scenarios.
Outperforms standard delta-based storage engines by orders of magnitude.
Provides flexible tuning for different data and query workloads.
Abstract
We address the problem of compactly storing a large number of versions (snapshots) of a collection of keyed documents or records in a distributed environment, while efficiently answering a variety of retrieval queries over those, including retrieving full or partial versions, and evolution histories for specific keys. We motivate the increasing need for such a system in a variety of application domains, carefully explore the design space for building such a system and the various storage-computation-retrieval trade-offs, and discuss how different storage layouts influence those trade-offs. We propose a novel system architecture that satisfies the key desiderata for such a system, and offers simple tuning knobs that allow adapting to a specific data and query workload. Our system is intended to act as a layer on top of a distributed key-value store that houses the raw data as well as any…
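The abstract names three kinds of retrieval queries: full versions, partial versions, and the evolution history of a specific key. The Python sketch below makes them concrete on a naive baseline. It is not RStore's storage layout or algorithm. It assumes in-memory dicts stand in for the underlying distributed key-value store, and the class and method names (`NaiveVersionStore`, `commit`, `get_version`, `get_partial`, `key_history`) are hypothetical. Identical record contents are stored once and shared across versions. Each version keeps a key-to-payload map and a parent link.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NaiveVersionStore:
    """Hypothetical baseline, NOT RStore's layout: dicts stand in for a
    distributed key-value store; each distinct payload is stored once."""
    payloads: Dict[int, str] = field(default_factory=dict)        # payload id -> content
    _payload_ids: Dict[str, int] = field(default_factory=dict)    # content -> payload id
    versions: Dict[str, Dict[str, int]] = field(default_factory=dict)  # version -> {key: payload id}
    parents: Dict[str, Optional[str]] = field(default_factory=dict)    # version -> parent version

    def _intern(self, value: str) -> int:
        # Deduplicate identical record contents across versions (and keys).
        if value not in self._payload_ids:
            pid = len(self.payloads)
            self.payloads[pid] = value
            self._payload_ids[value] = pid
        return self._payload_ids[value]

    def commit(self, version: str, records: Dict[str, str],
               parent: Optional[str] = None) -> None:
        # Record a full snapshot; unchanged records reuse existing payload ids.
        self.versions[version] = {k: self._intern(v) for k, v in records.items()}
        self.parents[version] = parent

    def get_version(self, version: str) -> Dict[str, str]:
        # Full version retrieval.
        return {k: self.payloads[p] for k, p in self.versions[version].items()}

    def get_partial(self, version: str, keys: List[str]) -> Dict[str, str]:
        # Partial version retrieval: only requested keys present in the version.
        mapping = self.versions[version]
        return {k: self.payloads[mapping[k]] for k in keys if k in mapping}

    def key_history(self, key: str, version: str) -> List[Tuple[str, str]]:
        # Evolution history of one key along the lineage ending at `version`,
        # listing only the versions where the key's value changed.
        lineage: List[str] = []
        v: Optional[str] = version
        while v is not None:
            lineage.append(v)
            v = self.parents[v]
        lineage.reverse()  # oldest first
        history: List[Tuple[str, str]] = []
        last: Optional[int] = None
        for v in lineage:
            pid = self.versions[v].get(key)
            if pid is not None and pid != last:
                history.append((v, self.payloads[pid]))
            last = pid
        return history


store = NaiveVersionStore()
store.commit("v1", {"a": "x", "b": "y"})
store.commit("v2", {"a": "x", "b": "z"}, parent="v1")
store.commit("v3", {"a": "w", "b": "z", "c": "q"}, parent="v2")
print(store.get_partial("v3", ["a", "c", "missing"]))  # {'a': 'w', 'c': 'q'}
print(store.key_history("b", "v3"))  # [('v1', 'y'), ('v2', 'z')]
```

Every snapshot here keeps a full key map, so storage grows with the number of keys per version even when little changes. A delta-based engine trades this for reconstruction cost at query time. The abstract frames the choice of layout as the lever that moves a system along this storage-computation-retrieval trade-off.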
