Tidehunter: Large-Value Storage With Minimal Data Relocation
Andrey Chursin, Lefteris Kokoris-Kogias, Alex Orlov, Alberto Sonnino, Igor Zablotchi

TL;DR
Tidehunter is a storage engine that eliminates value compaction in LSM-trees, enabling high-throughput, low-latency key-value storage for large values with minimal data movement, suitable for blockchain and content-addressable storage.
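The "minimal data movement" claim can be made concrete with the usual definition of write amplification (WA) applied to the abstract's own 10x to 30x range. The 1 TB of ingested values is a hypothetical figure chosen for illustration. The second line assumes each value is written once to the WAL and ignores index and pruning writes.

```latex
\mathrm{WA} = \frac{\text{bytes written to the device}}{\text{bytes written by the application}}

\text{LSM-tree, } \mathrm{WA} \in [10, 30]:\quad 1\,\text{TB ingested} \;\Rightarrow\; 10 \text{ to } 30\,\text{TB written to the device}

\text{value written once to the WAL, } \mathrm{WA}_{\text{values}} \approx 1:\quad 1\,\text{TB ingested} \;\Rightarrow\; \approx 1\,\text{TB of value writes}
```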
Contribution
It introduces a novel approach that treats the WAL as permanent storage, avoiding value overwrites and reducing write amplification, with lock-free writes and epoch-based pruning.
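The following is a minimal sketch of a write path in which the log is the permanent home of each value. It is not Tidehunter's code. The class `AppendOnlyValueLog`, its `put`/`get` methods, and the in-memory `bytearray` standing in for a WAL file are all hypothetical. Python has no atomic fetch-and-add, so a lock held only for the offset bump stands in for it. The value copy runs outside that lock, which is what allows writers to copy in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AppendOnlyValueLog:
    """Hypothetical toy: values are appended once and never moved or overwritten."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)     # stands in for a preallocated WAL file
        self._tail = 0                      # next free byte in the log
        self._tail_lock = threading.Lock()  # stand-in for an atomic fetch-and-add
        self._index = {}                    # key -> (offset, length) in the log

    def _reserve(self, n: int) -> int:
        # Allocation step: claim n bytes by bumping the tail (kept tiny on purpose).
        with self._tail_lock:
            offset = self._tail
            if offset + n > len(self._buf):
                raise MemoryError("log full")
            self._tail = offset + n
        return offset

    def put(self, key: bytes, value: bytes) -> int:
        offset = self._reserve(len(value))
        # Copy step: runs outside the lock, so concurrent writers copy in parallel.
        self._buf[offset:offset + len(value)] = value
        # Publish after the copy so readers never see a half-written value.
        self._index[key] = (offset, len(value))
        return offset

    def get(self, key: bytes):
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        return bytes(self._buf[offset:offset + length])


log = AppendOnlyValueLog(capacity=1 << 20)
with ThreadPoolExecutor(max_workers=8) as pool:
    # 500 concurrent writes of 1000-byte values.
    list(pool.map(lambda i: log.put(b"k%d" % i, bytes([i % 256]) * 1000), range(500)))
old_offset = log._index[b"k7"][0]
log.put(b"k7", b"updated")  # an update appends; the old bytes stay where they were
```

Updating `k7` appends a new copy and repoints the index instead of rewriting bytes in place. Reclaiming the stale copy is left to a pruning step, which this sketch does not model.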
Findings
Achieves 830K writes/sec on a 1 TB dataset, outperforming RocksDB and BlobDB.
Improves query performance by up to 15.6x.
Successfully integrated into Sui blockchain, maintaining stable throughput and latency.
Abstract
Log-Structured Merge-Trees (LSM-trees) dominate persistent key-value storage but suffer from high write amplification, from 10x to 30x under random workloads, due to repeated compaction. This overhead becomes prohibitive for large values with uniformly distributed keys, a workload common in content-addressable storage, deduplication systems, and blockchain validators. We present Tidehunter, a storage engine that eliminates value compaction by treating the Write-Ahead Log (WAL) as permanent storage rather than a temporary recovery buffer. Values are never overwritten, and small, lazily flushed index tables map keys to WAL positions. Tidehunter introduces (a) lock-free writes that saturate NVMe drives through atomic allocation and parallel copying, (b) an optimistic index structure that exploits uniform key distributions for single-roundtrip lookups, and (c) epoch-based pruning that…
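The abstract does not describe the index layout, so the following is an assumption: one way uniform keys can enable single-roundtrip lookups is interpolation. If keys are roughly uniform over a 64-bit space, a key's rank in a sorted index is predictable, so a single read of one window around the predicted position usually settles the lookup. `OptimisticIndex`, `key_of`, `WINDOW`, and the read counter are hypothetical. The fallback binary search is counted as one extra read for simplicity.

```python
import bisect
import hashlib

KEY_BITS = 64
WINDOW = 256  # hypothetical: index entries returned by one read ("page")


def key_of(data: bytes) -> int:
    # Content-addressed key: a 64-bit hash prefix, roughly uniform over [0, 2**64).
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class OptimisticIndex:
    """Sorted (key, wal_offset) entries, searched by interpolation first."""

    def __init__(self, entries):
        self.entries = sorted(entries)            # laid out by key, as on disk
        self.keys = [k for k, _ in self.entries]
        self.reads = 0                            # simulated disk roundtrips

    def _read_window(self, start):
        self.reads += 1
        start = max(0, min(start, len(self.entries) - WINDOW))
        return start, self.entries[start:start + WINDOW]

    def lookup(self, key):
        n = len(self.entries)
        if n == 0:
            return None
        guess = (key * n) >> KEY_BITS             # expected rank if keys are uniform
        start, window = self._read_window(guess - WINDOW // 2)
        covers_low = start == 0 or window[0][0] <= key
        covers_high = start + len(window) == n or key <= window[-1][0]
        if covers_low and covers_high:
            # The key, if present at all, must be in this window: one roundtrip.
            i = bisect.bisect_left(window, (key,))
            if i < len(window) and window[i][0] == key:
                return window[i][1]
            return None
        # Guess missed (e.g. skewed keys): fall back to a full binary search.
        self.reads += 1
        i = bisect.bisect_left(self.keys, key)
        if i < n and self.keys[i] == key:
            return self.entries[i][1]
        return None


entries = [(key_of(b"value-%d" % i), i * 4096) for i in range(4096)]
idx = OptimisticIndex(entries)
found = [idx.lookup(k) for k, _ in entries]
print(f"{idx.reads} reads for {len(entries)} lookups")
```

With 4096 uniform keys, the actual rank deviates from the guess by roughly the square root of n, so a 256-entry window almost always covers it. Skewed keys would hit the fallback path far more often, which is why such a design depends on the uniform-key assumption.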
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Data Security Solutions · Cloud Computing and Resource Management
