OnPair: Short Strings Compression for Fast Random Access
Francesco Gargiulo, Rossano Venturini

TL;DR
OnPair is a dictionary-based compression algorithm optimized for in-memory databases, offering a balance of high compression ratios and fast random access through cache-friendly, incremental dictionary construction.
Contribution
It introduces an efficient dictionary construction method that trains quickly and supports fine-grained random access, bridging the gap between high-ratio but costly methods (e.g., BPE) and fast but lower-ratio methods (e.g., FSST).
Findings
Achieves compression ratios comparable to BPE.
Compresses significantly faster than BPE.
Uses less memory than BPE during compression.
Abstract
We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at significant computational and memory cost (e.g., BPE) or prioritize speed at the expense of compression quality (e.g., FSST). OnPair bridges this gap by employing a cache-friendly dictionary construction technique that incrementally merges frequent adjacent substrings in a single sequential pass over a data sample. This enables fast, memory-efficient training without tracking global pair positions, as required by traditional BPE. We also introduce OnPair16, a variant that limits dictionary entries to 16 bytes, enabling faster parsing via optimized longest prefix matching. Both variants compress strings independently, supporting fine-grained random access…
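The single-pass idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the merge `threshold`, the `max_tokens` limit, and the use of a plain hash map for longest prefix matching are all assumptions made for illustration. The 16-byte `max_len` cap mirrors the entry limit the abstract attributes to OnPair16. The sketch starts from the 256 single-byte tokens, parses each sample string greedily, counts adjacent token pairs, and promotes a pair to a new token once its count reaches the threshold, without recording where pairs occur.

```python
from collections import defaultdict


def longest_match(s, pos, ids, max_len):
    """Return the id of the longest dictionary token that is a prefix of s[pos:]."""
    for n in range(min(max_len, len(s) - pos), 0, -1):
        tid = ids.get(s[pos:pos + n])
        if tid is not None:
            return tid
    raise ValueError("unreachable: every single byte is a token")


def train(sample, max_tokens=65536, threshold=2, max_len=16):
    """Build a dictionary in one sequential pass over `sample` (a list of bytes).

    Hypothetical simplification: threshold, max_tokens and max_len are
    illustrative parameters, not values taken from the paper.
    """
    tokens = [bytes([b]) for b in range(256)]   # ids 0..255 are single bytes
    ids = {t: i for i, t in enumerate(tokens)}
    pair_counts = defaultdict(int)              # counts only, no pair positions
    for s in sample:
        prev, pos = None, 0
        while pos < len(s):
            cur = longest_match(s, pos, ids, max_len)
            pos += len(tokens[cur])
            if prev is not None and len(tokens) < max_tokens:
                merged = tokens[prev] + tokens[cur]
                if len(merged) <= max_len and merged not in ids:
                    pair_counts[(prev, cur)] += 1
                    if pair_counts[(prev, cur)] >= threshold:
                        # Frequent adjacent pair: promote it to a new token
                        # and continue parsing from the merged token.
                        del pair_counts[(prev, cur)]
                        cur = len(tokens)
                        ids[merged] = cur
                        tokens.append(merged)
            prev = cur
    return tokens, ids


def compress(s, tokens, ids, max_len=16):
    """Encode one string on its own as a list of token ids."""
    codes, pos = [], 0
    while pos < len(s):
        tid = longest_match(s, pos, ids, max_len)
        codes.append(tid)
        pos += len(tokens[tid])
    return codes


def decompress(codes, tokens):
    """Decode one string by concatenating its dictionary entries."""
    return b"".join(tokens[c] for c in codes)


# Usage: each string is compressed independently, so fetching row 42
# touches only that row's codes and the shared dictionary.
strings = [b"https://example.com/item/%d" % i for i in range(100)]
tokens, ids = train(strings)
column = [compress(s, tokens, ids) for s in strings]
row_42 = decompress(column[42], tokens)
```

Because every string is encoded against the same dictionary with no dependence on its neighbours, random access reduces to decoding a single code list. A production implementation would presumably replace the hash-map probing with a specialised longest-prefix-matching structure and pack the ids into fixed-width integers.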
Taxonomy
Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Data Management and Algorithms
