Re-Pair Compression of Inverted Lists
Francisco Claude, Antonio Fariña, Gonzalo Navarro

TL;DR
This paper applies Re-Pair compression to the differences between consecutive positions in inverted lists. The resulting variants offer an interesting time/space tradeoff for list intersection, but they do not yet improve upon the state of the art.
Contribution
It introduces Re-Pair-based compression variants for inverted lists that speed up the operations required for list intersection, and compares them with several recent proposals under various intersection algorithms.
Findings
Re-Pair variants offer a promising time/space tradeoff
Current methods still outperform Re-Pair variants in some aspects
Further improvements are needed for Re-Pair to surpass state-of-the-art techniques
Abstract
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers. In this paper we explore a completely different alternative: We use Re-Pair compression of those differences. While Re-Pair by itself offers fast decompression at arbitrary positions in main and secondary memory, we introduce variants that in addition speed up the operations required for inverted list intersection. We compare the resulting data structures with several recent proposals under various list intersection algorithms, to conclude that our Re-Pair variants offer an interesting time/space tradeoff for this problem, yet further improvements are required for it to improve upon the state of the art.
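To make the idea in the abstract concrete, here is a minimal Python sketch of Re-Pair applied to the differences (gaps) of a posting list. It is a naive illustration only, not the authors' implementation: the function names (`to_gaps`, `repair`, `expand`, `decode`), the dictionary-based rule table, and the quadratic-time pair counting are all assumptions made for readability, and it omits the intersection-oriented variants the paper introduces.

```python
from collections import Counter


def to_gaps(postings):
    """Turn a strictly increasing list of positions into consecutive differences."""
    gaps, prev = [], 0
    for p in postings:
        gaps.append(p - prev)
        prev = p
    return gaps


def repair(seq, first_new_symbol):
    """Naive Re-Pair: repeatedly replace the most frequent adjacent pair.

    Returns the reduced sequence and a rule table {new_symbol: (left, right)}.
    Pair counts include overlapping occurrences, which is a simplification.
    """
    seq = list(seq)
    rules = {}
    next_sym = first_new_symbol
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so nothing more to gain
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace left to right, skipping both symbols of a matched pair.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_sym += 1
    return seq, rules


def expand(sym, rules):
    """Expand one symbol back into the gaps it stands for."""
    stack, out = [sym], []
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # push right first so left is expanded first
            stack.append(left)
        else:
            out.append(s)
    return out


def decode(compressed, rules):
    """Rebuild the original positions by expanding symbols and summing gaps."""
    postings, pos = [], 0
    for sym in compressed:
        for gap in expand(sym, rules):
            pos += gap
            postings.append(pos)
    return postings


# Hypothetical posting list, chosen only to show repeated gaps.
postings = [3, 6, 9, 12, 20, 23, 26, 29]
gaps = to_gaps(postings)
# New symbols start above the largest gap so they never collide with real gaps.
compressed, rules = repair(gaps, first_new_symbol=max(gaps) + 1)
restored = decode(compressed, rules)
```

Because each symbol expands independently of its neighbours, any part of the compressed sequence can be decoded without touching the rest, which is the property the abstract refers to as fast decompression at arbitrary positions.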
Taxonomy
Topics: Algorithms and Data Compression · Web Data Mining and Analysis · DNA and Biological Computing
