Efficient Compressed Wavelet Trees over Large Alphabets

Francisco Claude; Gonzalo Navarro; Alberto Ordóñez

arXiv:1405.1220 · cs.DS · May 7, 2014

TL;DR

This paper introduces the wavelet matrix, a data structure for representing sequences over large alphabets with lower space and time overheads than traditional wavelet trees.

Contribution

The paper presents the wavelet matrix, an alternative to wavelet trees that is faster and can be compressed to the zero-order entropy of the sequence without losing time performance (in fact, it gets faster).
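To make the idea concrete, here is a minimal, uncompressed sketch of a wavelet matrix in Python. It assumes integer symbols in [0, σ), uses 0-indexed positions, and replaces the succinct rank/select bitmaps of a real implementation with plain lists plus prefix counts. Each level stores one bit of every symbol (this sketch reads the most significant bit first) and then stably moves all symbols with a 0 bit ahead of those with a 1 bit; access(i) recovers S[i] by following one bit per level, remapping the position with a rank, offset by z_l (the number of zeros on level l) when the bit is 1. The class and method names are hypothetical, not the authors' code.

```python
from typing import List


class WaveletMatrix:
    """Minimal, uncompressed wavelet matrix sketch (illustrative only).

    Bitmaps are plain Python lists; prefix counts stand in for succinct
    rank structures. Symbols are integers in [0, sigma).
    """

    def __init__(self, seq: List[int], sigma: int) -> None:
        # Number of levels: bits needed to write any symbol in [0, sigma).
        self.levels = max(1, (sigma - 1).bit_length())
        self.bitmaps: List[List[int]] = []
        self.ones_prefix: List[List[int]] = []  # ones_prefix[l][i] = number of 1s in B_l[0:i]
        self.zeros: List[int] = []              # zeros[l] = z_l, number of 0s in B_l
        cur = list(seq)
        for l in range(self.levels):
            shift = self.levels - 1 - l         # level 0 reads the most significant bit
            bits = [(x >> shift) & 1 for x in cur]
            prefix = [0]
            for b in bits:
                prefix.append(prefix[-1] + b)
            self.bitmaps.append(bits)
            self.ones_prefix.append(prefix)
            self.zeros.append(len(bits) - prefix[-1])
            # Stable partition: symbols with a 0 bit first, then those with a 1 bit.
            cur = [x for x, b in zip(cur, bits) if b == 0] + \
                  [x for x, b in zip(cur, bits) if b == 1]

    def _rank1(self, l: int, i: int) -> int:
        # Number of 1s in B_l[0:i].
        return self.ones_prefix[l][i]

    def _rank0(self, l: int, i: int) -> int:
        # Number of 0s in B_l[0:i].
        return i - self.ones_prefix[l][i]

    def access(self, i: int) -> int:
        """Return S[i] (0-indexed) by reading one bit per level."""
        sym = 0
        for l in range(self.levels):
            b = self.bitmaps[l][i]
            sym = (sym << 1) | b
            # Map i to its position on the next level.
            if b == 0:
                i = self._rank0(l, i)
            else:
                i = self.zeros[l] + self._rank1(l, i)
        return sym
```

For example, `WaveletMatrix([3, 1, 2, 0, 3, 1], 4).access(0)` returns 3. Unlike a levelwise wavelet tree, no node boundaries have to be located on each level; the single offset z_l per level is enough to remap positions.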

Findings

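1. The wavelet matrix outperforms all wavelet tree variants along the space/time tradeoff map.

2. Compressing the wavelet matrix to the zero-order entropy of the sequence does not slow it down; it actually improves its time performance.

3. Experiments confirm the advantage of the wavelet matrix over existing wavelet tree representations when the alphabet is large.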
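The second finding refers to the zero-order empirical entropy H0(S) = Σ_c (n_c/n) log2(n/n_c), where n_c is the number of occurrences of symbol c. As a hedged illustration (the function name is hypothetical), the sketch below computes H0 in bits per symbol; an uncompressed wavelet matrix spends about n⌈log2 σ⌉ bits on its bitmaps, whereas the compressed variant targets roughly nH0(S) bits plus lower-order terms.

```python
import math
from collections import Counter
from typing import Sequence


def zero_order_entropy(seq: Sequence[int]) -> float:
    """H0(S) in bits per symbol: sum over symbols c of (n_c / n) * log2(n / n_c)."""
    n = len(seq)
    if n == 0:
        return 0.0
    return sum((k / n) * math.log2(n / k) for k in Counter(seq).values())
```

For S = [3, 1, 2, 0, 3, 1], H0(S) is about 1.918 bits per symbol, so nH0(S) is about 11.5 bits, against the 12 bits of the two plain bitmap levels. The gap grows when the symbol distribution is skewed.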

Abstract

The wavelet tree is a flexible data structure that permits representing sequences $S[1, n]$ of symbols over an alphabet of size $\sigma$ within compressed space, supporting a wide range of operations on $S$. When $\sigma$ is significant compared to $n$, current wavelet tree representations incur noticeable space or time overheads. In this article we introduce the wavelet matrix, an alternative representation for large alphabets that retains all the properties of wavelet trees but is significantly faster. We also show how the wavelet matrix can be compressed up to the zero-order entropy of the sequence without sacrificing, and actually improving, its time performance. Our experimental results show that the wavelet matrix outperforms all the wavelet tree variants along the space/time tradeoff map.
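One of the wavelet tree operations the wavelet matrix retains is rank_c(S, i), the number of occurrences of symbol c in a prefix of S. Below is a minimal, self-contained Python sketch under simplifying assumptions: integer symbols in [0, σ), 0-indexed positions with rank counting S[0:i], and plain prefix-count arrays instead of compressed bitmaps. The function names are hypothetical. It tracks two positions per level, p (where position 0 maps) and i, and follows the bits of c; after the last level, the occurrences of c within S[0:i] occupy the range [p, i).

```python
from typing import List, Tuple


def build(seq: List[int], sigma: int) -> Tuple[List[List[int]], List[int]]:
    """Build wavelet-matrix levels; returns (prefix counts of 1s per level, z_l per level)."""
    levels = max(1, (sigma - 1).bit_length())
    prefixes: List[List[int]] = []
    zeros: List[int] = []
    cur = list(seq)
    for l in range(levels):
        shift = levels - 1 - l  # most significant bit first
        bits = [(x >> shift) & 1 for x in cur]
        prefix = [0]
        for b in bits:
            prefix.append(prefix[-1] + b)
        prefixes.append(prefix)
        zeros.append(len(bits) - prefix[-1])
        # Stable partition by the current bit: 0s first, then 1s.
        cur = [x for x, b in zip(cur, bits) if b == 0] + \
              [x for x, b in zip(cur, bits) if b == 1]
    return prefixes, zeros


def rank(prefixes: List[List[int]], zeros: List[int], c: int, i: int) -> int:
    """Count occurrences of symbol c in S[0:i]."""
    levels = len(zeros)
    p = 0  # image of position 0 on the current level
    for l in range(levels):
        bit = (c >> (levels - 1 - l)) & 1
        ones = prefixes[l]
        if bit == 0:
            # rank0(B_l, x) = x - rank1(B_l, x)
            p, i = p - ones[p], i - ones[i]
        else:
            # Positions with a 1 bit move past the z_l zeros.
            p, i = zeros[l] + ones[p], zeros[l] + ones[i]
    return i - p
```

For S = [3, 1, 2, 0, 3, 1], `rank(*build(S, 4), 3, 6)` returns 2. Each level costs two rank operations on one bitmap, independently of how many distinct symbols share that level.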

Taxonomy

Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Blind Source Separation Techniques