Pareto Optimal Compression of Genomic Dictionaries, with or without Random Access in Main Memory
Raffaele Giancarlo, Gennaro Grimaudo

TL;DR
This paper systematically compares various compression methods for genomic dictionaries, providing a foundation for Pareto optimality analysis and offering a software tool to explore optimal solutions for storing k-mer sets efficiently.
Contribution
It introduces a comprehensive experimental framework and software for evaluating and identifying Pareto optimal compression solutions for genomic dictionaries.
Findings
Highlights the trade-offs between compression ratio and decompression efficiency.
Provides a set of Pareto optimal solutions for different use cases.
Offers a software tool for exploring compression options.
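To make "Pareto optimal" concrete: a compression solution is Pareto optimal if no other solution is at least as good on every objective and strictly better on at least one. The following is a minimal sketch of that filter for two objectives, compressed size and decompression time (both smaller is better). The `Candidate` type, the method names, and all numbers are hypothetical placeholders for illustration; they are not results or interfaces from the paper or its software.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str                  # hypothetical method label
    compressed_size: float     # e.g. megabytes; smaller is better
    decompression_time: float  # e.g. seconds; smaller is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = (a.compressed_size <= b.compressed_size
                and a.decompression_time <= b.decompression_time)
    strictly_better = (a.compressed_size < b.compressed_size
                       or a.decompression_time < b.decompression_time)
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers, for illustration only.
candidates = [
    Candidate("method_A", 100.0, 5.0),
    Candidate("method_B", 80.0, 9.0),
    Candidate("method_C", 120.0, 6.0),   # dominated by method_A
    Candidate("method_D", 80.0, 12.0),   # dominated by method_B
]
front = pareto_front(candidates)
front_names = [c.name for c in front]
```

In this toy input, `method_A` and `method_B` both survive: one decompresses faster, the other is smaller, so neither dominates the other. That is the kind of trade-off a Pareto analysis exposes rather than resolves.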
Abstract
Motivation: A Genomic Dictionary, i.e., the set of the k-mers appearing in a genome, is a fundamental source of genomic information: its collection is the first step in strategic computational methods ranging from assembly to sequence comparison and phylogeny. Unfortunately, it is costly to store. This motivates some recent studies regarding the compression of those k-mer sets. However, this area does not have the maturity of genomic compression: it lacks a homogeneous and methodologically sound experimental foundation that allows a fair comparison of the relative merits of the available solutions and that also takes into account the rich choice of compression methods that can be used. Results: We provide such a foundation here, supporting it with an extensive set of experiments that use reference datasets and a carefully selected set of representative data compressors. Our results…
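As an illustration of the definition in the abstract, a genomic dictionary can be sketched as the set of all length-k substrings of a sequence. The function name `kmer_set` and the toy sequence are assumptions made for this example. Real k-mer counters typically also handle reverse complements and ambiguous bases, which this sketch deliberately ignores.

```python
from typing import Set


def kmer_set(sequence: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers (length-k substrings) of a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence; the set removes duplicates.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Toy input, not a real genome.
toy_dictionary = kmer_set("ACGTACG", 3)
```

Here the window visits five positions but `ACG` occurs twice, so the dictionary holds four distinct 3-mers. On real genomes the number of distinct k-mers grows very large, which is the storage cost the abstract refers to.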
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Evolutionary Algorithms and Applications
