FASTA/Q Data Compressors for MapReduce-Hadoop Genomics: Space and Time Savings Made Easy -- Version 1
Umberto Ferraro Petrillo, Francesco Palini, Giuseppe Cattaneo, Raffaele Giancarlo

TL;DR
This paper introduces easy-to-deploy specialized FASTA/Q data compressors within Hadoop and Spark, significantly reducing storage costs and improving processing speed for genomic data in big data environments.
Contribution
It presents two general methods and software for integrating specialized FASTA/Q compressors into Hadoop with minimal effort, achieving substantial space and time savings.
Findings
30% reduction in HDFS data blocks for large genomes
At least 1.5x speed-up in I/O time
Comparable or reduced network communication time
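A block-count saving follows directly from file size, because HDFS stores each file as a sequence of fixed-size blocks and only the last block may be partial. The sketch below assumes the default 128 MiB block size of Hadoop 2.x/3.x (`dfs.blocksize`). The helper names and the file sizes in the usage example are hypothetical and serve only to show what a 30% cut in blocks means for two encodings of the same file.

```python
import math

# Assumed HDFS block size: the Hadoop 2.x/3.x default of 128 MiB (dfs.blocksize).
BLOCK_SIZE = 128 * 1024 * 1024
GIB = 1024 ** 3


def hdfs_blocks(file_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file occupies; the last block may be only partly full."""
    return math.ceil(file_bytes / block_size)


def block_reduction(baseline_bytes: int, specialized_bytes: int,
                    block_size: int = BLOCK_SIZE) -> float:
    """Fraction of blocks saved when a file stored as `baseline_bytes`
    is instead stored as `specialized_bytes` (hypothetical comparison)."""
    before = hdfs_blocks(baseline_bytes, block_size)
    after = hdfs_blocks(specialized_bytes, block_size)
    return 1 - after / before


if __name__ == "__main__":
    # Hypothetical sizes: a 100 GiB encoding versus a 70 GiB encoding of the same genome.
    print(hdfs_blocks(100 * GIB), hdfs_blocks(70 * GIB))      # 800 560
    print(round(block_reduction(100 * GIB, 70 * GIB), 2))     # 0.3
```

Fewer blocks also means fewer input splits, and hence fewer map tasks and less data read from disk, which is consistent with the I/O speed-up listed above.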
Abstract
Motivation: Storage of genomic data is a major cost for the Life Sciences, effectively addressed mostly via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as the leading framework. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, deploying them there is not straightforward. Such a State of the Art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy a specialized FASTA/Q compressor within MapReduce-Hadoop for processing files stored on the Hadoop Distributed File System (HDFS), with very little knowledge of Hadoop. Practically, we provide evidence that…
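The main obstacle to deployment is that most specialized compressors produce a single stream that cannot be decompressed from an arbitrary offset, while Hadoop processes a file as independent splits. One generic workaround, assumed here for illustration and not necessarily either of the authors' two methods, is to compress fixed-size groups of whole FASTQ records independently and keep an index of where each compressed chunk starts. In this sketch `zlib` stands in for a specialized FASTA/Q compressor, and all function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def fastq_records(data: bytes) -> List[bytes]:
    """Split FASTQ bytes into records of 4 lines: @header, sequence, '+', quality."""
    lines = data.splitlines(keepends=True)
    return [b"".join(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def compress_in_chunks(records: List[bytes],
                       records_per_chunk: int) -> Tuple[bytes, List[int]]:
    """Compress groups of whole records independently (zlib is a stand-in codec).

    Returns the concatenated compressed chunks and the byte offset of each chunk,
    so that any chunk can later be decoded without touching the others.
    """
    blob = bytearray()
    offsets: List[int] = []
    for i in range(0, len(records), records_per_chunk):
        offsets.append(len(blob))
        chunk = b"".join(records[i:i + records_per_chunk])
        blob.extend(zlib.compress(chunk))
    return bytes(blob), offsets


def read_chunk(blob: bytes, offsets: List[int], k: int) -> bytes:
    """Decompress only chunk k, as an independent input split would."""
    start = offsets[k]
    end = offsets[k + 1] if k + 1 < len(offsets) else len(blob)
    return zlib.decompress(blob[start:end])


if __name__ == "__main__":
    sample = (b"@r1\nACGT\n+\nIIII\n"
              b"@r2\nTTGA\n+\nIIII\n"
              b"@r3\nGGCC\n+\nIIII\n")
    blob, offsets = compress_in_chunks(fastq_records(sample), records_per_chunk=2)
    print(len(offsets))                   # 2
    print(read_chunk(blob, offsets, 1))   # b'@r3\nGGCC\n+\nIIII\n'
```

Because chunks always end on record boundaries, no FASTQ record is cut in half across splits. In an actual Hadoop deployment, Hadoop-side classes (for example a custom InputFormat) would still be needed to map each split to the chunks it covers; that wiring is outside this sketch.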
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Error Correcting Code Techniques
