PMKLC: Parallel Multi-Knowledge Learning-based Lossless Compression for Large-Scale Genomics Database
Hui Sun, Yanfeng Ding, Liping Yi, Huidong Ma, Gang Wang, Xiaoguang Liu, Cheng Zhong, Wentong Cai

TL;DR
PMKLC introduces a parallel, multi-knowledge learning-based lossless compressor optimized for large-scale genomic data, significantly improving compression ratio, throughput, and robustness over existing methods.
Contribution
The paper presents a novel GPU-accelerated, multi-knowledge learning framework with parallel mechanisms and adaptable modes for efficient, robust lossless genomic data compression.
Findings
Achieves up to 73.6% better compression ratio.
Up to 10.7 times higher throughput.
Demonstrates superior robustness and resource efficiency.
Abstract
Learning-based lossless compressors play a crucial role in large-scale genomic database backup, storage, transmission, and management. However, their 1) inadequate compression ratio, 2) low compression \& decompression throughput, and 3) poor compression robustness limit their widespread adoption and application in both industry and academia. To solve those challenges, we propose a novel \underline{P}arallel \underline{M}ulti-\underline{K}nowledge \underline{L}earning-based \underline{C}ompressor (PMKLC) with four crucial designs: 1) We propose an automated multi-knowledge learning-based compression framework as compressors' backbone to enhance compression ratio and robustness; 2) we design a GPU-accelerated (,)-mer encoder to optimize compression throughput and computing resource usage; 3) we introduce data block partitioning and Step-wise Model Passing (SMP) mechanisms for…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGene expression and cancer classification · Algorithms and Data Compression
