FASTR: Reimagining FASTQ via Compact Image-inspired Representation
Adrian Tkachenko, Sepehr Salem, Ayotomiwa Ezekiel Adeniyi, Zulal Bingol, Mohammed Nayeem Uddin, Akshat Prasanna, Alexander Zelikovsky, Serghei Mangul, Can Alkan, Mohammed Alser

TL;DR
FASTR introduces a lossless, compact, and efficient representation for FASTQ files that reduces size, accelerates compression/decompression, and enables direct downstream analysis, facilitating scalable and real-time genomics workflows.
Contribution
FASTR is a novel encoding scheme that packs each nucleotide together with its base quality score into a single 8-bit value, improving compression, speed, and usability over the traditional FASTQ format.
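To make the one-byte-per-base idea concrete, here is a minimal Python sketch of packing a nucleotide and its quality score into a single 8-bit value and reversing it. The bit layout is an assumption chosen for illustration (top 2 bits for the base, low 6 bits for a Phred score of 0 to 63), not the mapping published for FASTR, and the names `pack_read` and `unpack_read` are hypothetical. The sketch rejects anything outside A/C/G/T (for example N), which a real format would have to handle somehow.

```python
# Hypothetical layout (an assumption, not the published FASTR mapping):
# top 2 bits = base code, low 6 bits = Phred quality (0..63).
BASES = "ACGT"
BASE_TO_CODE = {base: code for code, base in enumerate(BASES)}
PHRED_OFFSET = 33  # ASCII offset used by Sanger / Illumina 1.8+ FASTQ


def pack_read(seq: str, qual: str) -> bytes:
    """Pack a FASTQ sequence line and quality line into one byte per base."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    out = bytearray(len(seq))
    for i, (base, qchar) in enumerate(zip(seq, qual)):
        q = ord(qchar) - PHRED_OFFSET
        if base not in BASE_TO_CODE or not 0 <= q < 64:
            raise ValueError(f"cannot pack base {base!r} with quality {q}")
        out[i] = (BASE_TO_CODE[base] << 6) | q
    return bytes(out)


def unpack_read(packed: bytes) -> tuple[str, str]:
    """Recover the original sequence and quality strings (lossless)."""
    seq = "".join(BASES[b >> 6] for b in packed)
    qual = "".join(chr((b & 0x3F) + PHRED_OFFSET) for b in packed)
    return seq, qual


packed = pack_read("ACGT", "II5!")
assert unpack_read(packed) == ("ACGT", "II5!")
assert len(packed) == 4  # versus 8 bytes for the two FASTQ lines
```

Under this assumed layout, four bases with their qualities occupy 4 bytes instead of the 8 bytes FASTQ spends on the sequence and quality lines, which is where a factor of two would come from before any general-purpose compressor is applied.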
Findings
Reduces file size by at least 2x compared to FASTQ.
Higher compression ratios and faster compression and decompression when general-purpose compression tools are applied to FASTR.
Enables direct use of reads for downstream analysis without decompression.
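As an illustration of what working on the packed representation directly could look like, the sketch below computes GC content and mean base quality from packed bytes without first rebuilding the sequence and quality strings. It assumes a hypothetical layout (top 2 bits hold a base code with A=0, C=1, G=2, T=3; low 6 bits hold the Phred score), which is not taken from the paper, and the function names are made up for this example.

```python
# Assumed, illustrative layout (not the published FASTR mapping):
# byte = (base_code << 6) | phred, with A=0, C=1, G=2, T=3.
GC_CODES = (1, 2)  # codes for C and G under the assumed layout


def gc_fraction(packed: bytes) -> float:
    """Fraction of bases that are G or C, read straight from packed bytes."""
    if not packed:
        return 0.0
    gc = sum(1 for b in packed if (b >> 6) in GC_CODES)
    return gc / len(packed)


def mean_quality(packed: bytes) -> float:
    """Mean Phred score, taken from the low 6 bits of each packed byte."""
    if not packed:
        return 0.0
    return sum(b & 0x3F for b in packed) / len(packed)


# "ACGT" with Phred scores 40, 40, 20, 0 under the assumed layout.
example = bytes([40, 104, 148, 192])
print(gc_fraction(example))   # 0.5
print(mean_quality(example))  # 25.0
```

Both statistics need only a shift or a mask per byte, which is the kind of access pattern a combined base-plus-quality byte makes cheap.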
Abstract
Motivation: High-throughput sequencing (HTS) enables population-scale genomics but generates massive datasets, creating bottlenecks in storage, transfer, and analysis. FASTQ, the standard format for over two decades, stores one byte per base and one byte per quality score, leading to inefficient I/O, high storage costs, and redundancy. Existing compression tools can mitigate some issues, but often introduce costly decompression or complex dependency issues.
Results: We introduce FASTR, a lossless, computation-native successor to FASTQ that encodes each nucleotide together with its base quality score into a single 8-bit value. FASTR reduces file size by at least 2x while remaining fully reversible and directly usable for downstream analyses. Applying general-purpose compression tools on FASTR consistently yields higher compression ratios, 2.47, 3.64, and 4.8x faster compression, and…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Advanced Data Compression Techniques
