Genetic Sequence compression using Machine Learning and Arithmetic Encoding Decoding Techniques

Mehedi Hasan Sarkar; Adnan Ferdous Ashrafi

arXiv:2212.02864 · q-bio.QM · March 10, 2023

Open Access

TL;DR

This paper introduces a novel DNA compression method using a modified deep learning architecture and a double-base strategy, demonstrating improved performance over existing methods on mitochondrial genome datasets.

Contribution

The paper presents a new architecture called modified DeepDNA with a double-base strategy for more effective DNA sequence compression, extending previous research.

Findings

01 Outperforms existing methods like DeepDNA on mitochondrial genome data

02 Effective on datasets of sizes 100, 243, and 356

03 Shows significant improvement in compression efficiency

Abstract

Bioinformatics is expanding rapidly, and advances in high-throughput genome sequencing technology have produced a large quantity of genomic data, raising concerns about the costs of data storage and transmission. How to properly compress genomic sequence data remains an open question. Many researchers have previously proposed DNA compression methods, both with and without machine learning. Extending previous research, we propose a new architecture, a modified DeepDNA, and a new methodology that deploys a double-base strategy for compressing DNA sequences. We validated the results by experimenting on datasets of three sizes: 100, 243, and 356. The experimental outcomes highlight our improved approach's superiority over existing approaches for…
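
To make the link between a symbol model and arithmetic coding concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "double-base" means grouping the sequence into non-overlapping two-base symbols (a 16-symbol alphabet), and it uses a static frequency model as a stand-in for the learned predictor. It relies on the standard result that an arithmetic coder spends about -log2 p(s) bits on a symbol s to which the model assigns probability p(s). The helper names (`to_pairs`, `empirical_probs`, `ideal_bits`) and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def to_pairs(seq):
    """Split a sequence into non-overlapping two-base symbols.

    Assumption: this is one possible reading of a "double-base" strategy.
    """
    if len(seq) % 2:
        raise ValueError("sequence length must be even for this sketch")
    return [seq[i:i + 2] for i in range(0, len(seq), 2)]


def empirical_probs(symbols):
    """Static frequency model; a stand-in for a learned predictor."""
    counts = Counter(symbols)
    total = len(symbols)
    return {s: c / total for s, c in counts.items()}


def ideal_bits(symbols, probs):
    """Ideal arithmetic-coding cost: the sum of -log2 p(symbol)."""
    return sum(-math.log2(probs[s]) for s in symbols)


# Toy, highly repetitive sequence chosen only for illustration.
seq = "ACGTACGTACGTACGT"
single = list(seq)
pairs = to_pairs(seq)

bits_single = ideal_bits(single, empirical_probs(single))
bits_pairs = ideal_bits(pairs, empirical_probs(pairs))
print(f"single-base symbols: {bits_single:.1f} bits")
print(f"two-base symbols:    {bits_pairs:.1f} bits")
```

The gap between the two totals on this toy input comes from the pair model capturing the repetition; the cost of transmitting the model itself is not counted, and real genomic data will behave differently.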

Taxonomy

Topics: Algorithms and Data Compression · Fractal and DNA sequence analysis · DNA and Biological Computing