Genetic Sequence compression using Machine Learning and Arithmetic Encoding Decoding Techniques
Mehedi Hasan Sarkar, Adnan Ferdous Ashrafi

TL;DR
This paper introduces a novel DNA compression method using a modified deep learning architecture and a double-base strategy, demonstrating improved performance over existing methods on mitochondrial genome datasets.
Contribution
The paper presents a new architecture called modified DeepDNA with a double-base strategy for more effective DNA sequence compression, extending previous research.
Findings
Outperforms existing methods like DeepDNA on mitochondrial genome data
Effective on datasets of sizes 100, 243, and 356
Shows significant improvement in compression efficiency
Abstract
Bioinformatics is expanding rapidly, and advances in high-throughput genome sequencing technology have produced a large quantity of genomic data, raising concerns about the costs of data storage and transmission. How best to compress genomic sequence data remains an open question. Many DNA compression methods have been proposed previously, both with and without machine learning. Extending previous research, we propose a new architecture, a modified DeepDNA, together with a new methodology that deploys a double-base strategy for compressing DNA sequences. We validated the results by experimenting on datasets of three sizes: 100, 243, and 356. The experimental outcomes highlight our improved approach's superiority over existing approaches for…
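The abstract does not spell out how the double-base strategy or the arithmetic coder is implemented, so the following is only a minimal sketch under stated assumptions. It assumes "double-base" means grouping the sequence into non-overlapping two-base symbols (a 16-symbol alphabet), and it replaces the neural predictor with a simple adaptive frequency model with add-one smoothing. The function names `to_pairs` and `ideal_code_length_bits`, and the padding choice, are hypothetical. The sum of -log2 p over the symbols is the ideal size an arithmetic coder would approach when driven by that model.

```python
import math
from collections import Counter

def to_pairs(seq):
    """Split a DNA string into non-overlapping two-base symbols.

    Assumption: this is one possible reading of a "double-base" strategy.
    An odd-length sequence is padded with "A" (a hypothetical choice).
    """
    if len(seq) % 2:
        seq += "A"
    return [seq[i:i + 2] for i in range(0, len(seq), 2)]

def ideal_code_length_bits(symbols, alphabet_size):
    """Ideal arithmetic-coding cost in bits under an adaptive order-0 model.

    Each symbol is predicted from the counts seen so far (add-one smoothing),
    then the counts are updated, so a decoder could mirror the same model.
    This stand-in model is an assumption, not the paper's neural network.
    """
    counts = Counter()
    total = 0
    bits = 0.0
    for s in symbols:
        p = (counts[s] + 1) / (total + alphabet_size)
        bits += -math.log2(p)
        counts[s] += 1
        total += 1
    return bits

if __name__ == "__main__":
    seq = "ACACACACGTGTGTGT"
    single = ideal_code_length_bits(seq, 4)             # one base per symbol
    double = ideal_code_length_bits(to_pairs(seq), 16)  # two bases per symbol
    print(f"single-base: {single / len(seq):.3f} bits per base")
    print(f"double-base: {double / len(seq):.3f} bits per base")
```

Dividing the total by the number of bases gives a bits-per-base figure that can be compared against the 2 bits per base of a fixed-length encoding of A, C, G, and T.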
Taxonomy
Topics: Algorithms and Data Compression · Fractal and DNA sequence analysis · DNA and Biological Computing
