DNA Sequence Classification with Compressors

Şükrü Ozan

arXiv:2401.14025 · q-bio.GN · January 26, 2024 · 1 citation

TL;DR

This paper introduces a compressor-based DNA sequence classification method that aims to maintain high accuracy while reducing computational demands, making genomic analysis easier to scale.

Contribution

It adapts Jiang et al.'s parameter-free, compressor-based classification method to DNA sequences, using several compression algorithms (Gzip, Brotli, and LZMA) to reduce computational cost relative to conventional machine learning methods.

Findings

01 Effective classification across multiple species

02 Comparable accuracy to state-of-the-art methods

03 Enhanced resource efficiency and scalability

Abstract

Recent studies in DNA sequence classification have leveraged sophisticated machine learning techniques, achieving notable accuracy in categorizing complex genomic data. Among these, methods such as k-mer counting have proven effective in distinguishing sequences from varied species like chimpanzees, dogs, and humans, becoming a staple in contemporary genomic research. However, these approaches often demand extensive computational resources, posing a challenge in terms of scalability and efficiency. Addressing this issue, our study introduces a novel adaptation of Jiang et al.'s compressor-based, parameter-free classification method, specifically tailored for DNA sequence analysis. This innovative approach utilizes a variety of compression algorithms, such as Gzip, Brotli, and LZMA, to efficiently process and classify genomic sequences. Not only does this method align with the current…
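The compressor-based idea summarized above can be illustrated with a minimal sketch. It assumes the usual formulation of Jiang et al.'s approach: a normalized compression distance (NCD) between two sequences, followed by a k-nearest-neighbour vote. It is not taken from the paper's repository; the function names (`compressed_len`, `ncd`, `classify`), the direct concatenation of the two sequences, and the toy data are illustrative assumptions. Only Gzip and LZMA are shown because both ship with Python's standard library; Brotli would need a third-party package.

```python
import gzip
import lzma
from collections import Counter


def compressed_len(data: bytes, compressor=gzip.compress) -> int:
    """Length in bytes of the compressed representation of `data`."""
    return len(compressor(data))


def ncd(x: str, y: str, compressor=gzip.compress) -> float:
    """Normalized compression distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length. Joining x and y without a
    separator is an assumption made for this sketch.
    """
    bx, by = x.encode(), y.encode()
    cx = compressed_len(bx, compressor)
    cy = compressed_len(by, compressor)
    cxy = compressed_len(bx + by, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train, k: int = 1, compressor=gzip.compress):
    """Return the majority label among the k training sequences closest to `query`.

    `train` is a list of (sequence, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: ncd(query, item[0], compressor))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy, made-up sequences purely for illustration (not real genomic data).
train = [
    ("ACGT" * 50, "class_a"),
    ("TTGGCCAA" * 25, "class_b"),
]
query = "ACGT" * 40

label_gzip = classify(query, train, k=1, compressor=gzip.compress)
label_lzma = classify(query, train, k=1, compressor=lzma.compress)
print(label_gzip, label_lzma)
```

Swapping the `compressor` argument is all it takes to compare algorithms, which is why the distance function accepts any callable that maps `bytes` to `bytes`. There are no trained parameters: the only choices are the compressor and `k`.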

Code & Models

Repositories

sukruozan/dna-sequence-classification (official)

Taxonomy

Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Algorithms and Data Compression

Methods: ALIGN