CAKL: Commutative algebra k-mer learning of genomics
Faisal Suwayyid, Yuta Hozumi, Hongsong Feng, Mushal Zia, JunJie Wee, and Guo-Wei Wei

TL;DR
CAKL introduces a nonlinear algebraic framework, based on commutative algebra, for analyzing genomic sequences. Across eleven datasets spanning three genomic tasks, it outperforms five state-of-the-art methods and keeps its accuracy stable as dataset size grows.
Contribution
This work pioneers the application of commutative algebra in genomics, establishing a new mathematical paradigm for comparative genomic analysis.
Findings
Outperforms five state-of-the-art methods across eleven datasets.
Maintains stable accuracy as dataset size increases.
Particularly effective in viral genome classification.
Abstract
Despite the availability of various sequence analysis models, comparative genomic analysis remains a challenge in genomics, genetics, and phylogenetics. Commutative algebra, a fundamental tool in algebraic geometry and number theory, has rarely been used in data and biological sciences. In this study, we introduce commutative algebra k-mer learning (CAKL) as the first-ever nonlinear algebraic framework for analyzing genomic sequences. CAKL bridges commutative algebra, algebraic topology, combinatorics, and machine learning to establish a new mathematical paradigm for comparative genomic analysis. We evaluate its effectiveness on three tasks -- genetic variant identification, phylogenetic tree analysis, and viral genome classification -- typically requiring alignment-based, alignment-free, and machine-learning approaches, respectively. Across eleven datasets, CAKL outperforms…
