Dy-mer: An Explainable DNA Sequence Representation Scheme using Dictionary Learning
Zhiyuan Peng, Naifan Zhang, Yuanbo Tang, Yang Li

TL;DR
Dy-mer is an interpretable DNA sequence representation method based on dictionary learning that improves predictive performance and biological interpretability in genomic tasks.
Contribution
The paper introduces Dy-mer, a novel dictionary learning-based DNA representation scheme that captures local structural features and enhances interpretability and robustness.
Findings
Achieves state-of-the-art results in promoter classification and motif detection.
Learned dymers correspond to known DNA motifs.
Enables meaningful phylogenetic clustering.
Abstract
DNA sequences encode critical genetic information, yet their variable length and discrete nature impede direct utilization in deep learning models. Existing DNA representation schemes convert sequences into numerical vectors but fail to capture structural features of local subsequences and often suffer from limited interpretability and poor generalization on small datasets. To address these limitations, we propose Dy-mer, an interpretable and robust DNA representation scheme based on dictionary learning. Dy-mer formulates an optimization problem in tensor format, which ensures computational efficiency in batch processing. Our scheme reconstructs DNA sequences as concatenations of dynamic-length subsequences (dymers) through a convolution operation and simultaneously optimizes a learnable dymer dictionary and sparse representations. Our method achieves state-of-the-art performance in…
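The reconstruction idea in the abstract (a sequence expressed as a concatenation of variable-length dymers, placed by a sparse code through a convolution) can be illustrated with a small sketch. This is not the authors' implementation: the dictionary entries, the example sequence, the activation layout, and the function names `one_hot` and `reconstruct` are all assumptions made for illustration, and the dictionary and activations are fixed by hand rather than learned by optimization.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    idx = [ALPHABET.index(base) for base in seq]
    return np.eye(4)[idx]

def reconstruct(dictionary, activations, length):
    """Rebuild a one-hot sequence from dymers and a sparse activation map.

    dictionary:  list of dymer strings of varying length (hypothetical)
    activations: (num_dymers, length) array; a 1 at [k, t] means
                 dymer k starts at position t
    """
    recon = np.zeros((length, 4))
    for k, dymer in enumerate(dictionary):
        atom = one_hot(dymer)
        for channel in range(4):
            # Convolving the activation row with one channel of the atom
            # copies that channel to every active start position.
            full = np.convolve(activations[k], atom[:, channel])
            recon[:, channel] += full[:length]
    return recon

# Hand-picked toy dictionary and sequence (assumptions, not from the paper).
dictionary = ["TATA", "GC", "AAT"]
sequence = "TATAGCAATGC"

# Sparse code: only four nonzero entries describe the whole sequence.
activations = np.zeros((len(dictionary), len(sequence)))
activations[0, 0] = 1  # "TATA" starts at position 0
activations[1, 4] = 1  # "GC" starts at position 4
activations[2, 6] = 1  # "AAT" starts at position 6
activations[1, 9] = 1  # "GC" starts at position 9

recon = reconstruct(dictionary, activations, len(sequence))
decoded = "".join(ALPHABET[i] for i in recon.argmax(axis=1))
print(decoded == sequence)
```

In a dictionary learning setting, both the dymer dictionary and the activation map would be unknowns fitted jointly under a sparsity constraint; the sketch only shows the forward reconstruction step with known values.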
Taxonomy
Topics: Algorithms and Data Compression · Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies
