On minimizers and convolutional filters: theoretical connections and applications to genome analysis
Yun William Yu

TL;DR
This paper establishes a mathematical connection between minimizers and CNN filters in biological sequence analysis: randomly initialized convolutional filters combined with max-pooling behave like a minimizer selection scheme, which has practical implications for genome analysis.
Contribution
It provides a theoretical link between minimizers and CNN filters, and empirically explores their effects on sequence analysis and genome data.
Findings
Random CNN filters with max-pooling emulate minimizer selection.
Experiments show decreased density in repetitive regions.
CNN embedding of SARS-CoV-2 reads captures sequence distances.
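The first finding can be illustrated with a small, hedged sketch. It is not the paper's code: it assumes a DNA alphabet `ACGT`, a single filter of width `k` with i.i.d. standard Gaussian weights, one-hot input encoding, and stride-1 max-pooling over `w` consecutive k-mer positions. The names `random_filter`, `activations`, `maxpool_select`, and `best_kmer` are hypothetical helpers introduced only for illustration. Under those assumptions, the filter's response depends only on the k-mer it sees, so max-pooling picks one k-mer per window according to a fixed ordering on k-mers, which is the same shape of rule a minimizer scheme uses.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA alphabet, no ambiguity codes


def random_filter(k, seed=0):
    """Hypothetical helper: a k x 4 filter with i.i.d. N(0, 1) weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in ALPHABET] for _ in range(k)]


def activations(seq, filt):
    """Convolve the filter with the one-hot encoded sequence.

    With one-hot input, the dot product at position i reduces to summing
    the filter weight of the observed letter at each filter offset.
    """
    k = len(filt)
    idx = {c: j for j, c in enumerate(ALPHABET)}
    return [
        sum(filt[j][idx[seq[i + j]]] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def maxpool_select(seq, filt, w):
    """Positions chosen by stride-1 max-pooling over w consecutive k-mers.

    Ties are broken toward the leftmost position (an illustrative choice).
    """
    acts = activations(seq, filt)
    selected = set()
    for start in range(len(acts) - w + 1):
        best = max(range(start, start + w), key=lambda i: (acts[i], -i))
        selected.add(best)
    return sorted(selected)


def best_kmer(filt):
    """The k-mer with the highest possible activation for this filter."""
    return "".join(
        ALPHABET[max(range(len(ALPHABET)), key=lambda j: row[j])]
        for row in filt
    )


if __name__ == "__main__":
    filt = random_filter(k=3, seed=1)
    seq = "ACGTTGCATGCCGATAGCTAACG"
    print("top-ranked k-mer:", best_kmer(filt))
    print("selected positions:", maxpool_select(seq, filt, w=4))
```

Because identical k-mers always receive identical activations, the ranking induced by the filter is a total preorder on k-mers, and `best_kmer` is the string at the top of that ranking. How the remaining k-mers are ordered relative to it is the subject of the paper's analysis and is not reproduced here.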
Abstract
Minimizers and convolutional neural networks (CNNs) are two quite distinct popular techniques that have both been employed to analyze categorical biological sequences. At face value, the methods seem entirely dissimilar. Minimizers use min-wise hashing on a rolling window to extract a single important k-mer feature per window. CNNs start with a wide array of randomly initialized convolutional filters, paired with a pooling operation, and then multiple additional neural layers to learn both the filters themselves and how they can be used to classify the sequence. Here, our main result is a careful mathematical analysis of hash function properties showing that for sequences over a categorical alphabet, random Gaussian initialization of convolutional filters with max-pooling is equivalent to choosing a minimizer ordering such that selected k-mers are (in Hamming distance) far from the…
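As a point of reference for the minimizer side of the comparison, here is a minimal sketch of window minimizer selection as the abstract describes it: hash every k-mer, slide a window, and keep the k-mer with the smallest hash in each window. The hash used here (a truncated SHA-256) is only a stand-in ordering chosen for reproducibility, and the function names `kmer_hash`, `minimizers`, and `density` are assumptions made for this illustration, not identifiers from the paper.

```python
import hashlib


def kmer_hash(kmer):
    """Stand-in k-mer ordering (assumption): first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")


def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as minimizers.

    Each window spans w consecutive k-mers; the k-mer with the smallest
    hash is selected, with ties broken toward the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [kmer_hash(x) for x in kmers]
    selected = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        selected.add(best)
    return [(i, kmers[i]) for i in sorted(selected)]


def density(seq, k, w):
    """Fraction of k-mer positions that are selected as minimizers."""
    return len(minimizers(seq, k, w)) / (len(seq) - k + 1)


if __name__ == "__main__":
    seq = "ACGTTGCATGCCGATAGCTA"
    for pos, kmer in minimizers(seq, k=3, w=4):
        print(pos, kmer)
    print("density:", density(seq, k=3, w=4))
```

Swapping `kmer_hash` for a different ordering changes which k-mers are selected and therefore the density, which is the quantity the experiments on repetitive regions measure.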
Taxonomy
Topics: Machine Learning in Bioinformatics · Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies
