Deep Unsupervised Identification of Selected SNPs between Adapted Populations on Pool-seq Data
Julia Siekiera, Stefan Kramer

TL;DR
This paper introduces an unsupervised deep learning pipeline that uses CNNs and explainable AI to identify selected SNPs in Pool-seq data. It targets two weaknesses of traditional univariate statistics: sensitivity to error-prone reads and alignments, and the restriction to individual genome positions.
Contribution
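The authors develop an unsupervised CNN-based approach for calling selected SNPs that does not require a known ground truth. A discriminator CNN is trained to distinguish alignments from different populations, and explainable AI methods are then applied to the trained model to locate the variants that drive its decisions.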
Findings
The proposed method outperforms traditional statistical methods in detecting selected SNPs.
It effectively identifies regions with high genetic differentiation.
It extends the capabilities of existing Pool-seq analysis techniques.
Abstract
The exploration of selected single nucleotide polymorphisms (SNPs) to identify genetic diversity between different sequencing population pools (Pool-seq) is a fundamental task in genetic research. Because the underlying sequence reads and their alignment are error-prone, and univariate statistical solutions take only individual positions of the genome into account, the identification of selected SNPs remains a challenging process. Deep learning models like convolutional neural networks (CNNs) are able to consider large input areas in their decisions. We propose an unsupervised pipeline that does not depend on a ground truth, which is rarely known. We train a supervised discriminator CNN to distinguish alignments from different populations and utilize the model for unsupervised SNP calling by applying explainable artificial intelligence methods. Our proposed multivariate method is based on two main…
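The core idea in the abstract, training a discriminator on population labels and then asking which positions drive its decisions, can be illustrated with a small sketch. This is not the authors' implementation: a logistic regression stands in for the CNN, occlusion stands in for the explainable AI methods (which the truncated abstract does not name), and the allele-frequency windows are synthetic. All function names, the window width, and the planted position are hypothetical.

```python
import numpy as np

def make_windows(n, width, selected, rng):
    """Synthetic allele-frequency windows for two pools (hypothetical data)."""
    base = rng.uniform(0.3, 0.7, size=width)  # background frequencies shared by both pools
    x_a = np.clip(base + rng.normal(0, 0.1, (n, width)), 0, 1)
    x_b = np.clip(base + rng.normal(0, 0.1, (n, width)), 0, 1)
    # Plant one differentiated position to play the role of a selected SNP.
    x_a[:, selected] = np.clip(rng.normal(0.2, 0.1, n), 0, 1)
    x_b[:, selected] = np.clip(rng.normal(0.8, 0.1, n), 0, 1)
    x = np.vstack([x_a, x_b])
    y = np.concatenate([np.zeros(n), np.ones(n)])  # population labels: 0 = pool A, 1 = pool B
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(x, y, lr=0.5, steps=500):
    """Logistic regression by gradient descent (linear stand-in for a CNN)."""
    mu = x.mean(axis=0)  # per-position mean, used for centering
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid((x - mu) @ w + b)
        grad = p - y  # gradient of the cross-entropy loss w.r.t. the logit
        w -= lr * (x - mu).T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b, mu

def occlusion_scores(x, w, b, mu):
    """Mean absolute change in the prediction when one position is masked."""
    p_ref = sigmoid((x - mu) @ w + b)
    scores = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        x_occ = x.copy()
        x_occ[:, j] = mu[j]  # replace position j by its dataset mean
        scores[j] = np.abs(p_ref - sigmoid((x_occ - mu) @ w + b)).mean()
    return scores

rng = np.random.default_rng(0)
x, y = make_windows(n=200, width=21, selected=10, rng=rng)
w, b, mu = train_discriminator(x, y)
accuracy = ((sigmoid((x - mu) @ w + b) > 0.5) == y).mean()
scores = occlusion_scores(x, w, b, mu)
top_position = int(np.argmax(scores))  # candidate position with the highest attribution
```

The labels used for training are only the population identities, which are known for any Pool-seq experiment, so no ground truth about which SNPs are selected enters the procedure. In this toy setup the highest-scoring position should coincide with the planted one; a real pipeline would operate on alignments and on a model able to capture interactions across positions.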
Taxonomy
Topics: Genomics and Phylogenetic Studies · Gene expression and cancer classification · Molecular Biology Techniques and Applications
