TL;DR
This paper introduces convolutional embedded networks that improve population clustering and bio-ancestry inference from genetic variants, outperforming existing methods in accuracy and scalability on large genomic datasets.
Contribution
The paper presents a novel combination of convolutional embedded clustering and autoencoder classifiers for scalable, accurate population and ethnicity inference from genomic data.
Findings
Outperforms state-of-the-art methods such as VariantSpark and ADMIXTURE.
Achieves high clustering accuracy, with an Adjusted Rand Index (ARI) of 0.915.
Predicts geographic ethnicity with an F1 score of 0.9004.
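For readers unfamiliar with the clustering metric quoted above, the following is a minimal sketch of the standard Adjusted Rand Index definition in plain Python. It is not the authors' evaluation code, and the function name `adjusted_rand_index` and the toy population labels are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two labelings of the same samples (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: the partitions agree trivially

    # Contingency counts: cell (i, j) holds samples in true group i and cluster j.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs placed together, per cell and per marginal.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2         # agreement of a perfect match

    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical, not taken from the paper's data).
true_groups = ["EUR", "EUR", "AFR", "AFR"]
same_clusters = [1, 1, 0, 0]    # same grouping, different cluster ids
mixed_clusters = [0, 1, 0, 1]   # every cluster mixes both groups

print(adjusted_rand_index(true_groups, same_clusters))
print(adjusted_rand_index(true_groups, mixed_clusters))
```

The index ignores how clusters are named and corrects for chance agreement, so a score near 1 means the clusters recover the reference groups almost exactly, while scores at or below 0 indicate agreement no better than random assignment.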
Abstract
The study of genetic variants (GVs) can help find correlated population groups, identify cohorts that are predisposed to common diseases, and explain differences in disease susceptibility and in how patients react to drugs. Machine learning algorithms are increasingly applied to identify interacting GVs and to understand their complex phenotypic traits. The performance of a learning algorithm depends not only on the size and nature of the data but also on the quality of the underlying representation. Deep neural networks (DNNs) can learn non-linear mappings that transform GV data into representations better suited to clustering and classification than those obtained by manual feature selection. In this paper, we propose convolutional embedded networks, in which we combine two DNN architectures, convolutional embedded clustering and a convolutional autoencoder classifier, for clustering individuals and…
Taxonomy
Methods: Shapley Additive Explanations
