Model-Based Clustering using multi-allelic loci data with loci selection
Wilson Toussile (LM-Orsay), Elisabeth Gassiat (LM-Orsay)

TL;DR
This paper introduces MixMoGenD, a model-based clustering method with loci selection for multi-allelic genetic data. It compares candidate models with BIC, and the selected model is proven to converge in probability to the true model as the sample size increases.
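The following is a hedged illustration of the BIC comparison. It assumes the standard setting for this kind of model: diploid genotypes, Hardy-Weinberg and linkage equilibrium within clusters, and loci outside the clustering subset S sharing the same allele frequencies in every cluster. A candidate model (K, S) and its criterion can then be written as below. The notation (π_k, α_{k,l}, β_l, A_l, D) is introduced here for illustration and may differ from the paper's.

```latex
P_{(K,S)}(x) = \Big[\sum_{k=1}^{K} \pi_k \prod_{l \in S} P_{\alpha_{k,l}}(x_l)\Big] \prod_{l \notin S} P_{\beta_l}(x_l),
\qquad P_{\alpha}(\{a,b\}) = (2 - \mathbb{1}_{a=b})\,\alpha_a \alpha_b

D_{(K,S)} = (K-1) + K \sum_{l \in S} (A_l - 1) + \sum_{l \notin S} (A_l - 1),
\qquad
\mathrm{BIC}(K,S) = -2 \log \hat{L}_{(K,S)} + D_{(K,S)} \log n
```

Under this convention the selected model is the pair (K, S) with the smallest BIC. Maximizing log L - (D/2) log n instead is equivalent.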
Contribution
It combines loci selection and clustering in a single model selection framework, proves the consistency of the selected model, and provides a practical C++ implementation for genetic data analysis.
Findings
Loci selection improves clustering accuracy.
The selected model converges in probability to the true model as the sample size grows.
Numerical experiments validate the approach.
Abstract
We propose a Model-Based Clustering (MBC) method combined with loci selection for multi-allelic genetic data. The loci selection problem is cast as a model selection problem, and competing models are compared with the Bayesian Information Criterion (BIC). The resulting procedure selects the subset of clustering loci and the number of clusters, and estimates the proportion of each cluster and the allelic frequencies within each cluster. We prove that, under a single realistic assumption, the selected model converges in probability to the true model as the sample size tends to infinity. The proposed method, named MixMoGenD (Mixture Model using Genetic Data), was implemented in C++. Numerical experiments on simulated data sets were conducted to illustrate the benefits of the proposed loci selection procedure.
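The Python sketch below shows how a candidate model (K, S), with K clusters and S the subset of clustering loci, could be fitted and scored with BIC. It makes several assumptions that are not stated in the abstract: diploid genotypes with no missing data, Hardy-Weinberg and linkage equilibrium within clusters, and loci outside S sharing one allele-frequency vector across all clusters. All function names are hypothetical. This is not MixMoGenD's code or API. The exhaustive enumeration of subsets is only practical for a handful of loci, and it is not necessarily the search strategy the authors use.

```python
# Minimal sketch, NOT the MixMoGenD implementation: all function names are
# hypothetical. Assumes diploid genotypes with no missing data, Hardy-Weinberg
# and linkage equilibrium within clusters, and loci outside S sharing the same
# allele frequencies in every cluster.
import itertools
import numpy as np


def allele_counts(X, n_alleles):
    """X has shape (n, L, 2) with allele indices; returns, per locus, an
    (n, A_l) array counting copies (0, 1 or 2) of each allele."""
    return [np.stack([(X[:, l, 0] == a).astype(float) + (X[:, l, 1] == a)
                      for a in range(A)], axis=1)
            for l, A in enumerate(n_alleles)]


def locus_logprob(c, f):
    """Hardy-Weinberg genotype log-probabilities: p_a^2 for aa, 2 p_a p_b for ab."""
    het = c.max(axis=1) == 1
    return c @ np.log(np.maximum(f, 1e-12)) + np.log(2.0) * het


def dimension(K, S, n_alleles):
    """Free parameters: K-1 proportions, K frequency vectors per clustering
    locus, one shared frequency vector per non-clustering locus."""
    return (K - 1) + sum(K * (A - 1) if l in S else (A - 1)
                         for l, A in enumerate(n_alleles))


def fit_clustering_part(counts, S, K, rng, n_iter=500, tol=1e-8):
    """EM on the mixture over the clustering loci S; returns its log-likelihood."""
    n = counts[0].shape[0]
    resp = rng.dirichlet(np.ones(K), size=n)  # single random soft start
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: mixing proportions and cluster-specific allele frequencies.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        logjoint = np.tile(np.log(nk / n), (n, 1))
        for l in S:
            F = resp.T @ counts[l] / (2.0 * nk[:, None])  # shape (K, A_l)
            for k in range(K):
                logjoint[:, k] += locus_logprob(counts[l], F[k])
        # E-step: posterior memberships via log-sum-exp.
        m = logjoint.max(axis=1, keepdims=True)
        lse = m[:, 0] + np.log(np.exp(logjoint - m).sum(axis=1))
        resp = np.exp(logjoint - lse[:, None])
        ll = lse.sum()
        if ll - prev < tol:
            break
        prev = ll
    return ll


def bic(X, n_alleles, K, S, seed=0):
    """BIC = -2 log L + D log n for model (K, S); smaller is better."""
    n = X.shape[0]
    counts = allele_counts(X, n_alleles)
    ll = fit_clustering_part(counts, S, K, np.random.default_rng(seed))
    for l in range(len(n_alleles)):
        if l not in S:  # non-clustering locus: closed-form MLE of shared frequencies
            f = counts[l].sum(axis=0) / (2.0 * n)
            ll += locus_logprob(counts[l], f).sum()
    return -2.0 * ll + dimension(K, S, n_alleles) * np.log(n)


def select_model(X, n_alleles, K_max):
    """Exhaustive search over (K, S): K = 1 with S empty, or K >= 2 with S non-empty."""
    L = len(n_alleles)
    candidates = [(1, ())]
    for K in range(2, K_max + 1):
        for r in range(1, L + 1):
            candidates += [(K, S) for S in itertools.combinations(range(L), r)]
    return min(candidates, key=lambda m: bic(X, n_alleles, *m))
```

For example, `select_model(X, [3, 4], K_max=3)` scores every admissible (K, S) pair for two loci with 3 and 4 alleles and returns the one with the smallest BIC. A single random EM start per model can end in a local optimum. Running several seeds and keeping the best log-likelihood is a cheap safeguard.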
Taxonomy
Topics: Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock · Gene expression and cancer classification
