On the informativeness of dominant and co-dominant genetic markers for Bayesian supervised clustering
Gilles Guillot, Alexandra Carpentier-Skandalis

TL;DR
This paper derives an exact formula linking a clustering error criterion to the number of loci and the number of clusters, and suggests that dominant markers can match the accuracy of codominant markers when about 1.7 times as many loci are used.
Contribution
It provides an exact formula relating a clustering error criterion to the number of loci and the number of clusters, applicable to both dominant and codominant markers.
Findings
Dominant markers require about 1.7 times as many loci as codominant markers to reach similar accuracy.
The formula is exact and holds for an arbitrary number of clusters and markers.
The study informs the choice of marker type in genetic clustering analyses.
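As a rough illustration of the 1.7 ratio, the sketch below converts a codominant locus count into an approximate dominant locus count. It is not the paper's formula: the function name `dominant_loci_needed` and the choice to round up are assumptions made for this example, and the ratio itself is only approximate.

```python
import math
from fractions import Fraction

# Assumption: the "about 1.7" ratio from the findings, applied as a plain multiplier.
# Fraction avoids floating-point surprises when rounding up.
LOCI_RATIO = Fraction(17, 10)


def dominant_loci_needed(codominant_loci: int, ratio: Fraction = LOCI_RATIO) -> int:
    """Approximate number of dominant loci giving accuracy similar to
    `codominant_loci` codominant loci (hypothetical helper, rounds up)."""
    if codominant_loci < 0:
        raise ValueError("codominant_loci must be non-negative")
    return math.ceil(codominant_loci * ratio)


if __name__ == "__main__":
    for n in (10, 20, 50):
        print(f"{n} codominant loci ~ {dominant_loci_needed(n)} dominant loci")
```

Rounding up is a conservative choice here; since the ratio is approximate, the result is best read as an order-of-magnitude guide for study design rather than a threshold.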
Abstract
We study the accuracy of a Bayesian supervised method used to cluster individuals into genetically homogeneous groups on the basis of dominant or codominant molecular markers. We provide a formula relating an error criterion to the number of loci used and the number of clusters. This formula is exact and holds for an arbitrary number of clusters and markers. Our work suggests that dominant marker studies can achieve an accuracy similar to that of codominant marker studies if the number of markers used in the former is about 1.7 times larger than in the latter.
