On the informativeness of dominant and co-dominant genetic markers for Bayesian supervised clustering

Gilles Guillot; Alexandra Carpentier-Skandalis

arXiv:1112.2868·q-bio.PE·December 14, 2011

TL;DR

This paper derives an exact formula linking clustering error to the number of genetic markers and the number of clusters, and uses it to show that dominant markers can match the accuracy of codominant markers when about 1.7 times as many loci are used.

Contribution

It provides a novel exact formula relating clustering error to the number of loci and the number of clusters, applicable to both dominant and codominant markers.

Findings

01

Dominant markers require about 1.7 times as many loci as codominant markers to reach a similar accuracy.

02

The formula is exact and valid for any number of clusters and markers.

03

The study informs the choice of marker type in genetic clustering analyses.

Abstract

We study the accuracy of a Bayesian supervised method used to cluster individuals into genetically homogeneous groups on the basis of dominant or codominant molecular markers. We provide a formula relating an error criterion to the number of loci used and the number of clusters. This formula is exact and holds for an arbitrary number of clusters and markers. Our work suggests that dominant-marker studies can achieve an accuracy similar to that of codominant-marker studies if the number of markers used in the former is about 1.7 times larger than in the latter.
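
As a rough illustration of the 1.7 rule of thumb stated in the abstract, the sketch below converts a codominant locus count into an approximate dominant locus count. The function name `dominant_loci_needed` and the constant name are hypothetical, and the ratio is treated as an approximate planning figure rather than the paper's exact formula, which is not reproduced on this page.

```python
import math

# Approximate ratio quoted in the abstract ("about 1.7 times larger").
# Assumption: used here as a simple multiplier for study planning only.
DOMINANT_TO_CODOMINANT_RATIO = 1.7


def dominant_loci_needed(n_codominant: int,
                         ratio: float = DOMINANT_TO_CODOMINANT_RATIO) -> int:
    """Estimate how many dominant loci give accuracy similar to
    n_codominant codominant loci, rounding up to a whole locus."""
    if n_codominant < 0:
        raise ValueError("n_codominant must be non-negative")
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(n_codominant * ratio, 9))


if __name__ == "__main__":
    for n in (10, 25, 100):
        print(n, "codominant loci ~", dominant_loci_needed(n), "dominant loci")
```

Rounding up is a conservative choice: a study planned this way never uses fewer dominant loci than the rule of thumb suggests.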
