Combinatorics of a dissimilarity measure for pairs of draws from discrete probability vectors on finite sets of objects
Zarif Ahsan, Xiran Liu, Noah A. Rosenberg

TL;DR
This paper explores the combinatorial structure and expected dissimilarity of pairs of random draws from discrete probability distributions, with applications in the population genetics of polyploid organisms.
Contribution
It introduces a combinatorial framework for analyzing a dissimilarity measure between pairs of draws from discrete distributions, extending previous results to larger draw sizes.
Findings
Derived a simple formula for the expected dissimilarity measure.
Identified conditions under which the expected dissimilarity between two draws from the same distribution exceeds that between two draws from different distributions.
Applied the results to the genetic analysis of polyploid organisms.
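The comparison between same-distribution and different-distribution dissimilarity can be sketched in Python for a hypothetical measure: the fraction of the S × S pairs of objects (one object from each draw of size S) that are distinct. This measure and the function names below are assumptions chosen for illustration, not the paper's definitions. Because the two draws are independent, the expected value of this stand-in measure is 1 - Σ_k p_k q_k for every draw size S, so two draws from p are more dissimilar in expectation than a draw from p and a draw from q exactly when Σ_k p_k q_k > Σ_k p_k².

```python
def expected_pair_dissimilarity(p, q):
    """Closed form for the HYPOTHETICAL cross-pair measure: 1 - sum_k p_k q_k.

    The value does not depend on the draw size S, because the two draws are
    independent and each object count has mean S * p_k (or S * q_k).
    """
    return 1 - sum(pk * qk for pk, qk in zip(p, q))


def within_exceeds_between(p, q):
    """True if two draws from p are, in expectation, more dissimilar
    (under the hypothetical measure) than one draw from p and one from q."""
    return expected_pair_dissimilarity(p, p) > expected_pair_dissimilarity(p, q)


# Usage: here sum_k p_k q_k = 0.6 exceeds sum_k p_k^2 = 0.52.
print(within_exceeds_between([0.4, 0.6], [0.0, 1.0]))
```

With p = (0.4, 0.6) and q = (0, 1), the same-distribution value is 0.48 and the different-distribution value is 0.4, so the check returns True; for p = (0.9, 0.1) and q = (0.1, 0.9) it returns False. The paper's own measure and conditions may differ from this stand-in.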
Abstract
Motivated by a problem in population genetics, we examine the combinatorics of dissimilarity for pairs of random unordered draws of multiple objects, with replacement, from a collection of distinct objects. Consider two draws of a fixed size taken with replacement from a finite set of objects, where the two draws represent samples from potentially distinct probability distributions over that set. We define the set of identity states for pairs of draws via a series of actions by permutation groups, describing the enumeration of all such states for a given draw size and number of objects. Given two probability vectors for the objects, we compute the probability of each identity state. From the set of all such probabilities, we obtain the expectation for a dissimilarity measure, finding that it has a simple form that generalizes a result previously obtained for the case of…
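The enumeration and expectation steps described in the abstract can be illustrated by brute force for small cases. The Python sketch below is a minimal illustration under stated assumptions: it lists every unordered draw of size S from K objects (S and K are names introduced here), assigns each draw its multinomial probability under a probability vector, and averages a dissimilarity over all pairs of draws. It works with labeled draws rather than the paper's identity states, which are obtained through additional permutation-group actions not reproduced here. The dissimilarity function is a hypothetical stand-in, the fraction of the S × S cross-draw object pairs that are distinct, and is not necessarily the measure defined in the paper.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, factorial, isclose, prod


def unordered_draws(S, K):
    """All unordered draws (multisets) of size S from objects 0..K-1, as count vectors."""
    draws = []
    for combo in combinations_with_replacement(range(K), S):
        counts = Counter(combo)
        draws.append(tuple(counts[k] for k in range(K)))
    return draws


def draw_probability(counts, p):
    """Multinomial probability of observing these object counts in S draws from p."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # each partial quotient is an integer
    return coef * prod(pk ** c for pk, c in zip(p, counts))


def pair_dissimilarity(x, y):
    """HYPOTHETICAL measure: fraction of the S*S cross-draw object pairs that differ."""
    S = sum(x)
    matches = sum(xk * yk for xk, yk in zip(x, y))
    return 1 - matches / (S * S)


def expected_dissimilarity(S, p, q):
    """Brute-force expectation over all pairs (first draw from p, second from q)."""
    draws = unordered_draws(S, len(p))
    return sum(
        draw_probability(x, p) * draw_probability(y, q) * pair_dissimilarity(x, y)
        for x in draws
        for y in draws
    )


# Usage: three objects, draws of size 3.
p, q = [0.2, 0.3, 0.5], [0.6, 0.1, 0.3]
S = 3
draws = unordered_draws(S, len(p))
assert len(draws) == comb(len(p) + S - 1, S)  # 10 unordered draws
assert isclose(sum(draw_probability(x, p) for x in draws), 1.0)
closed_form = 1 - sum(pk * qk for pk, qk in zip(p, q))  # 0.7 for these vectors
assert isclose(expected_dissimilarity(S, p, q), closed_form)
```

The number of unordered draws of size S from K objects is the multiset coefficient C(K + S - 1, S), which the first assertion checks. For the stand-in measure, independence of the two draws makes the expectation 1 - Σ_k p_k q_k for every S; the paper's own measure and formula may differ.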
Taxonomy
Topics: Computational Geometry and Mesh Generation · Semigroups and Automata Theory · Algorithms and Data Compression
