Estimating Unknown Population Sizes Using the Hypergeometric Distribution
Liam Hodgson, Danilo Bzdok

TL;DR
This paper introduces a novel method for estimating unknown population sizes and distributions using the hypergeometric likelihood, effective even with limited sampling, and demonstrates its versatility across NLP and genomics applications.
Contribution
The paper proposes a new hypergeometric likelihood-based approach for estimating unknown population sizes and category distributions, including in complex, under-sampled scenarios, and implements it within a variational autoencoder framework.
Findings
On simulated data, outperforms other likelihood functions used to model count data in population size estimation accuracy.
Effectively learns informative latent spaces from sparse data.
Applied to real-world problems in NLP and genomics.
Abstract
The multivariate hypergeometric distribution describes sampling without replacement from a discrete population of elements divided into multiple categories. Addressing a gap in the literature, we tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown. Here, we propose a novel solution using the hypergeometric likelihood to solve this estimation challenge, even in the presence of severe under-sampling. We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable, such as with collaborative filtering, using the variational autoencoder framework. Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data, both in terms of accuracy of…
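As a rough illustration of the likelihood the abstract refers to, the sketch below evaluates the multivariate hypergeometric log-probability of a sample drawn without replacement, given a candidate vector of category sizes. It is not the paper's variational autoencoder estimator. The function names (`mvhg_log_likelihood`, `total_log_likelihood`) are hypothetical, and treating repeated samples as independent draws from the same population is an assumption made only for this example.

```python
from math import comb, log


def mvhg_log_likelihood(sample_counts, category_sizes):
    """Log-probability of observing sample_counts when drawing
    sum(sample_counts) items without replacement from a population
    whose categories have the given (candidate) sizes."""
    if len(sample_counts) != len(category_sizes):
        raise ValueError("sample and population must have the same categories")
    # A sample cannot contain more items of a category than exist in it.
    if any(k < 0 or k > size for k, size in zip(sample_counts, category_sizes)):
        return float("-inf")
    n_drawn = sum(sample_counts)
    n_total = sum(category_sizes)
    # log of prod_c C(N_c, k_c) / C(N, n)
    log_numerator = sum(
        log(comb(size, k)) for k, size in zip(sample_counts, category_sizes)
    )
    return log_numerator - log(comb(n_total, n_drawn))


def total_log_likelihood(samples, category_sizes):
    """Sum of log-likelihoods over several samples, assuming (for this
    sketch only) that they are independent draws from one population."""
    return sum(mvhg_log_likelihood(s, category_sizes) for s in samples)


# Two hypothetical samples of four items each from a two-category population.
samples = [(3, 1), (1, 3)]
for candidate in [(3, 3), (4, 4), (10, 10)]:
    print(candidate, total_log_likelihood(samples, candidate))
```

Comparing this quantity across candidate size vectors is the basic ingredient of likelihood-based size estimation. With very few samples the likelihood may be nearly flat in the total population size, which is one way to see why the under-sampled setting is hard.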
Taxonomy
Topics: Census and Population Estimation · Wildlife Ecology and Conservation · Animal Diversity and Health Studies
