Limit Theorems for Empirical Rényi Entropy and Divergence with Applications to Molecular Diversity Analysis
Maciej Pietrzak, Grzegorz A. Rempała, Michał Seweryn, Jacek Wesołowski

TL;DR
This paper develops limit theorems for empirical Rényi entropy and divergence in large, sparse, and unbalanced datasets, enabling more accurate molecular biodiversity analysis from high-throughput sequencing data.
Contribution
It introduces a frequency-based framework, with Gaussian limit results for Rényi entropy and divergence, that applies to large, sparse, and heavy-tailed genomic data.
Findings
Validates methods with RNA sequencing examples
Addresses challenges of large, unbalanced contingency tables
Provides theoretical foundation for molecular diversity metrics
Abstract
Quantitative methods for studying biodiversity have traditionally been rooted in the classical theory of finite frequency table analysis. However, with the help of modern experimental tools, like high-throughput sequencing, we now begin to unlock the outstanding diversity of genomic data in plants and animals, reflective of the long evolutionary history of our planet. These molecular data often defy the classical frequency/contingency table assumptions and seem to require sparse tables with a very large number of categories and highly unbalanced cell counts, e.g., following heavy-tailed distributions (for instance, power laws). Motivated by molecular diversity studies, we propose here a frequency-based framework for biodiversity analysis in the asymptotic regime where the number of categories grows with sample size (an infinite contingency table). Our approach is rooted in…
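For orientation, the quantities named in the title can be computed from a table of cell counts with the standard plug-in (empirical frequency) formulas: H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha) and D_alpha(p || q) = log(sum_i p_i^alpha q_i^(1 - alpha)) / (alpha - 1), for alpha > 0 and alpha != 1. The sketch below is only an illustration of those textbook definitions. The function names, the natural-log base, and the handling of empty cells are assumptions made here, and the excerpt above does not state the exact estimators or normalizations the paper analyzes.

```python
import math


def renyi_entropy(counts, alpha):
    """Plug-in Renyi entropy of order alpha from cell counts (natural log).

    Hypothetical helper for illustration; assumes alpha > 0 and alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    n = sum(counts)
    # Empty cells contribute nothing to the power sum, so sparse tables are fine.
    power_sum = sum((c / n) ** alpha for c in counts if c > 0)
    return math.log(power_sum) / (1.0 - alpha)


def renyi_divergence(counts_p, counts_q, alpha):
    """Plug-in Renyi divergence D_alpha(p || q) between two count vectors
    defined over the same categories (natural log).

    Hypothetical helper for illustration; assumes alpha > 0 and alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    n_p, n_q = sum(counts_p), sum(counts_q)
    total = 0.0
    for c_p, c_q in zip(counts_p, counts_q):
        if c_p == 0:
            continue  # p_i^alpha = 0, so the term vanishes
        if c_q == 0:
            if alpha > 1:
                return math.inf  # q_i^(1 - alpha) blows up where p_i > 0
            continue  # for alpha < 1 the term q_i^(1 - alpha) is 0
        total += (c_p / n_p) ** alpha * (c_q / n_q) ** (1.0 - alpha)
    if total == 0.0:
        return math.inf  # disjoint supports with alpha < 1
    return math.log(total) / (alpha - 1.0)


# Usage on small made-up count vectors (not data from the paper).
sample_a = [9, 1, 0, 0]
sample_b = [5, 3, 1, 1]
h2 = renyi_entropy(sample_a, alpha=2.0)
d2 = renyi_divergence(sample_a, sample_b, alpha=2.0)
```

With many categories and few observations per cell, plug-in values like these can be far from their population targets, which is the regime the abstract describes.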
Taxonomy
Topics: Evolution and Genetic Dynamics · Bayesian Methods and Mixture Models · Fractal and DNA sequence analysis
