Sufficient digits and density estimation: A Bayesian nonparametric approach using generalized finite Pólya trees
Mario Beraha, Jesper Møller

TL;DR
This paper introduces a Bayesian nonparametric method using generalized finite Pólya trees for density estimation based on digit representation, allowing flexible modeling and learning of the digit system from data.
Contribution
It develops a novel Bayesian approach to density estimation using digit-based Pólya trees, with a closed-form posterior analysis and extensions to multiple scales and to the Newcomb-Benford law.
Findings
Effective density estimation on synthetic data
Promising results on human-activity datasets
Consistent posterior distributions with increasing sample size
Abstract
This paper proposes a novel approach for statistical modelling of a continuous random variable $X$ on $[0, 1)$, based on its digit representation $X = .X_1X_2\ldots$. In general, $X$ can be coupled with a latent random variable $N$ so that $(X_1, \ldots, X_N)$ becomes a sufficient statistic and $.X_{N+1}X_{N+2}\ldots$ is uniformly distributed. In line with this fact, and focusing on binary digits for simplicity, we propose a family of generalized finite Pólya trees that induces a random density for a sample, which becomes a flexible tool for density estimation. Here, the digit system may be random and learned from the data. We provide a detailed Bayesian analysis, including a closed-form expression for the posterior distribution. We analyse the frequentist properties as the sample size increases, and provide sufficient conditions for consistency of the posterior distributions of the…
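To make the digit-based idea concrete, here is a minimal Python sketch. It is not the generalized finite Pólya tree of the paper: it uses the standard finite Pólya tree with a fixed binary digit system, and the function names, the `depth` parameter, and the Beta(alpha·j², alpha·j²) prior at level j are all assumptions chosen for illustration. The first function extracts the leading binary digits of a point in [0, 1). The second computes the posterior-mean density on dyadic intervals, treating digits beyond `depth` as uniform.

```python
import numpy as np


def binary_digits(x, depth):
    """Return the first `depth` binary digits of x in [0, 1)."""
    digits = []
    for _ in range(depth):
        x *= 2.0
        d = int(x)  # integer part is the next digit (0 or 1)
        digits.append(d)
        x -= d
    return digits


def polya_tree_posterior_mean_density(data, depth, alpha=1.0):
    """Posterior-mean density of a standard finite Polya tree on [0, 1).

    Assumption (illustrative, not from the paper): every node at level j
    splits its mass between its two children with an independent
    Beta(alpha * j**2, alpha * j**2) prior. Digits beyond `depth` are
    treated as uniform, so the density is constant on each of the
    2**depth dyadic intervals. Returns one density value per interval.
    """
    data = np.asarray(data, dtype=float)
    mass = np.ones(1)  # mass of the root interval [0, 1)
    for level in range(1, depth + 1):
        n_bins = 2 ** level
        # Count observations in each dyadic interval at this level.
        idx = np.floor(data * n_bins).astype(int)
        counts = np.bincount(idx, minlength=n_bins).astype(float)
        left, right = counts[0::2], counts[1::2]
        a = alpha * level ** 2
        # Conjugate Beta update: posterior mean of the "digit = 0" probability.
        p_left = (a + left) / (2.0 * a + left + right)
        new_mass = np.empty(n_bins)
        new_mass[0::2] = mass * p_left
        new_mass[1::2] = mass * (1.0 - p_left)
        mass = new_mass
    # Convert interval masses to density values (interval width is 2**-depth).
    return mass * 2 ** depth


if __name__ == "__main__":
    print(binary_digits(0.625, 3))
    sample = [0.05, 0.10, 0.12, 0.20, 0.80]
    print(polya_tree_posterior_mean_density(sample, depth=3))
```

Because the Beta priors are conjugate to the digit counts, the posterior mean is available in closed form and no MCMC is needed in this simplified setting. The returned values average to 1 across the dyadic intervals, so the estimate integrates to 1 on [0, 1).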
Taxonomy
Topics: Benford’s Law and Fraud Detection · Statistical Mechanics and Entropy · Computability, Logic, AI Algorithms
