Nonparametric Estimation of Repeated Densities with Heterogeneous Sample Sizes
Jiaming Qiu, Xiongtao Dai, Zhengyuan Zhu

TL;DR
This paper introduces a nonparametric, data-driven method for estimating densities across multiple subpopulations with varying sample sizes, leveraging exponential families and principal modes of variation.
Contribution
It proposes a novel approach that combines functional data analysis and likelihood-based shrinkage to estimate densities without parametric assumptions, adaptable to heterogeneous sample sizes.
Findings
The method performs well in simulations.
Effective on medical record and rainfall data.
Provides interpretable density estimates.
Abstract
We consider the estimation of densities in multiple subpopulations, where the available sample size in each subpopulation varies greatly. This problem occurs in epidemiology, for example, where different diseases may share similar pathogenic mechanisms but differ in their prevalence. Without specifying a parametric form, our proposed method pools information from the population and estimates the density in each subpopulation in a data-driven fashion. Drawing from functional data analysis, low-dimensional approximating density families in the form of exponential families are constructed from the principal modes of variation in the log-densities. Subpopulation densities are subsequently fitted in the approximating families based on likelihood principles and shrinkage. The approximating families increase in their flexibility as the number of components increases and can approximate arbitrary…
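The two steps described in the abstract (building an exponential family from the principal modes of variation of the log-densities, then fitting each subpopulation by likelihood with shrinkage) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a common support of [0, 1], log-densities already available on a grid for the well-sampled subpopulations (simulated here), and a ridge-type penalty weighted by the score variances as one plausible form of shrinkage. All names, including `fit_subpopulation` and the `shrink` parameter, are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)  # common support, assumed to be [0, 1]
dx = grid[1] - grid[0]

def log_normalizer(log_f):
    """Log of the trapezoidal integral of exp(log_f) along the last axis."""
    vals = np.exp(log_f)
    area = np.sum((vals[..., 1:] + vals[..., :-1]) * 0.5 * dx, axis=-1, keepdims=True)
    return np.log(area)

def normalize_log_density(log_f):
    """Shift a log-density so that the density integrates to one on the grid."""
    return log_f - log_normalizer(log_f)

# Hypothetical training input: log-densities of 50 well-sampled subpopulations,
# simulated here from two smooth modes instead of being estimated from data.
true_modes = np.stack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])
train_scores = rng.normal(size=(50, 2)) * np.array([1.0, 0.5])
train_log_f = normalize_log_density(train_scores @ true_modes)

# Step 1: principal modes of variation of the log-densities (PCA via SVD).
mean_log_f = train_log_f.mean(axis=0)
centered = train_log_f - mean_log_f
_, sing, vt = np.linalg.svd(centered, full_matrices=False)
K = 2                                               # number of components kept
modes = vt[:K]                                      # shape (K, n_grid)
score_var = sing[:K] ** 2 / (len(train_log_f) - 1)  # variances of the PC scores

def fit_subpopulation(sample, shrink=1.0):
    """Penalized maximum likelihood in the family
    f(x; theta) proportional to exp(mean_log_f(x) + sum_k theta_k * modes_k(x)).
    The quadratic penalty is an assumed form of shrinkage toward theta = 0."""
    # Sufficient statistics at the observations, by linear interpolation.
    T = np.stack([np.interp(sample, grid, m) for m in modes], axis=1)
    mu_at_sample = np.interp(sample, grid, mean_log_f)
    n = len(sample)

    def objective(theta):
        log_unnorm = mean_log_f + theta @ modes
        loglik = np.sum(mu_at_sample + T @ theta) - n * log_normalizer(log_unnorm)[0]
        penalty = 0.5 * shrink * np.sum(theta ** 2 / score_var)
        return -loglik + penalty

    theta_hat = minimize(objective, np.zeros(K), method="BFGS").x
    f_hat = np.exp(normalize_log_density(mean_log_f + theta_hat @ modes))
    return theta_hat, f_hat

# Usage: a subpopulation with only 15 observations, drawn by inverse-CDF sampling.
true_f = np.exp(normalize_log_density(np.array([0.8, -0.3]) @ true_modes))
cdf = np.cumsum(true_f) * dx
cdf /= cdf[-1]
small_sample = np.interp(rng.uniform(size=15), cdf, grid)
theta_hat, f_hat = fit_subpopulation(small_sample)
```

In this sketch a larger `shrink` pulls the fitted density toward the one defined by the mean log-density, which is the intended behaviour for subpopulations with few observations, while `K` controls how flexible the approximating family is.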
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
