Modeling and estimating skewed and heavy-tailed populations via unsupervised mixture models
Marco Bee, Flavio Santi

TL;DR
This paper introduces an unsupervised mixture model that combines a lognormal body with a Pareto-type tail to model the skewed, heavy-tailed data common in actuarial and risk management applications, with maximum likelihood estimation carried out via the EM algorithm.
Contribution
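It presents a lognormal-Pareto mixture model tailored to skewed, heavy-tailed data and shows that it is flexible across different data-generating processes and easier to estimate than two existing distributions with similar features, while fitting comparably well.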
Findings
Model is flexible in fitting data from different data-generating processes
EM algorithm enables efficient maximum likelihood estimation
Matches the goodness-of-fit of two similar existing distributions in simulations and a real-data application, while being easier to estimate
Abstract
We develop an unsupervised mixture model for non-negative, skewed and heavy-tailed data, such as losses in actuarial and risk management applications. The mixture has a lognormal component, which is usually appropriate for the body of the distribution, and a Pareto-type tail, aimed at accommodating the largest observations, since the lognormal tail often decays too fast. We show that maximum likelihood estimation can be performed by means of the EM algorithm and that the model is quite flexible in fitting data from different data-generating processes. Simulation experiments and a real-data application to automobile claims suggest that, compared with two existing distributions with similar features, the approach is equivalent in terms of goodness-of-fit but easier to estimate.
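The EM idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-component density p * LN(mu, sigma) + (1 - p) * Pareto(alpha, x_min) in which the Pareto scale x_min is known and held fixed, a simplification made here only to keep every M-step update in closed form. The function names (`lognormal_pdf`, `pareto_pdf`, `em_lognormal_pareto`) and the simulated data are hypothetical.

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    # Lognormal density for x > 0.
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))

def pareto_pdf(x, alpha, x_min):
    # Classical Pareto density; zero below the scale x_min.
    out = np.zeros_like(x, dtype=float)
    mask = x >= x_min
    out[mask] = alpha * x_min ** alpha / x[mask] ** (alpha + 1)
    return out

def em_lognormal_pareto(x, x_min, n_iter=500, tol=1e-8):
    # Assumption: x_min is fixed by the user, not estimated.
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    # log(x / x_min) for tail candidates; 0 where the Pareto density is 0.
    tail_log = np.where(x >= x_min, np.log(x / x_min), 0.0)
    p, mu, sigma, alpha = 0.5, logx.mean(), logx.std(), 1.0
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each point is lognormal.
        f_ln = p * lognormal_pdf(x, mu, sigma)
        f_pa = (1 - p) * pareto_pdf(x, alpha, x_min)
        total = f_ln + f_pa
        loglik = np.log(total).sum()
        r = f_ln / total
        # M-step: weighted closed-form maximum likelihood updates.
        p = r.mean()
        mu = np.sum(r * logx) / r.sum()
        sigma = np.sqrt(np.sum(r * (logx - mu) ** 2) / r.sum())
        w = 1 - r
        alpha = w.sum() / np.sum(w * tail_log)
        if loglik - prev < tol:
            break
        prev = loglik
    # loglik is evaluated at the parameters in force before the final update.
    return {"p_lognormal": p, "mu": mu, "sigma": sigma, "alpha": alpha, "loglik": loglik}

# Simulated example: 80% lognormal body, 20% Pareto tail (values chosen arbitrarily).
rng = np.random.default_rng(0)
n, x_min, alpha_true = 5000, 2.0, 2.0
is_tail = rng.random(n) < 0.2
body = rng.lognormal(mean=0.0, sigma=0.5, size=n)
tail = x_min * (1 - rng.random(n)) ** (-1 / alpha_true)  # inverse-transform sampling
x = np.where(is_tail, tail, body)
fit = em_lognormal_pareto(x, x_min=x_min)
print(fit)
```

Observations below x_min have zero Pareto density, so the E-step assigns them entirely to the lognormal component; only observations at or above x_min contribute to the update of alpha. Estimating the Pareto scale jointly with the other parameters is a harder problem and is not attempted in this sketch.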
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Distribution Estimation and Applications · Census and Population Estimation
