Modeling and estimating skewed and heavy-tailed populations via unsupervised mixture models

Marco Bee; Flavio Santi

arXiv:2505.22507 · stat.ME · May 29, 2025

Open Access

TL;DR

This paper introduces an unsupervised mixture model that combines a lognormal body with a Pareto-type tail to model the skewed, heavy-tailed data common in actuarial and risk management applications, with maximum likelihood estimation carried out via the EM algorithm.

Contribution

The paper presents a lognormal-Pareto mixture model for skewed, heavy-tailed data and shows that it fits data from different data-generating processes flexibly while being easier to estimate than two existing distributions with similar features.

Findings

01

Model is flexible enough to fit heavy-tailed data from different data-generating processes

02

EM algorithm enables efficient maximum likelihood estimation

03

Matches two similar existing distributions in goodness-of-fit on simulated and real automobile claims data, while being easier to estimate

Abstract

We develop an unsupervised mixture model for non-negative, skewed and heavy-tailed data, such as losses in actuarial and risk management applications. The mixture has a lognormal component, which is usually appropriate for the body of the distribution, and a Pareto-type tail, aimed at accommodating the largest observations, since the lognormal tail often decays too fast. We show that maximum likelihood estimation can be performed by means of the EM algorithm and that the model is quite flexible in fitting data from different data-generating processes. Simulation experiments and a real-data application to automobile claims suggest that, compared with two existing distributions with similar features, the approach is equivalent in terms of goodness-of-fit but easier to estimate.
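
To make the estimation idea in the abstract concrete, the following is a minimal sketch of an EM iteration for a two-component mixture with density pi * LN(mu, sigma) + (1 - pi) * Pareto(x_min, alpha). It is an illustration under stated assumptions, not the authors' implementation: the Pareto threshold `x_min` is treated as known and held fixed, which may differ from the specification in the paper, and the function names (`em_lognormal_pareto`, `simulate`), the starting values, and the fixed iteration count are all hypothetical choices made for this example.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-mean mu and log-sd sigma, for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def pareto_pdf(x, x_min, alpha):
    """Pareto density with scale x_min and shape alpha; zero below x_min."""
    if x < x_min:
        return 0.0
    return alpha * x_min ** alpha / x ** (alpha + 1.0)


def em_lognormal_pareto(data, x_min, n_iter=200):
    """EM for pi*LN(mu, sigma) + (1 - pi)*Pareto(x_min, alpha).

    Assumption of this sketch: x_min is known and kept fixed.
    """
    logs = [math.log(x) for x in data]
    n = len(data)
    log_x_min = math.log(x_min)
    # Crude starting values (an arbitrary choice for this sketch).
    pi = 0.5
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    alpha = 1.0
    for _ in range(n_iter):
        # E-step: posterior probability that each point is lognormal.
        # Points below x_min get probability 1 because the Pareto density is 0.
        tau = []
        for x in data:
            a = pi * lognormal_pdf(x, mu, sigma)
            b = (1.0 - pi) * pareto_pdf(x, x_min, alpha)
            tau.append(a / (a + b))
        # M-step: closed-form weighted maximum likelihood updates.
        w_ln = sum(tau)
        w_par = n - w_ln
        pi = w_ln / n
        mu = sum(t * v for t, v in zip(tau, logs)) / w_ln
        sigma = math.sqrt(
            sum(t * (v - mu) ** 2 for t, v in zip(tau, logs)) / w_ln
        )
        alpha = w_par / sum(
            (1.0 - t) * (v - log_x_min) for t, v in zip(tau, logs)
        )
    return {"pi": pi, "mu": mu, "sigma": sigma, "alpha": alpha}


def simulate(n, pi, mu, sigma, x_min, alpha, rng):
    """Draw n observations from the mixture using a random.Random instance."""
    sample = []
    for _ in range(n):
        if rng.random() < pi:
            sample.append(rng.lognormvariate(mu, sigma))
        else:
            # paretovariate has scale 1, so rescale to start at x_min.
            sample.append(x_min * rng.paretovariate(alpha))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(2000, 0.7, 0.0, 0.5, 3.0, 2.0, rng)
    print(em_lognormal_pareto(data, x_min=3.0))
```

With the threshold fixed, every M-step update is available in closed form (weighted mean and variance of the log-data for the lognormal part, a weighted Hill-type ratio for the Pareto shape), which gives one plausible reading of why EM makes such a model convenient to estimate. The sketch has no convergence check and no guard against a degenerate fit in which the Pareto weight collapses to zero; a practical implementation would need both.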

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Distribution Estimation and Applications · Census and Population Estimation