Synthesizing Property & Casualty Ratemaking Datasets using Generative Adversarial Networks
Marie-Pier Cote, Brian Hartman, Olivier Mercier, Joshua Meyers, Jared Cummings, Elijah Harmon

TL;DR
This paper explores the use of three specialized GAN architectures to generate synthetic insurance datasets that preserve data structure and relationships while ensuring confidentiality, addressing privacy concerns in actuarial data sharing.
Contribution
It introduces and compares three tailored GAN models for multi-categorical insurance data, demonstrating their effectiveness in data synthesis and privacy preservation.
Findings
MC-WGAN-GP best reproduces original data structure
CTGAN is the easiest to use among the models
MNCDP-GAN guarantees differential privacy
Abstract
Due to confidentiality issues, it can be difficult to access or share interesting datasets for methodological development in actuarial science, or other fields where personal data are important. We show how to design three different types of generative adversarial networks (GANs) that can build a synthetic insurance dataset from a confidential original dataset. The goal is to obtain synthetic data that no longer contains sensitive information but still has the same structure as the original dataset and retains the multivariate relationships. In order to adequately model the specific characteristics of insurance data, we use GAN architectures adapted for multi-categorical data: a Wasserstein GAN with gradient penalty (MC-WGAN-GP), a conditional tabular GAN (CTGAN) and a Mixed Numerical and Categorical Differentially Private GAN (MNCDP-GAN). For transparency, the approaches are illustrated…
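To make "adapted for multi-categorical data" more concrete, here is a minimal NumPy sketch of one common way a generator can emit several categorical variables at once: its raw output is split into one block of columns per variable, and a separate softmax is applied to each block. This is an illustration under stated assumptions, not the authors' implementation. The function names, the block sizes and the random logits standing in for a generator's output are all hypothetical.

```python
import numpy as np


def blockwise_softmax(logits, block_sizes):
    """Apply a separate softmax to each categorical variable's block of columns."""
    if sum(block_sizes) != logits.shape[1]:
        raise ValueError("block sizes must sum to the number of columns")
    out = np.empty_like(logits, dtype=float)
    start = 0
    for size in block_sizes:
        block = logits[:, start:start + size]
        # Subtract the row-wise max for numerical stability before exponentiating.
        exp = np.exp(block - block.max(axis=1, keepdims=True))
        out[:, start:start + size] = exp / exp.sum(axis=1, keepdims=True)
        start += size
    return out


def to_one_hot(probs, block_sizes):
    """Turn per-block probabilities into one category per variable (argmax)."""
    out = np.zeros_like(probs)
    rows = np.arange(probs.shape[0])
    start = 0
    for size in block_sizes:
        idx = probs[:, start:start + size].argmax(axis=1)
        out[rows, start + idx] = 1.0
        start += size
    return out


# Hypothetical layout: one variable with 3 levels, another with 2 levels.
BLOCK_SIZES = [3, 2]
rng = np.random.default_rng(0)
# Random logits stand in for the raw output of a generator network (assumption).
logits = rng.normal(size=(4, sum(BLOCK_SIZES)))
probs = blockwise_softmax(logits, BLOCK_SIZES)
one_hot = to_one_hot(probs, BLOCK_SIZES)
```

Each row of `probs` holds one probability vector per variable, so every block sums to one, and `one_hot` has the same layout as a one-hot encoded record of the original table.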
