TL;DR
This paper presents a three-stage machine learning approach for generating a synthetic driver telematics dataset of 100,000 policies that closely resembles real insurance data, intended to support risk modeling and algorithm development.
Contribution
It introduces a method that combines feedforward neural networks with an extended version of SMOTE (Synthetic Minority Over-sampling Technique) to produce realistic synthetic telematics data for insurance risk assessment.
Findings
Synthetic dataset closely matches real data statistics.
Neural networks effectively simulate claims and amounts.
Extended SMOTE generates diverse feature portfolios.
Abstract
This article describes the techniques used to produce a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The synthetic dataset contains 100,000 policies, each with observations on the driver's claims experience together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can be used to advance models for assessing risk in usage-based insurance. It follows a three-stage process using machine learning algorithms. The first stage simulates the number of claims as multiple binary classifications using feedforward neural networks. The second stage simulates the aggregated amount of claims as a regression using feedforward neural networks, with the number of claims included in the set of feature variables. In the final stage, a synthetic portfolio of…
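The flow of the stages can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: feature rows are generated with plain SMOTE interpolation rather than the extended variant, the claim count is built from a chain of binary decisions ("at least k claims, given at least k-1"), and the hypothetical functions `claim_probability` and `simulate_claim_amount` are hand-written stand-ins for trained feedforward networks. The two features (`annual_km`, `harsh_brakes`), the seed rows, and every coefficient are invented for illustration.

```python
import math
import random


def smote_sample(rows, k, rng):
    """Plain SMOTE step: interpolate between a row and one of its k nearest neighbours."""
    base = rng.choice(rows)
    # Rank the remaining rows by Euclidean distance to the chosen base row.
    others = sorted((r for r in rows if r is not base),
                    key=lambda r: math.dist(r, base))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position on the segment from base to neighbour
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def claim_probability(features, level):
    """Hypothetical stand-in for a trained binary classifier.

    Returns an assumed P(N >= level | N >= level - 1); coefficients are made up.
    """
    annual_km, harsh_brakes = features
    return sigmoid(-3.0 + 0.00005 * annual_km + 0.02 * harsh_brakes - (level - 1))


def simulate_claim_count(features, rng, max_claims=3):
    """Draw a claim count as a chain of binary decisions (assumed decomposition)."""
    count = 0
    for level in range(1, max_claims + 1):
        if rng.random() >= claim_probability(features, level):
            break
        count = level
    return count


def simulate_claim_amount(features, count):
    """Hypothetical stand-in for the regression network; claim count is an input."""
    if count == 0:
        return 0.0
    annual_km, _ = features
    return count * (2000.0 + 0.01 * annual_km)


# Invented seed rows standing in for real policies: [annual_km, harsh_brakes].
seed_rows = [
    [12000.0, 5.0],
    [8000.0, 2.0],
    [20000.0, 14.0],
    [15000.0, 9.0],
    [5000.0, 1.0],
]

rng = random.Random(0)
portfolio = []
for _ in range(1000):
    x = smote_sample(seed_rows, k=2, rng=rng)
    n = simulate_claim_count(x, rng)
    portfolio.append({"features": x, "claims": n,
                      "amount": simulate_claim_amount(x, n)})
```

Because each synthetic row is a convex combination of two seed rows, every generated feature stays within the range spanned by the seed data. That is a property of plain SMOTE and may not hold for the extended variant used in the paper.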
