Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors
Sherly Alfonso-Sánchez, Cristián Bravo, Kristina G. Stankova

TL;DR
This study explores how geographic data from environmental sources and imagery can improve zone-level claim frequency models in motor insurance, demonstrating benefits of different data representations across various models.
Contribution
It introduces a framework for incorporating geographic information into actuarial models using alternative data sources and evaluates their impact on predictive accuracy.
Findings
Environmental features at 5 km scale improve model accuracy.
Image embeddings help when environmental features are unavailable.
Combining coordinates with environmental data yields the best predictions.
Abstract
Geographic context is often considered relevant to motor insurance risk, yet public actuarial datasets provide limited location identifiers, constraining how this information can be incorporated and evaluated in claim-frequency models. This study examines how geographic information from alternative data sources can be incorporated into actuarial models for Motor Third Party Liability (MTPL) claim prediction under such constraints. Using the BeMTPL97 dataset, we adopt a zone-level modeling framework and evaluate predictive performance on unseen postcodes. Geographic information is introduced through two channels: environmental indicators from OpenStreetMap and CORINE Land Cover, and orthoimagery released by the Belgian National Geographic Institute for academic use. We evaluate the predictive contribution of coordinates, environmental features, and image embeddings across three baseline…
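To make the evaluation setup in the abstract concrete, the following is a minimal sketch of holding out whole postcodes and aggregating policies to zone-level claim frequency. It is not the authors' code: the field names `postcode`, `claims`, and `exposure`, the helper names, the 20% test fraction, and the toy records are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def split_by_postcode(records, test_fraction=0.2, seed=0):
    """Hold out whole postcodes so no test zone appears in training.

    `records` is a list of dicts with a hypothetical "postcode" key.
    """
    postcodes = sorted({r["postcode"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(postcodes)
    n_test = max(1, round(len(postcodes) * test_fraction))
    test_codes = set(postcodes[:n_test])
    train = [r for r in records if r["postcode"] not in test_codes]
    test = [r for r in records if r["postcode"] in test_codes]
    return train, test


def zone_frequency(records):
    """Aggregate policy rows per postcode: total claims / total exposure."""
    claims = defaultdict(float)
    exposure = defaultdict(float)
    for r in records:
        claims[r["postcode"]] += r["claims"]
        exposure[r["postcode"]] += r["exposure"]
    return {pc: claims[pc] / exposure[pc] for pc in claims if exposure[pc] > 0}


# Toy policy rows (made up, not taken from BeMTPL97).
records = [
    {"postcode": 1000, "claims": 1, "exposure": 1.0},
    {"postcode": 1000, "claims": 0, "exposure": 0.5},
    {"postcode": 2000, "claims": 0, "exposure": 1.0},
    {"postcode": 3000, "claims": 2, "exposure": 2.0},
    {"postcode": 4000, "claims": 0, "exposure": 0.25},
    {"postcode": 5000, "claims": 1, "exposure": 1.0},
]

train, test = split_by_postcode(records, test_fraction=0.2, seed=0)
train_freq = zone_frequency(train)  # zone-level targets for fitting
test_freq = zone_frequency(test)    # targets for postcodes unseen in training
```

Splitting on the postcode rather than on individual policies is what keeps the test zones unseen: a row-level split would let policies from the same postcode land on both sides and overstate how well geographic features generalize.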
