An engine to simulate insurance fraud network data
Bavo D.C. Campo, Katrien Antonio

TL;DR
This paper introduces a simulation engine that generates synthetic insurance fraud network data, addressing data scarcity and class imbalance issues to facilitate the development and testing of fraud detection models.
Contribution
The authors develop a customizable simulation tool that creates realistic synthetic insurance fraud data with controllable parameters for research and model validation.
Findings
Enables generation of large, realistic synthetic datasets
Facilitates testing of fraud detection methods under various scenarios
Addresses data scarcity and class imbalance challenges
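As a rough illustration of the setting, and not the authors' simulation engine, the toy sketch below generates claims that share parties, assigns a rare fraud label to create class imbalance, and derives one simple network feature. All names and parameters (`simulate_claims`, `neighbor_fraud_share`, `fraud_rate`, the two-parties-per-claim rule) are assumptions made for this example.

```python
import random
from collections import defaultdict


def simulate_claims(n_claims=1000, n_parties=300, fraud_rate=0.02, seed=0):
    """Toy generator (hypothetical): each claim links two distinct parties
    and receives a rare fraud label drawn independently at rate fraud_rate."""
    rng = random.Random(seed)
    claims = []
    for claim_id in range(n_claims):
        parties = rng.sample(range(n_parties), 2)  # two distinct parties
        is_fraud = rng.random() < fraud_rate       # rare positive class
        claims.append({"id": claim_id, "parties": parties, "fraud": is_fraud})
    return claims


def neighbor_fraud_share(claims):
    """For each claim, the share of fraudulent claims among the other claims
    that involve at least one of the same parties (0.0 if there are none)."""
    claims_by_party = defaultdict(set)
    for claim in claims:
        for party in claim["parties"]:
            claims_by_party[party].add(claim["id"])

    fraud = {claim["id"]: claim["fraud"] for claim in claims}
    feature = {}
    for claim in claims:
        neighbors = set()
        for party in claim["parties"]:
            neighbors |= claims_by_party[party]
        neighbors.discard(claim["id"])  # a claim is not its own neighbor
        if neighbors:
            feature[claim["id"]] = sum(fraud[n] for n in neighbors) / len(neighbors)
        else:
            feature[claim["id"]] = 0.0
    return feature


if __name__ == "__main__":
    claims = simulate_claims()
    feature = neighbor_fraud_share(claims)
    n_fraud = sum(c["fraud"] for c in claims)
    print(f"{n_fraud} fraudulent claims out of {len(claims)}")
```

Because this toy labels every claim, the feature uses the true label of each neighbor. In real portfolios only investigated claims carry a label, so such a feature would have to be computed from the known labels alone.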
Abstract
Traditionally, the detection of fraudulent insurance claims relies on business rules and expert judgement which makes it a time-consuming and expensive process (Óskarsdóttir et al., 2022). Consequently, researchers have been examining ways to develop efficient and accurate analytic strategies to flag suspicious claims. Feeding learning methods with features engineered from the social network of parties involved in a claim is a particularly promising strategy (see for example Van Vlasselaer et al. (2016); Tumminello et al. (2023)). When developing a fraud detection model, however, we are confronted with several challenges. The uncommon nature of fraud, for example, creates a high class imbalance which complicates the development of well performing analytic classification models. In addition, only a small number of claims are investigated and get a label, which results in a large…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Artificial Intelligence in Law · Machine Learning in Healthcare
