# Dataset shift quantification for credit card fraud detection

**Authors:** Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Sylvie Calabretto, Liyun He-Guelton, Frederic Oblé, Michael Granitzer

arXiv: 1906.06977 · 2019-06-18

## TL;DR

This paper introduces a method to quantify day-by-day dataset shift in face-to-face credit card transactions by training a classifier to tell each pair of days apart. The resulting shift pattern matches calendar events (holidays, weekends), and adding it as a feature yields a small improvement in fraud detection.

## Contribution

The paper presents a method to measure dataset shift between days in credit card transaction data, shows that the resulting shift pattern matches calendar events, and uses that pattern as an additional feature for fraud detection, reporting a small improvement.

## Key findings

- Dataset shift correlates with calendar events like holidays and weekends.
- Classifying days against each other quantifies dataset shift: the better a classifier separates two days, the more the buying behaviour differs between them.
- Incorporating shift information as a feature yields a small improvement in fraud detection.

## Abstract

Machine learning and data mining techniques have been used extensively to detect credit card fraud. However, purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify the dataset shift day by day in our face-to-face credit card transactions dataset (card holder located in the shop). In practice, we classify the days against each other and measure the efficiency of the classification. The more efficient the classification, the more different the buying behaviour between two days, and vice versa. Therefore, we obtain a distance matrix characterizing the dataset shift. After an agglomerative clustering of the distance matrix, we observe that the dataset shift pattern matches the calendar events for this time period (holidays, weekends, etc.). We then incorporate this dataset shift knowledge in the credit card fraud detection task as a new feature. This leads to a small improvement of the detection.
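The pipeline described in the abstract (classify days against each other, turn classification quality into a distance matrix, cluster the days, reuse the cluster as a feature) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The nearest-centroid classifier, the rescaled balanced accuracy used as a distance, the average-linkage merge rule, the two toy features, and every function name are assumptions chosen to keep the example dependency-free.

```python
import random
from itertools import combinations

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def day_vs_day_score(day_a, day_b, rng):
    """Held-out balanced accuracy of a nearest-centroid classifier
    separating two days, rescaled so 0 = chance and 1 = perfect."""
    a, b = day_a[:], day_b[:]
    rng.shuffle(a)
    rng.shuffle(b)
    ha, hb = len(a) // 2, len(b) // 2
    ca, cb = centroid(a[:ha]), centroid(b[:hb])  # fit on the first halves
    test_a, test_b = a[ha:], b[hb:]              # evaluate on the second halves
    recall_a = sum(sq_dist(x, ca) <= sq_dist(x, cb) for x in test_a) / len(test_a)
    recall_b = sum(sq_dist(x, cb) < sq_dist(x, ca) for x in test_b) / len(test_b)
    return max(0.0, recall_a + recall_b - 1.0)

def shift_matrix(days, seed=0):
    """Symmetric matrix: easier-to-separate days are further apart."""
    rng = random.Random(seed)
    n = len(days)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = day_vs_day_score(days[i], days[j], rng)
    return dist

def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters; p < q, so index p survives the pop.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(q)
        clusters[p].extend(merged)
    return clusters

def make_day(rng, mean_amount, n=200):
    """Synthetic transactions with hypothetical features (amount, hour)."""
    return [[rng.gauss(mean_amount, 5.0), rng.gauss(14.0, 3.0)] for _ in range(n)]

data_rng = random.Random(42)
# Days 0-3 mimic working days, days 4-5 mimic a weekend with larger amounts.
days = [make_day(data_rng, 30.0) for _ in range(4)] + \
       [make_day(data_rng, 60.0) for _ in range(2)]

dist = shift_matrix(days)
clusters = agglomerate(dist, n_clusters=2)

# The cluster id of a transaction's day can be appended as an extra feature.
day_cluster = {day: k for k, members in enumerate(clusters) for day in members}
```

On this toy data the two weekend-like days end up in their own cluster, which mirrors the calendar pattern reported in the abstract. On real transactions, a stronger classifier and a library implementation of hierarchical clustering would be the natural replacements for the simplified pieces used here.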

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.06977/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.06977/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/1906.06977/full.md

---
Source: https://tomesphere.com/paper/1906.06977