Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

Sergei Krutikov (1); Bulat Khaertdinov (2); Rodion Kiriukhin (1); Shubham Agrawal (1); Mozhdeh Ariannezhad (1); Kees Jan De Vries (1) ((1) Booking.com; (2) Maastricht University)

arXiv:2405.13692 · cs.LG · July 1, 2025

Open Access

TL;DR

This study demonstrates that pre-trained tabular Transformers, utilizing self-supervised learning, can outperform classical Gradient Boosted Decision Trees in fraud detection tasks at Booking.com, both offline and online.

Contribution

The paper introduces a novel application of pre-trained tabular Transformers with SSL to fraud detection, surpassing GBDTs in offline Average Precision and in online business metrics.

Findings

01

Transformers outperform GBDTs in offline Average Precision scores.

02

Pre-trained Transformers show significant online business metric improvements.

03

SSL enables Transformers to learn transferable representations from large datasets.
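For readers unfamiliar with the offline metric named in the first finding, the following is a minimal sketch of how Average Precision can be computed from labels and model scores. The function name `average_precision` and the toy data are illustrative assumptions, not taken from the paper, and ties in scores are broken by input order, which may differ from library implementations.

```python
def average_precision(labels, scores):
    """Mean of precision@k taken over every rank k that holds a positive.

    labels: iterable of 0/1 ground-truth values (1 = fraud).
    scores: iterable of model scores, higher = more suspicious.
    """
    # Rank examples from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0             # positives seen so far while walking down the ranking
    precision_sum = 0.0  # running sum of precision@k at each positive
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / k

    # With no positives the metric is undefined; return 0.0 by convention here.
    return precision_sum / hits if hits else 0.0


# Toy data (hypothetical, for illustration only).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.1]
print(f"AP = {average_precision(toy_labels, toy_scores):.3f}")
```

Because the metric rewards placing the rare positives near the top of the ranking, it is a common choice for heavily imbalanced problems such as fraud detection.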

Abstract

Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our study is additionally motivated by the problem of selection bias, often occurring in real-life fraud detection systems. It is caused by the production system affecting which subset of traffic becomes labeled. This issue is typically addressed by randomly sampling a small part of the whole production data, referred to as a Control Group. This subset follows a target distribution of production data and therefore is usually preferred for…
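The selection-bias mechanism described in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the production setup at Booking.com: the function `split_by_labeling`, the `block_threshold` and `control_rate` parameters, and the synthetic data are all hypothetical, and it assumes that Control Group traffic bypasses the production model so its labels are always observed.

```python
import random


def split_by_labeling(events, block_threshold, control_rate, seed=0):
    """Split events into a control group and the biased labeled remainder.

    Assumption: an event blocked by the production model never receives a
    label, while control-group events are never blocked.
    """
    rng = random.Random(seed)
    control, biased = [], []
    for event in events:
        if rng.random() < control_rate:
            control.append(event)   # random sample: label always observed
        elif event["score"] < block_threshold:
            biased.append(event)    # allowed through: label observed
        # Otherwise the event is blocked and its label is never observed.
    return control, biased


def fraud_rate(events):
    """Fraction of events labeled as fraud."""
    return sum(e["is_fraud"] for e in events) / len(events)


# Synthetic traffic (hypothetical): fraud is more likely at higher scores.
data_rng = random.Random(42)
events = []
for _ in range(20_000):
    score = data_rng.random()
    events.append({"score": score, "is_fraud": data_rng.random() < 0.2 * score})

control, biased = split_by_labeling(events, block_threshold=0.5, control_rate=0.1)
print(f"control group fraud rate:       {fraud_rate(control):.3f}")
print(f"non-control labeled fraud rate: {fraud_rate(biased):.3f}")
```

In this toy setting the control group is small but mirrors the full traffic distribution, whereas the larger labeled remainder under-represents high-score (and therefore fraudulent) events, which is the trade-off the abstract refers to.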

Taxonomy

Topics: Imbalanced Data Classification Techniques · Spam and Phishing Detection · Authorship Attribution and Profiling