# Building High-Quality Auction Fraud Dataset

**Authors:** Sulaf Elshaar, Samira Sadaoui

arXiv: 1906.04272 · 2019-08-26

## TL;DR

This paper presents the creation of a high-quality shill bidding dataset for online auction fraud detection, including new patterns and data preprocessing techniques to facilitate machine learning research.

## Contribution

The paper introduces a thoroughly preprocessed shill bidding dataset that includes two new shill bidding patterns alongside existing ones, addressing the lack of training data for auction fraud detection.

## Key findings

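- Built a high-quality shill bidding (SB) dataset from a large number of crawled commercial auctions and bidder histories
- Introduced two new SB patterns and implemented existing ones
- Improved the quality of the training data by removing outliers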

## Abstract

Given the magnitude of online auction transactions, it is difficult to safeguard consumers from dishonest sellers, such as shill bidders. To date, the application of Machine Learning Techniques (MLTs) to auction fraud has been limited, unlike their applications for combatting other types of fraud. Shill Bidding (SB) is a severe auction fraud, which is driven by modern-day technologies and clever scammers. The difficulty of identifying the behavior of sophisticated fraudsters and the unavailability of training datasets hinder the research on SB detection. In this study, we developed a high-quality SB dataset. To do so, first, we crawled and preprocessed a large number of commercial auctions and bidders' history as well. We thoroughly preprocessed both datasets to make them usable for the computation of the SB metrics. Nevertheless, this operation requires a deep understanding of the behavior of auctions and bidders. Second, we introduced two new SB patterns and implemented other existing SB patterns. Finally, we removed outliers to improve the quality of training SB data.
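
The abstract does not say which outlier-removal method the authors used. As a rough illustration only, the sketch below applies a common interquartile-range (IQR) filter to one numeric feature. The function name `remove_outliers_iqr`, the field `bids_per_auction`, the sample records, and the multiplier `k=1.5` are all assumptions made for this example and are not taken from the paper.

```python
from statistics import quantiles


def remove_outliers_iqr(rows, key, k=1.5):
    """Keep rows whose `key` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    This is a generic IQR filter, not the paper's procedure.
    """
    values = [row[key] for row in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[key] <= high]


# Hypothetical per-bidder records; the field name is illustrative.
records = [
    {"bidder": f"b{i}", "bids_per_auction": v}
    for i, v in enumerate([1, 2, 2, 3, 3, 3, 4, 4, 50])
]

cleaned = remove_outliers_iqr(records, "bids_per_auction")
```

Here the record with 50 bids per auction falls outside the IQR fence and is dropped, while the other eight records are kept. In practice the choice of feature and threshold would depend on the SB metrics actually computed.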

---
Source: https://tomesphere.com/paper/1906.04272