Customs Import Declaration Datasets

Chaeyoon Jeong; Sundong Kim; Jaewoo Park; Yeonsoo Choi

arXiv:2208.02484 · cs.LG · September 6, 2023 · 6 cites

TL;DR

This paper introduces a synthetic import declaration dataset generated with GANs to support research in trade risk management, enabling safe data sharing and benchmarking for fraud detection.

Contribution

The paper presents a new synthetic dataset for customs import declarations, facilitating research and benchmarking without compromising sensitive trade data.

Findings

01. Synthetic data closely mimics real trade data distribution

02. Baseline codes for fraud detection are provided

03. Advanced algorithms outperform simpler models in fraud detection
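
One simple way to probe a claim like finding 01 is to compare the marginal distribution of a single categorical attribute in the real and synthetic data. The sketch below uses total variation distance for this; the metric, the function name, and the toy values are illustrative assumptions and are not taken from the paper or its repository.

```python
from collections import Counter


def total_variation_distance(real_values, synthetic_values):
    """Total variation distance between two empirical categorical
    distributions: 0.0 means identical marginals, 1.0 means disjoint."""
    if not real_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    n_real = len(real_values)
    n_synth = len(synthetic_values)
    # Compare relative frequencies over every category seen in either sample.
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth)
        for c in categories
    )


# Hypothetical toy values for one categorical attribute (not real data).
real_column = ["A", "A", "B", "C"]
synthetic_column = ["A", "B", "B", "C"]
tvd = total_variation_distance(real_column, synthetic_column)
print(f"total variation distance: {tvd:.2f}")
```

A value near 0 suggests that the synthetic column reproduces the real column's marginal frequencies. It says nothing about whether correlations between attributes are preserved, which would need a separate check.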

Abstract

Given the huge volume of cross-border flows, effective and efficient control of trade becomes more crucial in protecting people and society from illicit trade. However, limited accessibility of the transaction-level trade datasets hinders the progress of open research, and lots of customs administrations have not benefited from the recent progress in data-based risk management. In this paper, we introduce an import declaration dataset to facilitate the collaboration between domain experts in customs administrations and researchers from diverse domains, such as data science and machine learning. The dataset contains 54,000 artificially generated trades with 22 key attributes, and it is synthesized with conditional tabular GAN while maintaining correlated features. Synthetic data has several advantages. First, releasing the dataset is free from restrictions that do not allow disclosing…

Code & Models

Repositories

seondong/customs-declaration-datasets (Official)

Taxonomy

Topics: Imbalanced Data Classification Techniques