Customs Import Declaration Datasets
Chaeyoon Jeong, Sundong Kim, Jaewoo Park, Yeonsoo Choi

TL;DR
This paper introduces a synthetic import declaration dataset of 54,000 records with 22 attributes, generated with a conditional tabular GAN, to support research in trade risk management. It enables safe data sharing and benchmarking for fraud detection.
Contribution
The paper presents a new synthetic dataset for customs import declarations, facilitating research and benchmarking without compromising sensitive trade data.
Findings
Synthetic data closely mimics real trade data distribution
Baseline code for fraud detection is provided
Advanced algorithms outperform simpler models in fraud detection
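
One way to make the first finding concrete is to compare the distribution of a single categorical attribute in the real and synthetic tables. The sketch below computes the total variation distance (0 means identical distributions, 1 means no overlap). It is an illustration only, not the evaluation used in the paper: the function name, the toy column values, and the choice of metric are all assumptions.

```python
from collections import Counter


def total_variation_distance(real_values, synthetic_values):
    """Total variation distance between two empirical categorical distributions."""
    if not real_values or not synthetic_values:
        raise ValueError("both columns must contain at least one value")
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    n_real = len(real_values)
    n_synth = len(synthetic_values)
    # Categories seen in either column; Counter returns 0 for missing keys.
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth)
        for c in categories
    )


# Hypothetical toy values standing in for one categorical attribute
# (for example, a country-of-origin code); not taken from the dataset.
real_column = ["KR", "CN", "CN", "US", "CN", "KR", "US", "CN"]
synthetic_column = ["CN", "CN", "KR", "US", "CN", "CN", "KR", "JP"]

tvd = total_variation_distance(real_column, synthetic_column)
print(f"TVD: {tvd:.3f}")
```

Repeating this per attribute gives a rough picture of how well the marginals match, but it says nothing about correlations between attributes, which would need a separate check.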
Abstract
Given the huge volume of cross-border flows, effective and efficient control of trade becomes more crucial in protecting people and society from illicit trade. However, limited accessibility of transaction-level trade datasets hinders the progress of open research, and many customs administrations have not benefited from the recent progress in data-based risk management. In this paper, we introduce an import declaration dataset to facilitate collaboration between domain experts in customs administrations and researchers from diverse domains, such as data science and machine learning. The dataset contains 54,000 artificially generated trades with 22 key attributes, and it is synthesized with a conditional tabular GAN while maintaining correlated features. Synthetic data has several advantages. First, releasing the dataset is free from restrictions that do not allow disclosing…
Taxonomy
Topics: Imbalanced Data Classification Techniques
