AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering
Johan Östman, Edvin Callisen, Anton Chen, Kristiina Ausmees, Emanuel Gårdh, Jovan Zamac, Jolanta Goldsteine, Hugo Wefer, Simon Whelan, Markus Reimegård

TL;DR
AMLgentex is an open-source tool that generates realistic synthetic transaction data to help develop and evaluate anti-money laundering detection methods under real-world conditions.
Contribution
It introduces AMLGentex, a configurable data generator and benchmarking suite addressing limitations of existing datasets for money laundering detection.
Findings
Provides realistic, configurable synthetic datasets
Enables systematic evaluation of detection methods
Supports collaboration with country-specific data
Abstract
Money laundering enables organized crime by moving illicit funds into the legitimate economy. Although trillions of dollars are laundered each year, detection rates remain low because launderers evade oversight, confirmed cases are rare, and institutions see only fragments of the global transaction network. Since access to real transaction data is tightly restricted, synthetic datasets are essential for developing and evaluating detection methods. However, existing datasets fall short: they often neglect partial observability, temporal dynamics, strategic behavior, uncertain labels, class imbalance, and network-level dependencies. We introduce AMLGentex, an open-source suite for generating realistic, configurable transaction data and benchmarking detection methods. AMLGentex enables systematic evaluation of anti-money laundering systems under conditions that mirror real-world…
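The abstract names properties of real transaction data that existing datasets often neglect. The toy sketch below is not AMLGentex's generator or API; it only illustrates how three of those properties can be represented: class imbalance (a small `laundering_rate`), partial observability (a bank sees only transfers that touch its own accounts), and uncertain labels (only a fraction of true laundering transfers are ever confirmed). All names, the log-normal amount distribution, and the parameter values are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    src: int
    dst: int
    amount: float
    step: int
    is_laundering: bool  # ground truth; a real institution never observes this directly


def generate_transactions(n_accounts, n_tx, laundering_rate, seed=0):
    """Hypothetical toy generator: random transfers, a small fraction marked as laundering."""
    rng = random.Random(seed)
    txs = []
    for step in range(n_tx):
        src, dst = rng.sample(range(n_accounts), 2)  # two distinct accounts
        amount = round(rng.lognormvariate(5.0, 1.0), 2)  # assumed heavy-tailed amounts
        txs.append(Transaction(src, dst, amount, step, rng.random() < laundering_rate))
    return txs


def bank_view(txs, bank_of, bank):
    """Partial observability: keep only transfers with an endpoint at the given bank."""
    return [t for t in txs if bank_of[t.src] == bank or bank_of[t.dst] == bank]


def observed_labels(txs, detection_prob, seed=0):
    """Uncertain labels: a true laundering transfer is confirmed only with some probability."""
    rng = random.Random(seed)
    return [t.is_laundering and rng.random() < detection_prob for t in txs]


# Usage: three hypothetical banks, accounts assigned round-robin.
txs = generate_transactions(n_accounts=100, n_tx=10_000, laundering_rate=0.01)
bank_of = {a: a % 3 for a in range(100)}
view = bank_view(txs, bank_of, bank=0)
labels = observed_labels(view, detection_prob=0.3)
print(len(view), "of", len(txs), "transfers visible;", sum(labels), "confirmed positives")
```

Bank 0 never sees transfers between banks 1 and 2, and its confirmed positives undercount the true laundering transfers it does see, which is the combination of fragmentary views and noisy labels the abstract describes.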
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper tackles a genuine barrier in AML research: the scarcity of publicly available transaction datasets due to privacy and regulatory constraints. This data scarcity significantly limits academic research and the development of new detection methods.
- The paper provides substantial improvements over existing simulators.
- The framework offers two operation modes (knowledge-free and data-informed), making it accessible to researchers without proprietary data while allowing financial instit
W1: The paper's central claim is generating "realistic, configurable transaction data," but there is no empirical evidence demonstrating that synthetic data reflects real-world AML patterns:
- No comparison of statistical properties (degree distributions, transaction amounts, temporal patterns) between synthetic and real data;
- No fidelity metrics (e.g., Maximum Mean Discrepancy, Wasserstein distance, or feature distribution comparisons);
- No demonstration that models trained on synthetic data
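The fidelity metrics named in W1 can be computed directly on one-dimensional marginals such as (log) transaction amounts. This is a minimal sketch, assuming equal-size samples for the Wasserstein shortcut and a Gaussian RBF kernel with a hand-picked bandwidth for MMD; a full fidelity study would also compare multivariate and graph-level statistics such as degree distributions. No real data is available here, so both samples are synthetic stand-ins.

```python
import math
import random


def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    With equal sizes and uniform weights, the optimal coupling pairs sorted
    values, so W1 is the mean absolute gap between the sorted samples.
    """
    if len(x) != len(y):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

    kxx = sum(k(a, b) for a in x for b in x) / len(x) ** 2
    kyy = sum(k(a, b) for a in y for b in y) / len(y) ** 2
    kxy = sum(k(a, b) for a in x for b in y) / (len(x) * len(y))
    return kxx + kyy - 2 * kxy


# Usage: two stand-in samples of log-amounts with slightly different parameters.
rng = random.Random(0)
log_amounts_a = [rng.gauss(5.0, 1.0) for _ in range(500)]
log_amounts_b = [rng.gauss(5.3, 1.2) for _ in range(500)]
print("W1:", wasserstein_1d(log_amounts_a, log_amounts_b))
print("MMD^2:", mmd2_rbf(log_amounts_a, log_amounts_b))
```

Both scores are zero when the samples coincide and grow as the distributions diverge; MMD is sensitive to the kernel bandwidth, which is why a comparison should report the bandwidth it used.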
- Clear parameter-tuning strategy via multi-objective Bayesian optimization with two modes (knowledge-free and data-informed).
- The open-source release intent and detailed hyperparameter configurations promote reproducibility.
- The paper is clearly written and easy to follow.
- The work extends AMLSim and aims to address some of its limitations; however, several of these have already been resolved in more recent simulators (e.g., AMLWorld by Altman et al.). Comparisons should therefore include these newer data generation models.
- The abstract overstates the shortcomings of existing datasets.
- The uniform feature-importance objective (Section 5) lacks ablations or justification.
- The baselines in the experimental evaluation do not consider more sophisticated AML-sp
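The second review refers to parameter tuning via multi-objective Bayesian optimization. A full Bayesian optimization loop (surrogate model plus acquisition function) is beyond a short example; the sketch below shows only the Pareto-dominance filter that any multi-objective tuner uses to report the non-dominated configurations it has evaluated. The configuration names and objective values are hypothetical, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    Objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(results):
    """Return the (config, objectives) pairs not dominated by any other evaluated pair."""
    return [
        (cfg, obj)
        for cfg, obj in results
        if not any(dominates(other, obj) for _, other in results)
    ]


# Usage: hypothetical configurations scored on two objectives to minimize.
results = [
    ("cfg_a", (0.10, 0.30)),
    ("cfg_b", (0.20, 0.10)),
    ("cfg_c", (0.25, 0.35)),  # worse than cfg_a on both objectives
]
front = pareto_front(results)
print([cfg for cfg, _ in front])
```

A point never dominates itself (the strict-improvement condition fails), so each candidate can safely be compared against the whole list. The tuner's final choice among front members still requires a trade-off preference, which is where the review's request for justification of each objective becomes relevant.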
The authors have provided an extensive method for constructing synthetic transaction networks that can be used to test AML methods. The payment patterns change over time, allowing the generator to mimic cyclical behaviour over the course of each month. The flexibility built in through the hyperparameters gives the user the ability to alter distributions to their specific needs. This last point is also illustrated in the paper using some country-specific statistics. The network also includes a source and sink no
In general, I have the feeling that the presentation of the method in the main text is a bit too qualitative. To fully understand what the authors mean or how the method is implemented, I often had to consult the appendix. I am of the opinion that the main text should be self-contained enough for the reader to understand the main points of the method. The text is also vague about what the added value is. This is not clearly stated in the introduction, and the conclusion is also a bit
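The third review notes that payment patterns change over time to mimic cyclical behaviour within each month. One simple way to model such a cycle, assumed here for illustration and not taken from AMLGentex, is a daily Poisson rate modulated by a cosine with a 30-day period, so that activity peaks at the start of each month (e.g. around salary payments). The function names, the 30-day period, and the parameter values are hypothetical.

```python
import math
import random


def daily_rate(day, base_rate, amplitude, period=30):
    """Expected transactions on a day; peaks when day % period == 0 (assumed month start)."""
    if not 0 <= amplitude <= 1:
        raise ValueError("amplitude must lie in [0, 1] to keep the rate non-negative")
    return base_rate * (1 + amplitude * math.cos(2 * math.pi * (day % period) / period))


def poisson(rng, lam):
    """Knuth's multiplicative method; cost grows with lam, so keep rates small."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_daily_counts(n_days, base_rate, amplitude, seed=0):
    """Draw one Poisson transaction count per day from the cyclical rate."""
    rng = random.Random(seed)
    return [poisson(rng, daily_rate(d, base_rate, amplitude)) for d in range(n_days)]


# Usage: three hypothetical months of daily counts for a single account population.
counts = simulate_daily_counts(n_days=90, base_rate=10.0, amplitude=0.5)
month_start = sum(c for d, c in enumerate(counts) if d % 30 < 5)
mid_month = sum(c for d, c in enumerate(counts) if 13 <= d % 30 <= 17)
print("first five days of each month:", month_start, "| mid-month five days:", mid_month)
```

Keeping the cycle in the rate rather than in the counts means the same mechanism can drive any downstream sampler; a generator with larger rates would swap Knuth's sampler for a faster Poisson method.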
1. Money laundering identification is a high-risk, cross-border, and cross-modal data governance challenge. The paper's inclusion of this as an AI research topic is highly significant.
2. This paper constructs a processing platform, encompassing multimodal data indexing, graph construction, agent interaction, and model-driven development, with a complete overall pipeline.
1. This article reads more like a systems engineering report or project white paper and does not offer original technical contributions to AML detection algorithms, model optimization, or learning mechanisms. For example: there is no clear algorithmic formula derivation, no new loss or model structure, and the RAG component partially reuses standard techniques.
2. Experimental validation is very limited. The experimental section primarily consists of graphs, lacking benchmark-based quantitative results (such as comparisons
Taxonomy
Topics: Crime, Illicit Activities, and Governance
