PersonaLedger: Generating Realistic Financial Transactions with Persona Conditioned LLMs and Rule Grounded Feedback
Dehao Yuan, Tyler Farnan, Stefan Tesliuc, Doron L Bergman, Yulun Wu, Xiaoyu Liu, Minghui Liu, James Montgomery, Nam H Nguyen, C. Bayan Bruss, Furong Huang

TL;DR
PersonaLedger is a novel generation engine that combines large language models conditioned on user personas with rule-based systems to produce diverse, realistic, and privacy-preserving financial transaction data for research and evaluation.
Contribution
It introduces a hybrid approach integrating LLMs and rule-based engines to generate realistic financial transactions while maintaining correctness and privacy.
Findings
Created a dataset of 30 million transactions from 23,000 users.
Developed benchmark tasks for illiquidity classification and identity theft detection.
Provided a resource supporting evaluation of forecasting and anomaly detection models.
Abstract
Strict privacy regulations limit access to real transaction data, slowing open research in financial AI. Synthetic data can bridge this gap, but existing generators do not jointly achieve behavioral diversity and logical groundedness. Rule-driven simulators rely on hand-crafted workflows and shallow stochasticity, which miss the richness of human behavior. Learning-based generators such as GANs capture correlations yet often violate hard financial constraints and still require training on private data. We introduce PersonaLedger, a generation engine that uses a large language model conditioned on rich user personas to produce diverse transaction streams, coupled with an expert configurable programmatic engine that maintains correctness. The LLM and engine interact in a closed loop: after each event, the engine updates the user state, enforces financial rules, and returns a context aware…
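The closed loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `propose_event` is a hypothetical stand-in for the persona-conditioned LLM call, `apply_event` is a hypothetical rule engine, and the single no-overdraft rule, the persona fields, and the feedback strings are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class UserState:
    """Per-user state owned by the rule engine (fields are assumptions)."""
    balance: float
    ledger: list = field(default_factory=list)


def propose_event(persona, feedback, rng):
    """Hypothetical stand-in for the persona-conditioned LLM call."""
    # React to engine feedback: after a rejection, propose a smaller amount.
    cap = 50.0 if feedback.startswith("rejected") else 400.0
    return {
        "merchant": rng.choice(persona["merchants"]),
        "amount": round(rng.uniform(5.0, cap), 2),
    }


def apply_event(state, event):
    """Hypothetical rule engine: enforce one hard rule, update state, return feedback."""
    # Assumed rule for this sketch: a purchase may not overdraw the account.
    if event["amount"] > state.balance:
        return (f"rejected: amount {event['amount']:.2f} "
                f"exceeds balance {state.balance:.2f}")
    state.balance = round(state.balance - event["amount"], 2)
    state.ledger.append(event)
    return f"accepted: balance is now {state.balance:.2f}"


def simulate(persona, start_balance, steps, seed=0):
    """Run the loop: the generator proposes, the engine validates and feeds back."""
    rng = random.Random(seed)
    state = UserState(balance=start_balance)
    feedback = "start"
    for _ in range(steps):
        event = propose_event(persona, feedback, rng)
        feedback = apply_event(state, event)
    return state


persona = {"name": "commuter", "merchants": ["grocery", "transit", "coffee"]}
final_state = simulate(persona, start_balance=500.0, steps=20)
print(len(final_state.ledger), final_state.balance)
```

Because the engine alone mutates the state, every accepted transaction satisfies the rule by construction, while the generator stays free to vary merchants and amounts.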
Peer Reviews
Decision · Submitted to ICLR 2026
- The framework developed to generate the data looks useful as an artifact. There is thoughtful design on the abstractions in the engine orchestrating the LLM calls. In particular, the interface for adding rules is well thought out.
- The writing of the paper is very good. The authors thoroughly motivate the problem at hand, discuss issues with naive solutions, and give a very clear exposition of the structure and characteristics of the dataset release.
- The data resource is of high quality.
- There is a lack of concrete evaluation criteria proposed to assess the fidelity of the proposed dataset with respect to a real financial transaction dataset. Arguments of realism are mostly qualitative, or pertain to an arbitrarily picked attribute.
- More evidence of the usefulness of the dataset would be appreciated. For the downstream task benchmarks, it would be good to assess whether the synthetic-benchmark-induced rankings of methods align with the ranking of methods on real tasks; alte
1. Timely contribution: The lack of publicly available transaction data due to privacy restrictions is a real bottleneck in financial AI research. The proposed approach represents an ambitious and creative attempt to overcome this constraint while maintaining logical consistency.
2. Interesting idea: The LLM + rule engine closed loop is very interesting. It directly addresses the brittleness of rule-based simulators and the constraint violations of purely generative models (e.g., GANs or VAEs).
1. Insufficient validation of realism: The main limitation lies in the lack of quantitative or external validation demonstrating that the generated data are truly realistic or useful proxies for real-world ledgers. Statistical diversity and rule adherence are necessary but not sufficient. Without comparison to real transaction datasets (even if at an aggregated or stylized level), it is hard to judge whether the synthetic data exhibit realistic interdependencies or temporal dynamics.
2. Shallow
Excellent work, with great creativity and a strong idea; congratulations on realizing it!
- Clear methodology: in the closed loop, the LLM ensures diversity, the rule engine strictly controls accounting and calendar constraints, and errors can be corrected through structured prompts.
- Complete resources: Large data scale with comprehensive fields, plus two tasks closely related to risk control/anti-fraud with a unified protocol.
- Broad baseline coverage: Compares Transformer, PatchTST, Autoformer, iTransformer, etc. under the same
While the paper provides statistical analysis (Section 2.2) and benchmark tasks (Section 3), there is no systematic evaluation of whether the generated transactions are actually realistic compared to real financial data.
Taxonomy
Topics: Persona Design and Applications · Machine Learning in Healthcare · Topic Modeling
