Synthetic Financial Data Generation for Enhanced Financial Modelling
Christophe D. Hounwanou, Yae Ulrich Gaba, Pierre Ntakirutimana

TL;DR
This paper evaluates three generative models for synthetic financial data, highlighting TimeGAN's superior realism and providing guidelines for model selection based on application needs.
Contribution
It introduces a unified evaluation framework for synthetic financial data and compares ARIMA-GARCH, VAEs, and TimeGAN, offering practical guidelines for model choice.
Findings
TimeGAN achieves the lowest MMD score of the three models, indicating the closest distributional match to the real data.
ARIMA-GARCH captures linear trends but not nonlinear dynamics.
VAEs underestimate extreme events in generated data.
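To make the MMD finding concrete, the following is a minimal sketch of how a squared Maximum Mean Discrepancy between real and synthetic samples could be estimated. The Gaussian (RBF) kernel, the bandwidth of 1.0, the biased V-statistic estimator, and the toy Gaussian samples are all assumptions made for illustration; the summary does not state which kernel or estimator the paper uses.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two scalars (kernel choice is an assumption)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(real, synthetic, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""

    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return (
        mean_kernel(real, real)
        + mean_kernel(synthetic, synthetic)
        - 2.0 * mean_kernel(real, synthetic)
    )


# Toy stand-ins for real and synthetic data (hypothetical, not from the paper).
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(200)]
close = [rng.gauss(0.0, 1.0) for _ in range(200)]  # same distribution as `real`
far = [rng.gauss(2.0, 1.0) for _ in range(200)]    # shifted distribution

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
print(f"MMD^2 (matching generator): {mmd_close:.4f}")
print(f"MMD^2 (shifted generator):  {mmd_far:.4f}")
```

A lower value means the two samples are harder to tell apart under the chosen kernel, which is the sense in which a lowest MMD score signals high fidelity. In practice the bandwidth strongly affects the score, so comparisons are only meaningful when every model is scored with the same kernel settings.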
Abstract
Data scarcity and confidentiality in finance often impede model development and robust testing. This paper presents a unified multi-criteria evaluation framework for synthetic financial data and applies it to three representative generative paradigms: the statistical ARIMA-GARCH baseline, Variational Autoencoders (VAEs), and Time-series Generative Adversarial Networks (TimeGAN). Using historical S&P 500 daily data, we evaluate fidelity (Maximum Mean Discrepancy, MMD), temporal structure (autocorrelation and volatility clustering), and practical utility in downstream tasks, specifically mean-variance portfolio optimization and volatility forecasting. Empirical results indicate that ARIMA-GARCH captures linear trends and conditional volatility but fails to reproduce nonlinear dynamics; VAEs produce smooth trajectories that underestimate extreme events; and TimeGAN achieves the best…
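The temporal-structure criteria named in the abstract (autocorrelation and volatility clustering) are commonly checked by comparing the sample autocorrelation of returns with that of squared returns. The sketch below illustrates that check under stated assumptions: the GARCH(1,1) simulator and its parameters (omega, alpha, beta) are hypothetical stand-ins for a return series, not the paper's data or its fitted model.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return cov / var


def simulate_garch_returns(n, omega=1e-5, alpha=0.1, beta=0.85, seed=0):
    """Simulate n returns from a GARCH(1,1) process (illustrative parameters)."""
    rng = random.Random(seed)
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = rng.gauss(0.0, 1.0) * variance ** 0.5
        returns.append(r)
        # Next-period conditional variance depends on today's squared return.
        variance = omega + alpha * r * r + beta * variance
    return returns


returns = simulate_garch_returns(2000)
squared = [r * r for r in returns]

# Volatility clustering appears as persistence in squared returns,
# even when the returns themselves are close to uncorrelated.
acf_returns = [autocorrelation(returns, k) for k in range(1, 6)]
acf_squared = [autocorrelation(squared, k) for k in range(1, 6)]
print("ACF of returns, lags 1-5:        ", [round(v, 3) for v in acf_returns])
print("ACF of squared returns, lags 1-5:", [round(v, 3) for v in acf_squared])
```

Running the same two diagnostics on real and synthetic series, and comparing the resulting profiles lag by lag, gives one simple way to judge whether a generator preserves temporal structure.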
Taxonomy
Topics: Stock Market Forecasting Methods · Machine Learning in Healthcare · Time Series Analysis and Forecasting
