Improving Machine Learning Performance with Synthetic Augmentation
Mel Sohm, Charles Dezons, Sami Sellami, Oscar Ninou, Axel Pincon

TL;DR
This paper analyzes the statistical effects of synthetic data augmentation in financial machine learning and shows that it can reduce estimation variance but may introduce bias, with the net effect depending on the regime.
Contribution
The paper formalizes the impact of synthetic augmentation as a bias-variance trade-off and introduces a size-matched null augmentation and a block permutation test to separate informational gains from mechanical sample-size effects.
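The trade-off can be illustrated with a standard decomposition. This is a sketch under assumed notation, not the paper's own formulation: suppose $n$ real samples from the evaluation distribution $P$ are mixed with $m$ synthetic samples from $Q$, let $R_P$ denote risk under $P$, let $\theta_\alpha^\star$ be the population minimizer under the mixture, and let $\hat\theta_\alpha$ be the estimate fitted on the augmented sample.

```latex
\begin{align*}
P_\alpha &= (1-\alpha)\,P + \alpha\,Q, \qquad \alpha = \frac{m}{n+m}, \\
\theta_\alpha^\star &= \arg\min_{\theta}\; \mathbb{E}_{Z \sim P_\alpha}\big[\ell(\theta; Z)\big], \\
R_P(\hat\theta_\alpha) - R_P(\theta_0^\star)
  &= \underbrace{R_P(\theta_\alpha^\star) - R_P(\theta_0^\star)}_{\text{bias from the shifted objective}}
   + \underbrace{R_P(\hat\theta_\alpha) - R_P(\theta_\alpha^\star)}_{\text{estimation error}}.
\end{align*}
```

The first term is non-negative and vanishes when $\alpha = 0$ or $Q = P$. The second term typically shrinks as $n + m$ grows. Under this reading, augmentation helps when the reduction in the second term outweighs the increase in the first.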
Findings
Synthetic augmentation benefits variance-dominant regimes like volatility forecasting.
Synthetic augmentation degrades performance in bias-dominant settings such as directional prediction.
Rare-regime targeting can improve domain-specific metrics but may conflict with permutation inference.
Abstract
Synthetic augmentation is increasingly used to mitigate data scarcity in financial machine learning, yet its statistical role remains poorly understood. We formalize synthetic augmentation as a modification of the effective training distribution and show that it induces a structural bias-variance trade-off: while additional samples may reduce estimation error, they may also shift the population objective whenever the synthetic distribution deviates from regions relevant under evaluation. To isolate informational gains from mechanical sample-size effects, we introduce a size-matched null augmentation and a finite-sample, non-parametric block permutation test that remains valid under weak temporal dependence. We evaluate this framework in both controlled Markov-switching environments and real financial datasets, including high-frequency option trade data and a daily equity panel.…
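A block permutation comparison of this kind might look like the sketch below. It is an assumption-laden illustration, not the authors' procedure: it supposes two aligned series of out-of-sample losses, one from a model trained with the size-matched null augmentation and one from a model trained with the synthetic augmentation, and it randomly swaps the two labels within contiguous time blocks so that short-range dependence inside a block is preserved. The function name `block_permutation_pvalue` and its parameters are hypothetical.

```python
import numpy as np

def block_permutation_pvalue(loss_null, loss_aug, block_size=20, n_perm=2000, seed=0):
    """One-sided paired block permutation test (illustrative sketch).

    Null hypothesis: synthetic augmentation yields no lower loss than the
    size-matched null augmentation. Returns (mean loss difference, p-value).
    """
    loss_null = np.asarray(loss_null, dtype=float)
    loss_aug = np.asarray(loss_aug, dtype=float)
    if loss_null.ndim != 1 or loss_null.shape != loss_aug.shape:
        raise ValueError("loss series must be 1-D and of equal length")

    # Positive differences mean the synthetic augmentation has lower loss.
    diff = loss_null - loss_aug
    n = diff.size

    # Sum the differences inside contiguous blocks of length block_size.
    block_ids = np.arange(n) // block_size
    block_sums = np.bincount(block_ids, weights=diff)

    observed = block_sums.sum() / n

    # Swapping the two labels within a block flips the sign of that block's sum.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, block_sums.size))
    perm_stats = (signs @ block_sums) / n

    # The +1 terms count the observed statistic among the permutations,
    # so the p-value can never be exactly zero.
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)
    return observed, float(p_value)
```

In use, `block_size` would be chosen to be at least as long as the dependence horizon of the loss series. A small p-value then suggests that the gain over the null augmentation is not explained by sample size alone.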
