Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events
Jingyi Gu, Xuan Zhang, Guiling Wang

TL;DR
This survey reviews synthetic data generation methods tailored for rare and extreme events, highlighting their unique challenges, evaluation metrics, and application domains to guide future research in this critical area.
Contribution
This survey provides the first comprehensive overview of synthetic data generation techniques specifically for extreme events, including evaluation frameworks and domain-specific insights.
Findings
Summarizes generative modeling techniques for heavy-tailed distributions
Introduces a tailored evaluation framework for extreme event data
Identifies underexplored application domains like wildfires and pandemics
Abstract
Extreme events, such as market crashes, natural disasters, and pandemics, are rare but catastrophic, often triggering cascading failures across interconnected systems. Accurate prediction and early warning can help minimize losses and improve preparedness. While data-driven methods offer powerful capabilities for extreme event modeling, they require abundant training data, yet extreme event data is inherently scarce, creating a fundamental challenge. Synthetic data generation has emerged as a powerful solution. However, existing surveys focus on general data with privacy preservation emphasis, rather than extreme events' unique performance requirements. This survey provides the first overview of synthetic data generation for extreme events. We systematically review generative modeling techniques and large language models, particularly those enhanced by statistical theory as well as…
Taxonomy
Topics: Seismology and Earthquake Studies · Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance
