BIGBOY1.2: Generating Realistic Synthetic Data for Disease Outbreak Modelling and Analytics
Raunak Narwal, Syed Abbas

TL;DR
BIGBOY1.2 is a versatile synthetic data generator that creates realistic epidemic datasets with configurable features, aiding benchmarking and development of disease modeling and forecasting methods.
Contribution
It introduces a flexible framework for generating synthetic epidemic data with customizable parameters, supporting both traditional and machine learning models.
Findings
Generated datasets mimic real reporting artifacts.
Supports comparison of epidemiological and machine learning models.
Enables benchmarking with diverse epidemic scenarios.
Abstract
Modelling disease outbreaks remains challenging due to incomplete surveillance data, noise, and limited access to standardized datasets. We present BIGBOY1.2, an open synthetic dataset generator that creates configurable epidemic time series and population-level trajectories suitable for benchmarking modelling, forecasting, and visualisation methods. The framework supports SEIR and SIR-like compartmental logic, custom seasonality, and noise injection to mimic real reporting artifacts. BIGBOY1.2 can produce datasets with diverse characteristics, making it suitable for comparing traditional epidemiological models (e.g., SIR, SEIR) with modern machine learning approaches (e.g., SVM, neural networks).
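To make the ingredients named in the abstract concrete, the following is a minimal sketch of SEIR dynamics with seasonal forcing and injected reporting noise. It is not BIGBOY1.2's actual code or API: the function names (simulate_seir, add_reporting_noise), all parameter names and default values, the cosine form of the seasonality, and the specific noise model (under-reporting, a weekend dip, and multiplicative Gaussian noise) are assumptions chosen purely for illustration.

```python
import math
import random


def simulate_seir(days=120, population=100_000, beta0=0.4, sigma=0.2,
                  gamma=0.1, season_amp=0.2, season_period=365.0,
                  initial_infected=10):
    """Daily Euler-step SEIR with a sinusoidally forced transmission rate.

    All names and defaults are hypothetical, not taken from BIGBOY1.2.
    Returns the daily true incidence (E -> I transitions) and the final
    (S, E, I, R) state.
    """
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    new_cases = []
    for t in range(days):
        # Assumed seasonality: cosine modulation around the baseline rate.
        beta = beta0 * (1.0 + season_amp * math.cos(2.0 * math.pi * t / season_period))
        exposures = beta * s * i / population   # S -> E
        onsets = sigma * e                      # E -> I
        recoveries = gamma * i                  # I -> R
        s -= exposures
        e += exposures - onsets
        i += onsets - recoveries
        r += recoveries
        new_cases.append(onsets)
    return new_cases, (s, e, i, r)


def add_reporting_noise(true_cases, report_rate=0.6, weekend_factor=0.5,
                        noise_sd=0.1, seed=0):
    """Turn true incidence into reported counts (assumed noise model)."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    reported = []
    for t, cases in enumerate(true_cases):
        # Treat days 5 and 6 of each 7-day cycle as a weekend with fewer reports.
        weekday_effect = weekend_factor if t % 7 in (5, 6) else 1.0
        noisy = cases * report_rate * weekday_effect * (1.0 + rng.gauss(0.0, noise_sd))
        reported.append(max(0, round(noisy)))
    return reported


true_cases, final_state = simulate_seir()
reported = add_reporting_noise(true_cases)
```

Keeping the latent epidemic and the reporting layer as separate steps means the same true trajectory can be paired with different noise settings, which is one plausible way to build the kind of benchmark the abstract describes for comparing compartmental and machine learning models.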
Taxonomy
Topics: COVID-19 epidemiological studies · Data-Driven Disease Surveillance · Data Visualization and Analytics
