TL;DR
This paper explores the use of generative models, specifically TVAE and Gaussian Copula, to produce realistic synthetic flight data that can improve predictive models in aviation while addressing data confidentiality issues.
Contribution
It introduces a comprehensive framework for evaluating synthetic flight data quality and compares two generative models, highlighting their strengths and limitations.
Findings
Gaussian Copula achieves higher statistical similarity and fidelity than TVAE.
TVAE scales efficiently to large datasets.
Synthetic data enables flight delay prediction with accuracy comparable to real data.
Abstract
The increasing adoption of synthetic data in aviation research offers a promising solution to data scarcity and confidentiality challenges. This study investigates the potential of generative models to produce realistic synthetic flight data and evaluates their quality through a comprehensive four-stage assessment framework. The need for synthetic flight data arises from their potential to serve as an alternative to confidential real-world records and to augment rare events in historical datasets. These enhanced datasets can then be used to train machine learning models that predict critical events, such as flight delays, cancellations, diversions, and turnaround times. Two generative models, Tabular Variational Autoencoder (TVAE) and Gaussian Copula (GC), are adapted to generate synthetic flight information and compared based on their ability to preserve statistical similarity,…
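To make the Gaussian Copula idea concrete, here is a minimal sketch of the general technique in Python. It is not the paper's implementation: it handles numeric columns only, and the column meanings (scheduled duration, delay), the toy data, and the function names fit_gaussian_copula and sample_gaussian_copula are all assumptions made up for illustration. The idea is to rank-transform each column to a standard normal, estimate the correlation between the transformed columns, sample new latent rows from that correlated normal, and map them back through each column's empirical quantiles.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_to_normal = np.vectorize(_STD_NORMAL.inv_cdf)   # uniform (0, 1) -> standard normal
_to_uniform = np.vectorize(_STD_NORMAL.cdf)      # standard normal -> uniform


def fit_gaussian_copula(real):
    """Estimate the latent correlation matrix of a numeric (n_rows, n_cols) array."""
    n_rows = real.shape[0]
    # Rank each column (1..n_rows) and scale into the open interval (0, 1).
    ranks = real.argsort(axis=0).argsort(axis=0) + 1
    uniforms = ranks / (n_rows + 1)
    latent = _to_normal(uniforms)
    return np.corrcoef(latent, rowvar=False)


def sample_gaussian_copula(real, corr, n_samples, rng):
    """Draw synthetic rows that reuse the real marginals and the fitted correlation."""
    latent = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_samples)
    uniforms = _to_uniform(latent)
    # Map each uniform column back through the empirical quantiles of the real column.
    columns = [np.quantile(real[:, j], uniforms[:, j]) for j in range(real.shape[1])]
    return np.column_stack(columns)


# Toy stand-in for flight records (made-up values, not data from the paper).
rng = np.random.default_rng(0)
duration_min = rng.normal(120, 30, size=500)
delay_min = 0.2 * duration_min + rng.exponential(10, size=500)
real = np.column_stack([duration_min, delay_min])

corr = fit_gaussian_copula(real)
synthetic = sample_gaussian_copula(real, corr, n_samples=500, rng=rng)
```

Because the synthetic values are interpolated from the real empirical quantiles, they never fall outside the observed range of each column. A practical generator would also need to handle categorical fields and would usually fit parametric marginals instead.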
