Synthetic Test Data Generation Using Recurrent Neural Networks: A Position Paper
Razieh Behjati, Erik Arisholm, Chao Tan, Margrethe M. Bedregal

TL;DR
This paper discusses the importance of synthetic test data generation for quality assurance, compares anonymized and synthetic data, and explores using recurrent neural networks to generate realistic data, showing promising preliminary results.
Contribution
The paper proposes using recurrent neural networks to generate synthetic test data in industrial testing environments, and reports promising initial results.
Findings
Recurrent neural networks can generate representative synthetic data.
In preliminary experiments, the generated data was highly accurate.
Synthetic data can serve as a privacy-preserving alternative to production data.
Abstract
Testing in production-like test environments is an essential part of quality assurance processes in many industries. Provisioning such test environments for information-intensive services involves setting up databases that are rich enough to simulate a wide variety of user scenarios. While production data is perhaps the gold standard here, many organizations, particularly in the public sector, are not allowed to use production data for testing purposes due to privacy concerns. The alternatives are to use anonymized data or synthetically generated data. In this paper, we elaborate on these alternatives and compare them in an industrial context. Further, we focus on synthetic data generation and investigate the use of recurrent neural networks for this purpose. In our preliminary experiments, we were able to generate representative and highly accurate data using a…
