Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis

Mohammad Zbeeb; Mohammad Ghorayeb; Mariam Salman

arXiv:2411.01929·cs.LG·December 20, 2024

TL;DR

This paper presents a novel method for generating high-quality synthetic malicious network traffic data by transforming numerical data into text, leveraging multiple generative models to improve generalization and data fidelity.

Contribution

It introduces a unique approach that frames data synthesis as a language modeling task, outperforming existing models in generating complex structured data.

Findings

01. Our method surpasses state-of-the-art models in data fidelity.

02. Transforming data into text enhances regularization and generalization.

03. Open-source code and models facilitate further research.

Abstract

Artificial Intelligence (AI) research often aims to develop models that can generalize reliably across complex datasets, yet this remains challenging in fields where data is scarce, intricate, or inaccessible. This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize one of the most demanding structured datasets: Malicious Network Traffic. Our approach uniquely transforms numerical data into text, re-framing data generation as a language modeling task, which not only enhances data regularization but also significantly improves generalization and the quality of the synthetic data. Extensive statistical analyses demonstrate that our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data. Additionally, we conduct a comprehensive study on synthetic data applications, effectiveness, and evaluation…
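
To make the "numerical data into text" idea concrete, the following is a minimal sketch of one way a numeric record could be serialized into a line of text and tokenized at the character level, so that a sequence model can be trained on it with a next-token objective. It is an illustration under stated assumptions, not the authors' pipeline: the column names in `COLUMNS`, the fixed-precision formatting, and the character-level vocabulary are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical flow-feature names, chosen only for illustration;
# the paper's actual feature set is not specified in this listing.
COLUMNS = ["duration", "src_bytes", "dst_bytes"]


def row_to_text(row: Dict[str, float], precision: int = 2) -> str:
    """Serialize one numeric record as a space-separated line of text."""
    return " ".join(f"{row[c]:.{precision}f}" for c in COLUMNS)


def text_to_row(line: str) -> Dict[str, float]:
    """Parse a generated line back into a numeric record."""
    parts = line.split()
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}")
    return {c: float(p) for c, p in zip(COLUMNS, parts)}


def build_vocab(lines: List[str]) -> Dict[str, int]:
    """Character-level vocabulary; the newline acts as a record separator."""
    chars = sorted(set("".join(lines)) | {"\n"})
    return {ch: i for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map each character to its integer token id."""
    return [vocab[ch] for ch in text]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Map integer token ids back to text."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


if __name__ == "__main__":
    rows = [
        {"duration": 1.5, "src_bytes": 200.0, "dst_bytes": 0.0},
        {"duration": 0.25, "src_bytes": 48.0, "dst_bytes": 1024.0},
    ]
    corpus = "\n".join(row_to_text(r) for r in rows) + "\n"
    vocab = build_vocab([corpus])
    token_ids = encode(corpus, vocab)  # training sequence for a language model
    print(corpus, end="")
    print(len(vocab), "distinct characters,", len(token_ids), "tokens")
```

Once records are text, any sequence model that predicts the next token can be sampled to produce new lines, and `text_to_row` converts well-formed samples back into numeric records. Lines that fail to parse can simply be discarded, which gives a cheap validity filter on the generated data.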

Code & Models

Repositories

moe-zbeeb/exploring-the-landscape-for-generative-models-for-specialized-data-generation
PyTorch · Official

Taxonomy

Topics: Advanced Database Systems and Queries