Cross-Flow Correlations Survive Synthesis: Measuring Source-Level Privacy Leakage in Synthetic Network Traces
Minhao Jin, Hongyu Hè, Maria Apostolaki

TL;DR
This paper reveals that synthetic network data generators leak source-level privacy through cross-flow correlations, exposing sensitive information despite existing privacy protections, and introduces a practical attack demonstrating this vulnerability.
Contribution
It uncovers a fundamental privacy leakage in synthetic network data generators and develops TraceBleed, the first source-level membership inference attack against them.
Findings
All tested generators leak source-level information on at least some datasets.
Differential privacy at the flow or packet level does not ensure source-level privacy.
Increasing synthetic data volume amplifies privacy leakage.
Abstract
Synthetic network data generators (SynNetGens) are increasingly used to share realistic traffic traces without exposing sensitive raw data. While substantial effort has gone into improving fidelity, privacy is either assumed to be a built-in property of synthesis or addressed through differential privacy at the packet or flow level. This paper uncovers a fundamental privacy vulnerability: SynNetGens preserve cross-flow behavioral correlations that expose source-level membership, allowing an attacker to determine whether traffic from a specific user or service was included in the training data. This leakage arises from a mismatch in abstraction: existing SynNetGens operate and are protected at the packet or flow level, while sensitive information is encoded in correlations across flows from the same source. To demonstrate that this vulnerability is exploitable in practice, we develop…
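To make the abstraction mismatch concrete, here is a toy sketch of what a source-level signal could look like. It is not TraceBleed, whose design is not described in the text above. The flow representation (a `(source, destination_port)` pair), the function names, and the cosine-similarity score are all assumptions chosen for illustration. The idea is only that a feature computed across all flows of one source can be compared against per-source features in a synthetic trace, even when no individual flow is copied.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical flow record: (source_id, destination_port).
# Real traces carry many more fields; this is an illustrative simplification.

def source_signatures(flows):
    """Group flows by source and count destination ports per source.

    The resulting vector is a cross-flow feature: it exists only when
    several flows from the same source are viewed together.
    """
    sigs = defaultdict(Counter)
    for src, dst_port in flows:
        sigs[src][dst_port] += 1
    return dict(sigs)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def membership_score(target_flows, synthetic_flows):
    """Toy score: best match between the target's signature and the
    signature of any source appearing in the synthetic trace."""
    target = Counter(port for _, port in target_flows)
    synth = source_signatures(synthetic_flows)
    return max((cosine(target, s) for s in synth.values()), default=0.0)

# Example with made-up data: the synthetic source "a" mirrors the
# target's port mix, so the score is close to 1.
target = [("u", 443), ("u", 443), ("u", 22)]
synthetic = [("a", 443), ("a", 443), ("a", 22), ("b", 53)]
score = membership_score(target, synthetic)
```

A high score in this sketch only says that some synthetic source behaves like the target; turning that into a membership decision would need a calibrated threshold or reference data, which this example does not attempt.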
