Privacy-hardened and hallucination-resistant synthetic data generation with logic-solvers
Mark A. Burgess, Brendan Hosking, Roc Reguant, Anubhav Kaphle, Mitchell J. O'Brien, Letitia M.F. Sng, Yatish Jain, Denis C. Bauer

TL;DR
This paper introduces Genomator, a logic-based method for generating private, accurate, and scalable synthetic genomic data, outperforming existing techniques in accuracy, privacy, and efficiency, with potential clinical applications.
Contribution
The paper presents Genomator, a novel logic-solving approach that significantly improves privacy, accuracy, and scalability in synthetic genomic data generation compared to state-of-the-art methods.
Findings
Genomator achieves 84-93% accuracy improvement over existing methods.
Genomator provides 95-98% higher privacy levels.
It scales to whole genomes and is 1000-1600 times more efficient than existing methods.
Abstract
Machine-generated data is a valuable resource for training Artificial Intelligence algorithms, evaluating rare workflows, and sharing data under stricter data legislations. The challenge is to generate data that is accurate and private. Current statistical and deep learning methods struggle with large data volumes, are prone to hallucinating scenarios incompatible with reality, and seldom quantify privacy meaningfully. Here we introduce Genomator, a logic solving approach (SAT solving), which efficiently produces private and realistic representations of the original data. We demonstrate the method on genomic data, which arguably is the most complex and private information. Synthetic genomes hold great potential for balancing underrepresented populations in medical research and advancing global data exchange. We benchmark Genomator against state-of-the-art methodologies (Markov…
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Security and Verification in Computing · Parallel Computing and Optimization Techniques
Methods: Restricted Boltzmann Machine
