Privacy-Enhanced Database Synthesis for Benchmark Publishing (Technical Report)
Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Yongrui Zhong, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao

TL;DR
This paper presents PrivBench, a differentially private database synthesis framework using sum-product networks, designed to generate high-fidelity benchmark databases that preserve data distribution, query performance, and privacy.
Contribution
The paper introduces PrivBench, a novel SPN-based framework for differentially private synthesis of complex multi-relation databases for benchmarking.
Findings
PrivBench maintains data distribution fidelity.
PrivBench preserves query runtime performance.
PrivBench ensures database-level differential privacy.
Abstract
Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating synthesized databases for benchmarking that also prioritize privacy protection. Differential privacy (DP)-based data synthesis has become a key method for safeguarding privacy when sharing data, but the focus has largely been on minimizing errors in aggregate queries or downstream ML tasks, with less attention given to benchmarking factors like query runtime performance. This paper delves into differentially private database synthesis specifically for benchmark publishing scenarios, aiming to…
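To make the idea of DP-based data synthesis concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single-column histogram, followed by sampling synthetic values from the noisy counts. This is not PrivBench's SPN-based algorithm; it only illustrates the general principle for one column. The function names `dp_histogram` and `sample_from_histogram` are hypothetical, and the sketch assumes add/remove-one-record neighboring datasets and bin edges chosen independently of the data.

```python
import numpy as np


def dp_histogram(values, bin_edges, epsilon, rng=None):
    """Histogram counts perturbed with Laplace noise (hypothetical helper).

    Assumption: adding or removing one record changes exactly one count by 1,
    so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    counts, _ = np.histogram(values, bins=bin_edges)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return counts + noise


def sample_from_histogram(noisy_counts, bin_edges, n, rng=None):
    """Draw n synthetic values from a noisy histogram (hypothetical helper).

    This is post-processing of the noisy counts, so it does not consume
    additional privacy budget.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.clip(noisy_counts, 0.0, None)  # negative counts become 0
    if weights.sum() == 0:
        weights = np.ones_like(weights)  # fall back to uniform over bins
    probs = weights / weights.sum()
    edges = np.asarray(bin_edges, dtype=float)
    idx = rng.choice(len(probs), size=n, p=probs)
    # Sample uniformly within each chosen bin.
    return rng.uniform(edges[idx], edges[idx + 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ages = [23, 25, 31, 38, 41, 47, 52, 58, 64]
    edges = [20, 30, 40, 50, 60, 70]
    noisy = dp_histogram(ages, edges, epsilon=1.0, rng=rng)
    synthetic = sample_from_histogram(noisy, edges, n=9, rng=rng)
    print(noisy)
    print(synthetic)
```

A per-column histogram like this ignores correlations between columns and across tables, and it says nothing about query runtime, which is the gap the abstract describes for benchmark publishing.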
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Data Storage Technologies · Advanced Database Systems and Queries
