Fast and Optimal Differentially Private Frequent-Substring Mining
Peaker Guo, Rayne Holland, Hao Wu

TL;DR
This paper introduces a scalable differentially private algorithm for frequent substring mining that significantly reduces computational complexity while maintaining near-optimal error guarantees.
Contribution
It presents a novel differentially private algorithm with reduced space and time complexity, improving scalability over prior methods.
Findings
Achieves near-optimal error with lower computational costs.
Reduces space complexity to $O(n \ell + |\Sigma|)$.
Reduces time complexity to $O(n \ell \ast \Sigma + |\Sigma|)$.
Abstract
Given a dataset of user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced a differentially private algorithm achieving near-optimal error, but at a prohibitive cost in space and processing time. In this work, we present a new differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n \ell + |\Sigma|)$ and time complexity to $O(n \ell \ast \Sigma + |\Sigma|)$, for input alphabet $\Sigma$. Our approach builds on a top-down exploration of candidate substrings but introduces two innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii)…
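To make the idea of top-down candidate exploration concrete, here is a minimal Python sketch of a generic level-by-level baseline. It is not the paper's algorithm and does not achieve its space or time bounds. The names `laplace` and `mine_frequent_substrings`, the `threshold` parameter, and the noise scale $\ell^2/\varepsilon$ are illustrative assumptions: each user affects at most $\ell$ counts per level under add/remove-one-user neighbouring datasets, and the budget is split evenly over $\ell$ levels. A length-$k$ candidate is generated only if both its length-$(k-1)$ prefix and suffix survived the previous level.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mine_frequent_substrings(strings, alphabet, max_len, epsilon, threshold, rng):
    """Return {substring: noisy user count} for substrings judged frequent.

    Assumption (not from the paper): Laplace noise of scale max_len**2 / epsilon,
    i.e. per-level sensitivity max_len and budget epsilon / max_len per level.
    """
    scale = max_len * max_len / epsilon
    # Truncate so that every user string has length at most max_len.
    strings = [s[:max_len] for s in strings]
    frequent = {}
    prev = set()
    for k in range(1, max_len + 1):
        if k == 1:
            # Level 1 starts from the public alphabet, not from the data.
            candidates = sorted(set(alphabet))
        else:
            # Extend a frequent prefix by one symbol and keep the result
            # only if its suffix of length k - 1 is frequent as well.
            candidates = sorted(
                p + c for p in prev for c in set(alphabet) if (p + c)[1:] in prev
            )
        if not candidates:
            break
        counts = dict.fromkeys(candidates, 0)
        for s in strings:
            # Each user contributes at most once per distinct substring.
            for sub in {s[i:i + k] for i in range(len(s) - k + 1)}:
                if sub in counts:
                    counts[sub] += 1
        prev = set()
        for cand in candidates:
            noisy = counts[cand] + laplace(rng, scale)
            if noisy >= threshold:
                prev.add(cand)
                frequent[cand] = noisy
    return frequent
```

A call such as `mine_frequent_substrings(["abab", "abc"], "abc", 4, 1.0, 2.0, random.Random())` runs the sketch. With a small $\varepsilon$ the noise dominates short inputs like this one, which is why the error analysis in the paper matters. The sketch should not be treated as a vetted privacy implementation, since floating-point Laplace sampling is known to need extra care.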
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Data Quality and Management
