Fast and Optimal Differentially Private Frequent-Substring Mining

Peaker Guo; Rayne Holland; Hao Wu

arXiv:2603.09166 · cs.DS · March 11, 2026

Open Access

TL;DR

This paper introduces an $ε$-differentially private algorithm for frequent-substring mining that keeps the near-optimal error of prior work while cutting space from $O(n^{2} ℓ^{4})$ to $O(n ℓ + |Σ|)$ and time to $O(n ℓ \log |Σ| + |Σ|)$.

Contribution

It presents a novel $ε$-differentially private algorithm with reduced space and time complexity, improving scalability over prior methods.

Findings

01

Achieves near-optimal error with lower computational costs.

02

Reduces space complexity to $O(n ℓ + |Σ|)$.

03

Reduces time complexity to $O(n ℓ \log |Σ| + |Σ|)$.

Abstract

Given a dataset of $n$ user-contributed strings, each of length at most $ℓ$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced an $ε$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^{2} ℓ^{4})$ space and processing time. In this work, we present a new $ε$-differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n ℓ + |Σ|)$ and time complexity to $O(n ℓ \log |Σ| + |Σ|)$, for input alphabet $Σ$. Our approach builds on a top-down exploration of candidate substrings but introduces two new innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii)…
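
To make the idea of top-down candidate exploration with prefix/suffix pruning concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not reproduce their complexity or privacy analysis. The function names (`laplace`, `doc_frequency`, `extend`, `mine`) and the `threshold` and `noise_scale` parameters are hypothetical. The Laplace noise is a stand-in: `noise_scale` is not calibrated here, so the sketch alone does not guarantee $ε$-differential privacy. The second innovation is truncated in the abstract above and is not reflected.

```python
import random
from collections import defaultdict


def laplace(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def doc_frequency(strings, candidates, k):
    """For each length-k candidate, count how many strings contain it."""
    counts = dict.fromkeys(candidates, 0)
    for s in strings:
        # Use a set so each user string contributes at most 1 per substring.
        seen = {s[i:i + k] for i in range(len(s) - k + 1)}
        for sub in seen:
            if sub in counts:
                counts[sub] += 1
    return counts


def extend(frequent):
    """Build length-(k+1) candidates whose length-k prefix and suffix are both
    in `frequent`: join u and v when u[1:] == v[:-1], giving u + v[-1]."""
    by_prefix = defaultdict(list)
    for v in frequent:
        by_prefix[v[:-1]].append(v)
    return {u + v[-1] for u in frequent for v in by_prefix[u[1:]]}


def mine(strings, alphabet, max_len, threshold, noise_scale, seed=0):
    # ASSUMPTION: noise_scale is a placeholder, not a calibrated privacy parameter.
    rng = random.Random(seed)
    candidates = set(alphabet)
    result = []
    for k in range(1, max_len + 1):
        if not candidates:
            break
        counts = doc_frequency(strings, candidates, k)
        frequent = sorted(c for c in candidates
                          if counts[c] + laplace(noise_scale, rng) >= threshold)
        result.extend(frequent)
        candidates = extend(frequent)
    return result
```

With `noise_scale=0` the sketch reduces to exact level-wise mining, which makes the pruning easy to check by hand: `mine(["abab", "abc", "bab"], "abc", 3, 2, 0)` returns `['a', 'b', 'ab', 'ba', 'bab']`. The candidate `aba` is generated at length 3 but rejected because only one string contains it.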

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Data Quality and Management