Differentially Private Substring and Document Counting with Near-Optimal Error
Giulia Bernardini, Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

TL;DR
This paper introduces differentially private data structures for substring counting and document counting in text databases, with additive error that is near-optimal up to polylogarithmic factors.
Contribution
It proposes new data structures with improved error bounds for private substring and document counting, and introduces a novel technique for private counting on trees.
Findings
Achieves $O(\ell \cdot \mathrm{polylog}(n\ell|\Sigma|))$ maximum additive error under $\epsilon$-DP
Improves the document counting error to $O(\sqrt{\ell} \cdot \mathrm{polylog}(n\ell|\Sigma|))$ under $(\epsilon, \delta)$-DP
Errors are proven to be near-optimal up to polylogarithmic factors
Abstract
For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection contain the pattern as a substring (document counting). If such a database contains sensitive data, it is crucial to protect the privacy of individuals in the database. Differential privacy is the gold standard for privacy in data analysis. It gives rigorous privacy guarantees, but comes at the cost of yielding less accurate results. In this paper, we carry out a theoretical study of substring and document counting under differential privacy. We propose a data structure storing $\epsilon$-differentially private counts for all possible query patterns with a maximum additive error of $O(\ell \cdot \mathrm{polylog}(n\ell|\Sigma|))$, where $\ell$ is the…
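To make the two query types concrete, here is a minimal Python sketch of the exact (non-private) counts, followed by a textbook Laplace mechanism for a single query. This is not the data structure proposed in the paper. The function names, the `max_len` parameter (standing in for an upper bound on document length), and the neighboring relation (two databases differ by replacing one document) are assumptions made for illustration only.

```python
import random


def substring_count(docs, pattern):
    """Total number of (possibly overlapping) occurrences of pattern in all docs."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    return sum(
        1
        for doc in docs
        for i in range(len(doc) - m + 1)
        if doc.startswith(pattern, i)
    )


def document_count(docs, pattern):
    """Number of documents that contain pattern as a substring."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    return sum(1 for doc in docs if pattern in doc)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, sensitivity, epsilon, rng=None):
    """Standard Laplace mechanism for ONE query (illustrative baseline only)."""
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    docs = ["abab", "bab", "cc"]
    max_len = 4      # assumed upper bound on document length
    epsilon = 1.0
    rng = random.Random(0)

    # Assumption: replacing one document changes a single substring count
    # by at most max_len, and a single document count by at most 1.
    print(noisy_count(substring_count(docs, "ab"), max_len, epsilon, rng))
    print(noisy_count(document_count(docs, "ab"), 1, epsilon, rng))
```

This baseline protects a single query only. Answering every possible pattern this way would require splitting the privacy budget across all queries, which is the setting the paper's data structures are designed to handle with much smaller error.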
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Cellular Automata and Applications
