Approximate DBSCAN under Differential Privacy
Yuan Qiu, Ke Yi

TL;DR
This paper introduces a span-based definition of differentially private DBSCAN that offers better utility for visualization and classification than publishing cluster labels, together with a linear-time algorithm in constant dimensions and experiments on synthetic and real-world data.
Contribution
Proposes a span-based DP-DBSCAN definition that improves utility over label-based approaches, with a linear-time algorithm achieving the sandwich quality guarantee and matching lower bounds on the approximation ratio.
Findings
The span-based DP-DBSCAN outperforms label-based methods in utility.
The algorithm achieves linear time complexity in constant dimensions.
Experimental results confirm practical effectiveness on synthetic and real data.
Abstract
This paper revisits the DBSCAN problem under differential privacy (DP). Existing DP-DBSCAN algorithms aim at publishing the cluster labels of the input points. However, we show that both empirically and theoretically, this approach cannot offer any utility in the published results. We therefore propose an alternative definition of DP-DBSCAN based on the notion of spans. We argue that publishing the spans actually better serves the purposes of visualization and classification of DBSCAN. Then we present a linear-time DP-DBSCAN algorithm achieving the sandwich quality guarantee in any constant dimensions, as well as matching lower bounds on the approximation ratio. A key building block in our algorithm is a linear-time algorithm for constructing a histogram under pure-DP, which is of independent interest. Finally, we conducted experiments on both synthetic and real-world datasets to verify…
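As a rough illustration of the histogram building block mentioned in the abstract, the sketch below releases a grid histogram under pure DP using the textbook Laplace mechanism. It is not the paper's linear-time algorithm: it adds noise to every cell of a small bounded grid, so its cost grows with the number of cells rather than the number of points. The function names, the bounded domain, the clamping of out-of-range points, and the add/remove-one-point neighbor relation (sensitivity 1) are all assumptions made for this example.

```python
import math
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_grid_histogram(points, dim, cell_width, cells_per_axis, epsilon, rng=None):
    """Return a noisy count for every cell of a fixed grid.

    Hypothetical helper: the grid covers [0, cell_width * cells_per_axis)^dim,
    and points outside it are clamped to the nearest boundary cell.
    """
    if rng is None:
        rng = random.Random()

    def cell_of(point):
        # Data-independent clamping keeps each point in exactly one cell.
        return tuple(
            min(cells_per_axis - 1, max(0, int(math.floor(x / cell_width))))
            for x in point
        )

    exact = Counter(cell_of(p) for p in points)

    # Adding or removing one point changes one count by 1, so Laplace noise
    # with scale 1/epsilon on every cell gives epsilon-DP. Empty cells must
    # be noised too; skipping them would leak which cells are occupied.
    scale = 1.0 / epsilon
    return {
        cell: exact.get(cell, 0) + laplace_noise(scale, rng)
        for cell in product(range(cells_per_axis), repeat=dim)
    }


# Example: three 2-D points on a 2 x 2 grid with cells of width 0.5.
example_points = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]
example_hist = dp_grid_histogram(
    example_points, dim=2, cell_width=0.5, cells_per_axis=2,
    epsilon=1.0, rng=random.Random(0),
)
```

Because every grid cell receives noise, this naive version becomes impractical when the grid is large or the dimension grows, which is presumably the cost that a linear-time construction is meant to avoid.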
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
