Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text
Oluwaseyi Feyisetan, Tom Diethe, Thomas Drake

TL;DR
This paper introduces a method using hyperbolic space representations to enhance text privacy while maintaining utility, providing strong privacy guarantees and minimal impact on downstream tasks.
Contribution
The paper proposes a perturbation mechanism over word representations in hyperbolic space, proves that it satisfies dx-privacy, and reports stronger privacy guarantees than a Euclidean baseline on text data.
Findings
Privacy guarantees more than 20x greater than those of the Euclidean baseline.
Effective protection against authorship attribution algorithms.
Minimal utility loss on downstream machine learning models.
Abstract
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dx-privacy, then we define a probability distribution in Hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high-dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our…
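To make the dx-privacy idea concrete, the sketch below may help. It is not the sampling procedure from the paper: it swaps in a simple exponential-mechanism-style replacement over a finite vocabulary, using the standard Poincaré-ball distance as the metric. Under the usual definition, a mechanism M satisfies dx-privacy when Pr[M(x) = y] <= exp(epsilon * d(x, x')) * Pr[M(x') = y] for all inputs x, x' and outputs y. The toy embeddings, the function names, and the choice of the Poincaré ball model are all assumptions made for illustration.

```python
import math
import random

# Hypothetical 2-D embeddings inside the unit Poincare disk (illustration only).
# The more general word sits near the origin, more specific words nearer the edge.
TOY_EMBEDDINGS = {
    "animal": (0.0, 0.0),
    "dog": (0.5, 0.1),
    "cat": (0.5, -0.1),
    "car": (-0.6, 0.3),
}


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def replacement_probabilities(word, embeddings, epsilon):
    """Probability of releasing each vocabulary word in place of `word`.

    Weights are exp(-epsilon * d / 2); by the triangle inequality this
    exponential-mechanism form meets the dx-privacy bound with parameter epsilon.
    """
    x = embeddings[word]
    weights = {
        w: math.exp(-0.5 * epsilon * poincare_distance(x, y))
        for w, y in embeddings.items()
    }
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


def perturb(word, embeddings, epsilon, rng=random):
    """Sample a replacement word for `word` from the distribution above."""
    probs = replacement_probabilities(word, embeddings, epsilon)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]
```

In this sketch a smaller epsilon flattens the distribution, so replacements are more random (more privacy, less utility), while a larger epsilon concentrates probability on the original word and its close neighbors.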
