Towards Quantifying The Privacy Of Redacted Text

Vaibhav Gusain; Douglas Leith

arXiv:2410.07772·cs.LG·October 11, 2024

TL;DR

This paper introduces a method to quantify the privacy of redacted text by using deep learning to generate and analyze possible original texts, assessing privacy based on the diversity and similarity of reconstructed outputs.

Contribution

It presents a novel approach that combines transformer-based models with k-anonymity concepts to evaluate the privacy of redacted text, a problem not previously addressed.

Findings

01

Effective reconstruction of original text from redacted versions.

02

Quantitative measure of privacy based on diversity of reconstructions.

03

Demonstrates the method's ability to assess privacy levels in redacted documents.

Abstract

In this paper we propose the use of a k-anonymity-like approach for evaluating the privacy of redacted text. Given a piece of redacted text, we use a state-of-the-art transformer-based deep learning network to reconstruct the original text. This generates multiple full texts that are consistent with the redacted text, i.e. that are grammatical, have the same non-redacted words, etc., and we represent each of these using an embedding vector that captures sentence similarity. In this way we can estimate the number, diversity and quality of full texts consistent with the redacted text and so evaluate privacy.
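
The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate reconstructions are hard-coded rather than generated by a transformer, a bag-of-words vector stands in for a learned sentence embedding, and the names `embed`, `cosine`, `count_distinct` and the `threshold` parameter are hypothetical. The idea is only to show how a k-anonymity-like count might be read off from a set of embedded reconstructions: candidates that are near-duplicates in embedding space are grouped, and the number of groups is taken as a rough estimate of how many distinct full texts are consistent with the redacted text.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedding (assumption): bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def count_distinct(candidates, threshold=0.9):
    """Greedily group candidates; a candidate starts a new group only if it is
    less similar than `threshold` to every existing group representative."""
    representatives = []
    for text in candidates:
        vec = embed(text)
        if all(cosine(vec, rep) < threshold for rep in representatives):
            representatives.append(vec)
    return len(representatives)

# Hypothetical reconstructions of "[REDACTED] visited the [REDACTED] on [REDACTED]".
candidates = [
    "alice visited the clinic on monday",
    "alice visited the clinic on monday",
    "bob visited the clinic on monday",
    "alice visited the bank on friday",
]

k = count_distinct(candidates, threshold=0.9)
print(k)
```

A larger count suggests more distinct plausible originals and hence stronger privacy under this toy measure. The result depends heavily on the similarity threshold and on the embedding used, so both would need to be chosen with care in any real evaluation.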
