Semantics-Preserved Distortion for Personal Privacy Protection in Information Management
Jiajia Li, Lu Yang, Letian Peng, Shitou Zhang, Ping Wang, Zuchao Li, and Hai Zhao

TL;DR
This paper introduces a linguistically grounded text distortion method that preserves semantics while protecting personal privacy in NLP tasks, and demonstrates its effectiveness across multiple applications and attack scenarios.
Contribution
It proposes a novel Neighboring Distribution Divergence metric and two frameworks for semantics-preserving text distortion (a generative approach and a substitutive approach), advancing privacy protection techniques in NLP.
Findings
Effective in named entity recognition, constituency parsing, and machine reading comprehension.
Outperforms structural approaches in attribute attack resistance.
Limits sensitive data memorization in medical information management.
Abstract
In recent years, machine learning, particularly deep learning, has significantly impacted the field of information management. While several strategies have been proposed to restrict models from learning and memorizing sensitive information from raw texts, this paper proposes a more linguistically grounded approach: distorting texts while maintaining their semantic integrity. To this end, we leverage Neighboring Distribution Divergence, a novel metric for assessing how well semantic meaning is preserved during distortion. Building on this metric, we present two distinct frameworks for semantics-preserving distortion: a generative approach and a substitutive approach. Our evaluations across various tasks, including named entity recognition, constituency parsing, and machine reading comprehension, affirm the plausibility and efficacy of our distortion technique for personal privacy protection. We also…
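The abstract does not define Neighboring Distribution Divergence, so the following is only a minimal sketch of one plausible reading of the name, not the authors' formulation. It assumes that (a) a masked language model predicts a distribution for each unchanged token position next to an edit, (b) the metric sums the KL divergence between those predictions before and after distortion, and (c) a substitutive distortion picks the candidate replacement with the lowest divergence. The functions `neighboring_divergence`, `best_substitute`, and `toy_predict` are hypothetical; `toy_predict` is a toy stand-in for a real pretrained masked language model.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Dist = Dict[str, float]
# Hypothetical predictor type: returns the distribution a masked LM would assign
# to position j when that position is masked (assumption, not the paper's API).
Predictor = Callable[[Sequence[str], int], Dist]


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-9) -> float:
    """KL(p || q) over the union of supports; eps floors missing mass in q."""
    total = 0.0
    for tok in set(p) | set(q):
        pi = p.get(tok, 0.0)
        if pi > 0.0:
            total += pi * math.log(pi / max(q.get(tok, 0.0), eps))
    return total


def neighboring_divergence(original: Sequence[str], distorted: Sequence[str],
                           edited: Set[int], predict: Predictor,
                           window: int = 2) -> float:
    """Sum of KL divergences at unchanged positions within `window` of an edit."""
    assert len(original) == len(distorted), "sketch handles substitutions only"
    neighbors = set()
    for i in edited:
        for j in range(i - window, i + window + 1):
            if 0 <= j < len(original) and j not in edited:
                neighbors.add(j)
    return sum(kl_divergence(predict(original, j), predict(distorted, j))
               for j in sorted(neighbors))


def best_substitute(tokens: Sequence[str], position: int, candidates: List[str],
                    predict: Predictor, window: int = 2) -> str:
    """Hypothetical substitutive step: choose the least disruptive replacement."""
    def score(candidate: str) -> float:
        distorted = list(tokens)
        distorted[position] = candidate
        return neighboring_divergence(tokens, distorted, {position}, predict, window)
    return min(candidates, key=score)


# Toy stand-in for a masked LM: the prediction at position j depends only on the
# entity class of the left neighbor (purely illustrative).
CLASS = {"Alice": "PER", "Bob": "PER", "Paris": "LOC", "Rome": "LOC"}


def toy_predict(tokens: Sequence[str], j: int) -> Dist:
    left = CLASS.get(tokens[j - 1], "O") if j > 0 else "O"
    if left == "PER":
        return {"lives": 0.8, "in": 0.1, "is": 0.1}
    if left == "LOC":
        return {"is": 0.7, "lives": 0.1, "in": 0.2}
    return {"lives": 0.3, "in": 0.4, "is": 0.3}
```

For example, `best_substitute(["Alice", "lives", "in", "Paris"], 0, ["Rome", "Bob"], toy_predict)` returns `"Bob"`: swapping one person name for another leaves the neighboring predictions unchanged, while swapping in a place name shifts them. Measuring divergence at neighboring positions, rather than comparing the replaced token itself, is what lets such a score tolerate a changed (private) token as long as its surrounding context is still interpreted the same way.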
Taxonomy
Topics: Privacy-Preserving Technologies in Data
