NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human
Shuo Huang, William MacLean, Xiaoxi Kang, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu

TL;DR
This paper introduces NAP^2, a benchmark dataset for privacy-preserving text rewriting that mimics human strategies, aiming to improve naturalness and privacy protection in text sanitization.
Contribution
The paper presents NAP^2, the first curated corpus for naturalness and privacy-preserving text rewriting, built through crowdsourcing and LLMs. It targets stronger privacy protection while maintaining text utility.
Findings
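Human-inspired rewriting (deleting or abstracting sensitive expressions) yields more natural text than prior anonymization approaches.
The human-inspired strategies offer an improved balance between privacy protection and data utility.
Extensive experiments support both findings.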
Abstract
The widespread use of cloud-based Large Language Models (LLMs) has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sending sensitive data to these models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP^2, through both crowdsourcing and the use of LLMs. Compared to prior work on anonymization, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments. Researchers interested in accessing the dataset are…
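The two strategies named in the abstract, deletion and abstraction, can be illustrated with a minimal sketch. This is not the paper's method, which relies on crowdsourced and LLM-generated rewrites; it only shows what each strategy does to a sentence when the sensitive spans are already known. The function names, the example utterance, and the hand-written span lists are all hypothetical.

```python
import re


def delete_spans(text, sensitive_phrases):
    """Strategy (i): remove each sensitive expression outright.

    `sensitive_phrases` is a hand-written list (an assumption for this
    sketch); a real system would have to detect the spans first.
    """
    for phrase in sensitive_phrases:
        # Also drop the whitespace in front of the phrase so no gap is left.
        text = re.sub(r"\s*" + re.escape(phrase), "", text)
    # Collapse any doubled spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", text).strip()


def abstract_spans(text, abstractions):
    """Strategy (ii): replace each sensitive detail with a more general phrase.

    `abstractions` maps a specific phrase to a hypothetical generalization.
    """
    for phrase, general in abstractions.items():
        text = text.replace(phrase, general)
    return text


utterance = "I work as a nurse at St. Mary's Hospital in Leeds."

deleted = delete_spans(utterance, ["at St. Mary's Hospital in Leeds"])
abstracted = abstract_spans(utterance, {"St. Mary's Hospital in Leeds": "a hospital"})

print(deleted)     # I work as a nurse.
print(abstracted)  # I work as a nurse at a hospital.
```

Deletion removes the detail entirely, while abstraction keeps a less specific version of it, so the abstracted sentence retains more of the original content. How each choice affects privacy and utility in practice is what the benchmark is designed to measure.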
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Digital and Cyber Forensics
