Protecting User Prompts Via Character-Level Differential Privacy

Shashie Dilhara Batan Arachchige; Hassan Jameel Asghar; Benjamin Zi Hao Zhao; Dinusha Vatsalan; Dali Kaafar

arXiv:2603.26032 · cs.CR · March 30, 2026

TL;DR

This paper introduces a character-level differential privacy method to sanitize user prompts for LLMs, effectively protecting sensitive information while maintaining utility for downstream tasks.

Contribution

The authors propose a character-level perturbation mechanism, built on the randomized response mechanism of differential privacy, that obfuscates sensitive words without requiring explicit PII detection.

Findings

1. Sensitive PII is reconstructed at near-random rates.

2. Non-sensitive words are reconstructed with high accuracy.

3. The method maintains a good privacy-utility balance.

Abstract

Large Language Models (LLMs) generate responses based on user prompts. Often, these prompts may contain highly sensitive information, including personally identifiable information (PII), which could be exposed to third parties hosting these models. In this work, we propose a new method to sanitize user prompts. Our mechanism uses the randomized response mechanism of differential privacy to randomly and independently perturb each character in a word. The perturbed text is then sent to a remote LLM, which first performs a prompt restoration and subsequently performs the intended downstream task. The idea is that the restoration will be able to reconstruct non-sensitive words even when they are perturbed due to cues from the context, as well as the fact that these words are often very common. On the other hand, perturbation would make reconstruction of sensitive words difficult because…
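To make the per-character perturbation concrete, here is a minimal sketch of standard k-ary randomized response applied independently to each character. It is an illustration under stated assumptions, not the authors' implementation: the 26-letter lowercase alphabet, the choice to lowercase the input, the pass-through of characters outside the alphabet (spaces, digits, punctuation), and the names `keep_probability`, `perturb_char`, and `perturb_prompt` are all hypothetical, and the paper's actual alphabet and parameters may differ. With privacy parameter epsilon and alphabet size k, each character is kept with probability e^epsilon / (e^epsilon + k - 1) and otherwise replaced by one of the other k - 1 characters chosen uniformly.

```python
import math
import random
import string

# Assumed alphabet for illustration; the paper's alphabet may differ.
ALPHABET = string.ascii_lowercase


def keep_probability(epsilon: float, k: int = len(ALPHABET)) -> float:
    """Probability that k-ary randomized response reports the true symbol."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def perturb_char(ch: str, epsilon: float, rng: random.Random) -> str:
    """Perturb one character with k-ary randomized response."""
    # Assumption: characters outside the alphabet pass through unchanged,
    # so word boundaries survive. They receive no protection in this sketch.
    if ch not in ALPHABET:
        return ch
    if rng.random() < keep_probability(epsilon):
        return ch
    # Otherwise report one of the other k - 1 characters uniformly at random.
    others = [c for c in ALPHABET if c != ch]
    return rng.choice(others)


def perturb_prompt(prompt: str, epsilon: float, seed=None) -> str:
    """Independently perturb every character of a (lowercased) prompt."""
    rng = random.Random(seed)
    return "".join(perturb_char(ch, epsilon, rng) for ch in prompt.lower())


if __name__ == "__main__":
    # The perturbed text is what would be sent to the remote LLM for
    # restoration followed by the downstream task.
    print(perturb_prompt("please email alice about the invoice", epsilon=3.0))
```

Smaller values of epsilon flip more characters. The intuition from the abstract is that common words remain guessable from context after such flips, while rare strings such as names or identifiers do not.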
