A Systematic Exploration of Text Decomposition and Budget Distribution in Differentially Private Text Obfuscation

Stephen Meisenbacher, Angelo Kleinert, and Florian Matthes

arXiv:2605.01065 · cs.CL · May 5, 2026

TL;DR

This paper systematically evaluates how text decomposition (chunking) and privacy budget distribution methods affect differentially private text obfuscation, and finds that these design choices lead to significantly different results.

Contribution

It provides a comprehensive analysis of text chunking and budget allocation strategies, demonstrating their significant influence on privacy-utility trade-offs in DP text obfuscation.

Findings

01. Different decomposition and budget distribution methods lead to significantly different results.

02. Optimizing text chunking and budget allocation can improve empirical privacy-utility trade-offs.

03. Design choices in DP text obfuscation are crucial for achieving better performance.

Abstract

The goal of differentially private text obfuscation is to obfuscate, or "perturb", input texts with Differential Privacy (DP) guarantees, such that the private output texts are quantifiably indistinguishable from the originals. While perturbation at the word level is intuitive, meaningful text privatization happens on complete documents. Recent research has laid the groundwork for reasoning about privacy budget distribution, namely, how an overall $ε$ budget can be sensibly distributed among the component pieces of a text. We perform a systematic evaluation of multiple text decomposition and budget distribution techniques in the context of DP text obfuscation, testing how different methods for chunking texts can be combined with techniques for allocating $ε$ to these chunks. Our experiments reveal that such design choices are very important, as even with comparable…
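To make the idea of budget distribution concrete, here is a minimal sketch of splitting a document into chunks and dividing an overall $ε$ among them. It is an illustration only and not the authors' method: the sentence splitter, the function names (`split_sentences`, `uniform_budget`, `length_proportional_budget`), and the two allocation rules are assumptions chosen for simplicity. It also assumes basic sequential composition, under which the per-chunk budgets should sum to the overall budget.

```python
import re


def split_sentences(text):
    """Naive sentence chunking: split after ., ! or ? followed by whitespace.

    Hypothetical helper for illustration; a real system would likely use a
    proper sentence tokenizer.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def uniform_budget(chunks, total_epsilon):
    """Assumed rule 1: every chunk receives the same share of the budget."""
    if not chunks:
        raise ValueError("need at least one chunk")
    return [total_epsilon / len(chunks)] * len(chunks)


def length_proportional_budget(chunks, total_epsilon):
    """Assumed rule 2: each chunk's share is proportional to its word count."""
    lengths = [len(c.split()) for c in chunks]
    total = sum(lengths)
    if total == 0:
        raise ValueError("chunks contain no words")
    return [total_epsilon * n / total for n in lengths]


text = "Alice moved to Munich in 2019. She works at a bank. Call her!"
chunks = split_sentences(text)

uniform = uniform_budget(chunks, total_epsilon=9.0)
proportional = length_proportional_budget(chunks, total_epsilon=9.0)

for chunk, e_u, e_p in zip(chunks, uniform, proportional):
    print(f"{chunk!r}: uniform={e_u:.2f}, proportional={e_p:.2f}")
```

Both rules spend the same overall budget but give individual chunks different amounts of noise, which is the kind of design choice the abstract refers to. Each chunk would then be passed to a DP obfuscation mechanism with its assigned $ε$; that step is omitted here.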
