On the privacy-utility trade-off in differentially private hierarchical text classification
Dominik Wunderlich, Daniel Bernau, Francesco Aldà, Javier Parra-Arnau, Thorsten Strufe

TL;DR
This paper explores how different neural network architectures for hierarchical text classification balance privacy and utility when trained with differential privacy, highlighting that even large privacy parameters (that is, comparatively weak formal guarantees) can effectively prevent data leakage with minimal utility loss.
Contribution
It empirically compares three widely used neural network architectures under differential privacy, identifying which of them offer superior privacy-utility trade-offs for different dataset sizes and text lengths.
Findings
Even large privacy parameters effectively mitigate membership inference attacks.
Transformer models perform well on large, long-text datasets.
CNNs are preferable for smaller datasets with shorter texts.
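The first finding concerns membership inference, where an attacker guesses whether a given record was part of the training data. As a rough illustration of how such leakage can be scored, the sketch below assumes a simple loss-threshold attack, which is not the white-box attack used in the paper. The function name, the thresholds, and all loss values are hypothetical. The score is the membership advantage (true positive rate minus false positive rate); a value near 0 means the attacker does no better than guessing.

```python
def membership_advantage(member_losses, nonmember_losses, threshold):
    """Advantage (TPR - FPR) of an attack that guesses 'member' when loss < threshold."""
    tpr = sum(loss < threshold for loss in member_losses) / len(member_losses)
    fpr = sum(loss < threshold for loss in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr


# Hypothetical per-example losses, invented for illustration only.
# A model that memorizes its training data tends to have much lower loss on members.
leaky_members = [0.05, 0.10, 0.08, 0.12]
leaky_nonmembers = [0.90, 1.10, 0.07, 1.30]

# A privately trained model tends to have similar loss on members and non-members.
private_members = [0.60, 0.85, 0.70, 0.95]
private_nonmembers = [0.65, 0.80, 0.75, 0.90]

# Thresholds are picked by hand here; a real attacker would calibrate them.
leaky_adv = membership_advantage(leaky_members, leaky_nonmembers, threshold=0.5)
private_adv = membership_advantage(private_members, private_nonmembers, threshold=0.77)
```

With these made-up numbers the memorizing model yields a clearly positive advantage, while the model whose member and non-member losses overlap yields none at the chosen threshold.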
Abstract
Hierarchical text classification consists in classifying text documents into a hierarchy of classes and sub-classes. Although artificial neural networks have proved useful to perform this task, unfortunately they can leak training data information to adversaries due to training data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models, enabling the models to be shared safely at the cost of reduced model accuracy. This work investigates the privacy-utility trade-off in hierarchical text classification with differential privacy guarantees, and identifies neural network architectures that offer superior trade-offs. To this end, we use a white-box membership inference attack to empirically assess the information leakage of three widely used neural network architectures. We show that large differential privacy parameters already…
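To make "differential privacy during model training" more concrete, here is a minimal sketch of one DP-SGD-style update, a common way to train with differential privacy. This listing does not say which optimizer or library the authors used, so the function name, its parameters, and the plain-Python gradient representation are all assumptions. The two key steps are clipping each example's gradient to a fixed norm and adding Gaussian noise to the clipped sum.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD-style update (illustrative sketch, not the paper's code)."""
    clipped_sum = [0.0] * len(weights)
    for grad in per_example_grads:
        # Clip: rescale the gradient so its L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            clipped_sum[i] += g * scale

    # Noise: Gaussian with standard deviation noise_multiplier * clip_norm,
    # added to the clipped sum before averaging over the batch.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in clipped_sum]

    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Hypothetical toy batch: two examples, three parameters.
rng = random.Random(0)
grads = [[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]]
updated = dp_sgd_step([0.0, 0.0, 0.0], grads, lr=1.0,
                      clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

For a fixed number of training steps, a smaller `noise_multiplier` corresponds to a larger privacy parameter ε (a weaker formal guarantee) but usually costs less accuracy, which is the trade-off the abstract describes.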
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
