Privacy-Preserving Models for Legal Natural Language Processing
Ying Yin, Ivan Habernal

TL;DR
This paper explores how to pre-train legal domain transformer models with differential privacy, balancing data privacy with improved performance on legal NLP tasks.
Contribution
It applies differential privacy to large-scale pre-training of transformer language models in the legal NLP domain and shows that, under specific training configurations, downstream performance can improve without sacrificing privacy protection for the in-domain data.
Findings
Differential privacy can be effectively integrated into legal NLP model pre-training.
Under specific training configurations, downstream performance improves while privacy protection for the in-domain data is maintained.
The approach is the first to apply differential privacy at this scale in legal NLP.
Abstract
Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we asked to what extent we can guarantee privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that under specific training configurations we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain,…
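Illustration
The abstract does not say which differentially private training algorithm the authors use. A common choice for training neural models under differential privacy is DP-SGD, which clips each example's gradient and adds Gaussian noise before the update. The sketch below shows one such step on a toy logistic-regression model with NumPy. The function name dp_sgd_step and the parameters clip_norm and noise_multiplier are hypothetical and chosen for illustration; this is not the paper's implementation, and it does no privacy accounting.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD step for logistic regression (toy model, assumed setup)."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-example gradients of the logistic loss, shape (batch, dim).
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_example_grads = (probs - y)[:, None] * X
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum the clipped gradients, add Gaussian noise scaled to clip_norm, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / X.shape[0]
    return w - lr * noisy_mean_grad
```

Clipping bounds how much any single training example can move the weights, and the noise masks what remains of that contribution. A larger noise_multiplier generally gives stronger privacy at some cost in accuracy, which is the trade-off the abstract describes.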
Taxonomy
Topics: Privacy-Preserving Technologies in Data
