Privacy-Preserving Models for Legal Natural Language Processing

Ying Yin; Ivan Habernal

arXiv:2211.02956 · cs.CL · August 14, 2025


TL;DR

This paper explores how to pre-train legal-domain transformer models with differential privacy, balancing the privacy of the pre-training data against improved performance on legal NLP tasks.

Contribution

It applies differential privacy to large-scale pre-training of transformer language models in the legal NLP domain. It shows that downstream task performance can improve without sacrificing privacy protection for the in-domain pre-training data.
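For reference, "differential privacy" is a formal guarantee, usually stated in the standard (ε, δ) form below. Here M is the randomized training algorithm (for example, pre-training with noisy gradients). D and D' are neighboring datasets that differ in a single record. The summary does not state which ε and δ values the paper reports.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S
```

Smaller ε and δ mean the released model reveals less about any individual record in the pre-training data.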

Findings

01

Differential privacy can be effectively integrated into legal NLP model pre-training.

02

Under specific training configurations, downstream performance on legal tasks improves while privacy protection for the in-domain pre-training data is maintained.

03

To the authors' knowledge, this is the first use of differential privacy for large-scale pre-training of transformer language models in legal NLP.
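Finding 02 attributes the gains to training configurations. The summary does not name the optimizer, so this assumes the standard DP-SGD recipe. Under that recipe, one reason configuration matters is visible in the privatized gradient itself:

- each per-example gradient g_i is clipped to norm C;
- Gaussian noise with multiplier σ is added to the sum;
- the result is averaged over a batch of size B.

This is general DP-SGD arithmetic, not the paper's reported setting.

```latex
\tilde{g} = \frac{1}{B}\left( \sum_{i=1}^{B} g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right) + \mathcal{N}\!\left(0, \sigma^2 C^2 \mathbf{I}\right) \right),
\qquad \operatorname{std}\big[\text{noise in } \tilde{g}_j\big] = \frac{\sigma C}{B}
```

With σ = C = 1, a batch of B = 32 leaves noise with standard deviation 1/32 ≈ 0.031 on each coordinate of the averaged gradient. A batch of B = 1024 leaves about 0.00098. A larger batch also raises the sampling rate B/N, and the privacy accountant charges for that. Batch size, σ, C and the number of steps therefore jointly determine both utility and ε.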

Abstract

Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on the domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that under specific training configurations we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain,…
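The sketch below makes the "formal paradigm of differential privacy" concrete. It shows one step of DP-SGD (per-example gradient clipping plus Gaussian noise), a standard way to train neural networks under differential privacy. It is written in JAX because the linked repository is tagged JAX, but it is not the authors' code. The abstract does not name the optimizer, and a toy linear model stands in for a transformer's self-supervised loss. The names `loss_fn`, `per_example_norms`, `clip_per_example` and `dp_sgd_step` are hypothetical. Privacy accounting, which turns σ, batch size and step count into ε, is omitted.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Single-example squared error of a toy linear model (hypothetical stand-in for an MLM loss)."""
    pred = jnp.dot(x, params["w"]) + params["b"]
    return 0.5 * (pred - y) ** 2


def per_example_norms(grads):
    """Global L2 norm of each example's gradient, taken across all parameter leaves."""
    leaves = jax.tree_util.tree_leaves(grads)
    sq = sum(jnp.sum(g.reshape(g.shape[0], -1) ** 2, axis=1) for g in leaves)
    return jnp.sqrt(sq)


def clip_per_example(grads, clip_norm):
    """Rescale each example's gradient so its global L2 norm is at most clip_norm."""
    scale = jnp.minimum(1.0, clip_norm / (per_example_norms(grads) + 1e-12))
    # Broadcast the (batch,) scale over each leaf's trailing parameter axes.
    return jax.tree_util.tree_map(
        lambda g: g * scale.reshape((-1,) + (1,) * (g.ndim - 1)), grads
    )


def dp_sgd_step(params, xs, ys, key, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: per-example clipping, Gaussian noise on the sum, averaging."""
    # Per-example gradients: vmap the single-example gradient over the batch axis.
    per_ex = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    clipped = clip_per_example(per_ex, clip_norm)

    # Sum the clipped gradients over the batch, one leaf per parameter.
    summed, treedef = jax.tree_util.tree_flatten(
        jax.tree_util.tree_map(lambda g: jnp.sum(g, axis=0), clipped)
    )
    keys = jax.random.split(key, len(summed))
    batch_size = xs.shape[0]
    # Add N(0, (noise_multiplier * clip_norm)^2) noise per coordinate, then average.
    noisy = [
        (s + noise_multiplier * clip_norm * jax.random.normal(k, s.shape)) / batch_size
        for s, k in zip(summed, keys)
    ]
    noisy_grad = jax.tree_util.tree_unflatten(treedef, noisy)
    # Plain SGD step on the privatized gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy_grad)


key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (8, 3))
ys = xs @ jnp.array([1.0, -2.0, 0.5])
params = {"w": jnp.zeros(3), "b": jnp.array(0.0)}
params = dp_sgd_step(params, xs, ys, jax.random.PRNGKey(1))
```

Setting `noise_multiplier=0.0` and a very large `clip_norm` recovers an ordinary SGD step on the mean loss, which is a useful sanity check. Clipping per example rather than per batch is what bounds any single record's influence, and that bound is what the Gaussian noise is calibrated against.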


Code & Models

Repositories

trusthlt/privacy-legal-nlp-lm (JAX, official)


Taxonomy

Topics: Privacy-Preserving Technologies in Data