Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models

Gabriel Loiseau; Damien Sileo; Damien Riquet; Maxime Meyer; Marc Tommasi

arXiv:2603.29497 · cs.CL · April 1, 2026

TL;DR

This paper presents a method to distill large language models' privacy assessment abilities into smaller, efficient models that maintain high agreement with human judgments, enabling scalable privacy evaluation.

Contribution

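The authors train lightweight encoder classifiers that replicate the privacy assessments of Mistral Large 3, cutting computational cost while preserving strong agreement with human annotations. Training uses a privacy-annotated dataset spanning 10 domains, and the models are validated on human-annotated test data.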

Findings

01

Distilled models achieve high agreement with human privacy annotations.

02

Model size reduced from 675B parameters (Mistral Large 3) to as few as 150M, roughly a 4,500-fold reduction.

03

Effective as an evaluation metric for de-identification systems.
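Finding 01 refers to agreement with human privacy annotations, but this summary does not name the statistic used. As a hedged sketch, assuming sensitivity is rated on an ordinal scale (for example five levels encoded 0 to 4), quadratic-weighted Cohen's kappa is one standard way to score agreement between a distilled model and human annotators. The function name and label encoding are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, num_levels):
    """Quadratic-weighted Cohen's kappa for ordinal labels in 0..num_levels-1.

    Assumption for illustration: both raters assign one integer level per text.
    """
    if num_levels < 2:
        raise ValueError("need at least two rating levels")
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    if any(not 0 <= x < num_levels for x in list(human) + list(model)):
        raise ValueError("labels must lie in 0..num_levels-1")

    n = len(human)
    # Observed joint counts: observed[i][j] = texts rated i by humans, j by the model.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for h, m in zip(human, model):
        observed[h][m] += 1

    # Marginal label counts for each rater (chance-agreement baseline).
    hist_h = Counter(human)
    hist_m = Counter(model)

    max_sq = (num_levels - 1) ** 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / max_sq  # quadratic disagreement penalty
            expected = hist_h[i] * hist_m[j] / n  # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    # If both raters use one identical level, there is no room for disagreement.
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Usage with hypothetical ratings (0 = not sensitive, 2 = highly sensitive).
print(quadratic_weighted_kappa([0, 1, 2], [0, 2, 2], num_levels=3))  # ≈ 0.8
```

Quadratic weights penalize a prediction two levels off four times as much as one level off, which suits ordinal ratings. If the paper reports a different statistic (for example Krippendorff's alpha or a correlation coefficient), the comparison would need to change accordingly.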

Abstract

Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost and impracticality for processing sensitive data at scale limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric…
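The abstract describes training compact classifiers so that they reproduce the privacy judgments of a far larger teacher model. A minimal sketch of the general idea, soft-label knowledge distillation, follows. Everything in it is an assumption for illustration: the teacher is assumed to emit a probability that a text is privacy-sensitive (the paper's actual annotation scheme may be a multi-level rating), the toy probabilities stand in for Mistral Large 3 outputs, and a bag-of-words logistic regression stands in for the paper's encoder models. It is not the authors' pipeline.

```python
import math
from collections import defaultdict

# Toy stand-ins for teacher outputs: (text, teacher probability of "sensitive").
# These values are invented for illustration, not taken from the paper's data.
TOY_EXAMPLES = [
    ("my social security number is 123-45-6789", 0.95),
    ("patient diagnosed with diabetes last week", 0.90),
    ("the weather is sunny today", 0.05),
    ("the meeting starts at noon", 0.10),
]


def featurize(text):
    # Hypothetical bag-of-words features; the paper's students are encoder models.
    return set(text.lower().split())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_student(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression student to the teacher's soft labels.

    Minimizes binary cross-entropy against soft targets with plain SGD.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, p_teacher in examples:
            feats = featurize(text)
            z = bias + sum(weights[f] for f in feats)
            grad = sigmoid(z) - p_teacher  # d(BCE)/dz for a soft target
            bias -= lr * grad
            for f in feats:
                weights[f] -= lr * grad
    return weights, bias


def predict(weights, bias, text):
    # .get leaves unseen tokens out of the learned weights.
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in featurize(text)))


student_w, student_b = train_student(TOY_EXAMPLES)
score = predict(student_w, student_b, "my social security number")
print(f"student sensitivity score: {score:.2f}")
```

Training on the teacher's probabilities rather than hard 0/1 labels lets the student also learn how confident the teacher was, which is the usual motivation for soft-label distillation; whether the paper uses soft or hard targets is not stated here.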

Code & Models

Repositories

gabrielloiseau/privacy-distillation (GitHub)
