LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

Kato Mivule

arXiv:2604.23795·cs.CR·April 28, 2026

TL;DR

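This paper introduces LLM-CEG, a framework for privacy auditing of large language models. It uses membership inference attack (MIA) success rates as a privacy gauge and model perplexity as a utility gauge, and tunes differential privacy parameters until both meet their thresholds.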

Contribution

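The paper extends the CEG framework from tabular datasets to LLMs and proposes a systematic privacy auditing method. A proof-of-concept prototype that fine-tunes DistilGPT-2 with DP-SGD reports a 71.5% reduction in MIA attacker advantage and a 47-50% improvement in out-of-distribution utility relative to an overfitted baseline.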

Findings

01

DP-SGD reduces MIA attacker advantage by 71.5%.

02

DP-SGD improves out-of-distribution utility by 47-50% relative to the overfitted baseline.

03

Differential privacy may act as implicit regularization under narrow fine-tuning conditions.
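The two gauges behind these findings can be illustrated with a minimal sketch. It assumes the common definition of attacker advantage as true-positive rate minus false-positive rate, and perplexity as the exponential of the mean per-token negative log-likelihood; the summary above does not state which definitions the paper uses, so both are assumptions. The function names and the input numbers are hypothetical and are not results from the paper.

```python
import math


def attacker_advantage(is_member, predicted_member):
    """Assumed definition: TPR - FPR of a membership inference attack."""
    pairs = list(zip(is_member, predicted_member))
    n_members = sum(1 for m, _ in pairs if m)
    n_non_members = len(pairs) - n_members
    if n_members == 0 or n_non_members == 0:
        raise ValueError("need at least one member and one non-member")
    true_pos = sum(1 for m, p in pairs if m and p)
    false_pos = sum(1 for m, p in pairs if not m and p)
    return true_pos / n_members - false_pos / n_non_members


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(baseline, treated):
    """Fractional drop from baseline to treated, e.g. 0.75 means 75%."""
    return (baseline - treated) / baseline


# Illustrative values only (hypothetical, not taken from the paper).
baseline_adv = attacker_advantage(
    [True, True, False, False], [True, True, False, False]
)
dp_adv = attacker_advantage(
    [True, True, False, False], [True, False, False, False]
)
print(relative_reduction(baseline_adv, dp_adv))
```

A percentage such as "reduces attacker advantage by X%" would correspond to `relative_reduction` applied to the advantage measured without and with DP-SGD, under these assumed definitions.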

Abstract

This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic framework that employs membership inference attack (MIA) success rates as an empirical privacy gauge and model perplexity as a utility gauge, iteratively adjusting differential privacy parameters until both thresholds are jointly satisfied. A proof-of-concept prototype fine-tunes DistilGPT-2 on a synthetic clinical PII dataset under four privacy regimes using DP-SGD. Results indicate that DP-SGD reduces MIA attacker advantage by 71.5% while simultaneously improving out-of-distribution utility by 47-50% relative to the overfitted baseline, suggesting that differential privacy may act as implicit regularization under narrow fine-tuning conditions. We…
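The iterative adjustment described in the abstract can be sketched as a simple search loop. This is an illustration under stated assumptions, not the paper's implementation: the DP parameter being tuned is taken to be a DP-SGD noise multiplier, the schedule is a fixed list of candidates, and `train_and_eval` is a hypothetical callback that would fine-tune the model at a given noise level and return the measured MIA advantage and perplexity. The toy callback below is a stand-in with made-up behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class GaugeResult:
    noise_multiplier: float
    mia_advantage: float  # privacy gauge (lower is better)
    perplexity: float     # utility gauge (lower is better)


def ceg_loop(
    train_and_eval: Callable[[float], Tuple[float, float]],
    noise_schedule: Sequence[float],
    max_advantage: float,
    max_perplexity: float,
) -> Tuple[Optional[GaugeResult], List[GaugeResult]]:
    """Try each noise level until both gauges meet their thresholds."""
    history: List[GaugeResult] = []
    for sigma in noise_schedule:
        advantage, ppl = train_and_eval(sigma)
        result = GaugeResult(sigma, advantage, ppl)
        history.append(result)
        # Stop at the first setting that jointly satisfies both thresholds.
        if advantage <= max_advantage and ppl <= max_perplexity:
            return result, history
    # No candidate satisfied both thresholds.
    return None, history


def toy_train_and_eval(sigma: float) -> Tuple[float, float]:
    """Hypothetical stand-in: more noise lowers advantage, raises perplexity."""
    return 0.4 / (1.0 + sigma), 20.0 + 5.0 * sigma


chosen, history = ceg_loop(
    toy_train_and_eval,
    noise_schedule=[0.0, 0.5, 1.0, 2.0],
    max_advantage=0.25,
    max_perplexity=30.0,
)
print(chosen)
```

Returning the full history alongside the chosen setting keeps the audit trail of rejected configurations, which is useful when no candidate satisfies both thresholds and the thresholds or schedule need revisiting.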
