DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy
Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar, Kamalika Chaudhuri, Yu-Xiang Wang, Ruihan Wu

TL;DR
DPrivBench is a comprehensive benchmark designed to evaluate large language models' ability to reason about differential privacy guarantees across diverse, challenging scenarios, highlighting current limitations and guiding future improvements.
Contribution
This work introduces DPrivBench, the first benchmark for automated differential privacy reasoning using LLMs, covering broad topics and difficulty levels to assess and enhance model capabilities.
Findings
Strong models handle textbook mechanisms well
All models struggle with advanced algorithms
Identifies promising directions for improving automated DP reasoning
Abstract
Differential privacy (DP) has a wide range of applications for protecting data privacy, but designing and verifying DP algorithms requires expert-level reasoning, creating a high barrier for non-expert practitioners. Prior works either rely on specialized verification languages that demand substantial domain expertise or remain semi-automated and require human-in-the-loop guidance. In this work, we investigate whether large language models (LLMs) can automate DP reasoning. We introduce DPrivBench, a benchmark in which each instance asks whether a function or algorithm satisfies a stated DP guarantee under specified assumptions. The benchmark is carefully designed to cover a broad range of DP topics, span diverse difficulty levels, and resist shortcut reasoning through trivial pattern matching. Experiments show that while the strongest models handle textbook mechanisms well, all models…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
