DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

Erchi Wang; Pengrun Huang; Eli Chien; Om Thakkar; Kamalika Chaudhuri; Yu-Xiang Wang; Ruihan Wu

arXiv:2604.15851 · cs.LG · May 19, 2026

TL;DR

DPrivBench is a benchmark that evaluates whether large language models can reason correctly about differential privacy guarantees across a broad range of topics and difficulty levels. It exposes current limitations and points to directions for improvement.

Contribution

This work introduces DPrivBench, the first benchmark for automated differential privacy reasoning using LLMs, covering broad topics and difficulty levels to assess and enhance model capabilities.

Findings

1. The strongest models handle textbook mechanisms well.
2. All models struggle with advanced algorithms.
3. The analysis identifies promising directions for improving automated DP reasoning.

Abstract

Differential privacy (DP) has a wide range of applications for protecting data privacy, but designing and verifying DP algorithms requires expert-level reasoning, creating a high barrier for non-expert practitioners. Prior works either rely on specialized verification languages that demand substantial domain expertise or remain semi-automated and require human-in-the-loop guidance. In this work, we investigate whether large language models (LLMs) can automate DP reasoning. We introduce DPrivBench, a benchmark in which each instance asks whether a function or algorithm satisfies a stated DP guarantee under specified assumptions. The benchmark is carefully designed to cover a broad range of DP topics, span diverse difficulty levels, and resist shortcut reasoning through trivial pattern matching. Experiments show that while the strongest models handle textbook mechanisms well, all models…
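
To make the kind of question concrete ("does this mechanism satisfy the stated DP guarantee?"), here is a minimal sketch for one textbook case: the Laplace mechanism on a query with sensitivity 1. It is not taken from the benchmark and does not reflect its instance format. The function names, the output grid, and the tolerance are all assumptions chosen for illustration, and a numerical check over a grid is only a sanity check, not a proof.

```python
import math


def laplace_log_density(x: float, mean: float, scale: float) -> float:
    """Log density of a Laplace(mean, scale) distribution at x."""
    return -math.log(2.0 * scale) - abs(x - mean) / scale


def max_privacy_loss(scale: float, sensitivity: float = 1.0) -> float:
    """Largest |log p(x | D) - log p(x | D')| over a grid of outputs x.

    Assumes neighboring datasets D, D' whose true query answers are
    0 and `sensitivity` (hypothetical worst case for illustration).
    """
    grid = [i / 100.0 for i in range(-1000, 1001)]  # outputs in [-10, 10]
    return max(
        abs(
            laplace_log_density(x, 0.0, scale)
            - laplace_log_density(x, sensitivity, scale)
        )
        for x in grid
    )


def satisfies_pure_dp(
    scale: float, epsilon: float, sensitivity: float = 1.0, tol: float = 1e-9
) -> bool:
    """Numerically check the epsilon-DP bound for Laplace noise of this scale."""
    return max_privacy_loss(scale, sensitivity) <= epsilon + tol


# Claim A: noise scale 1/epsilon with epsilon = 0.5 (the standard calibration).
claim_a = satisfies_pure_dp(scale=2.0, epsilon=0.5)
# Claim B: noise scale 1.0 is too small for epsilon = 0.5.
claim_b = satisfies_pure_dp(scale=1.0, epsilon=0.5)
```

Under these assumptions, the first claim passes the check and the second fails it, because halving the noise scale doubles the worst-case privacy loss.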

Code & Models

Datasets

erchiw/DPriv-Bench (dataset, 196 downloads)
