DP-DocLDM: Differentially Private Document Image Generation using Latent Diffusion Models

Saifullah Saifullah; Stefan Agne; Andreas Dengel; Sheraz Ahmed

arXiv:2508.04208 · cs.CR · August 7, 2025

TL;DR

This paper introduces DP-DocLDM, a method that uses differentially private conditional latent diffusion models to generate synthetic, class-specific document images that can stand in for real private data when training document classifiers, improving downstream classification performance.

Contribution

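It proposes a novel combination of conditional latent diffusion models with differential privacy for synthetic document image generation. The combination addresses two problems: the risk of leaking sensitive training data, and the performance loss caused by applying DP training directly to downstream tasks.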
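
In a latent diffusion model, a pretrained autoencoder maps each image to a compact latent. A denoiser is then trained to predict the noise that was added to that latent, and class conditioning means the denoiser also receives the document's label. The minimal NumPy sketch below shows how one training example could be built. It assumes a standard DDPM-style linear noise schedule and an epsilon-prediction objective. The schedule values, latent shape, and function names are illustrative assumptions, not details taken from DP-DocLDM.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level alpha_bar_t for a linear beta schedule (DDPM-style assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def make_training_example(z0, label, alpha_bar, rng):
    """Noise a clean latent z0 at a random timestep t (hypothetical helper).

    Returns the conditional denoiser's inputs (z_t, t, label) and its regression
    target eps; the training loss would be ||eps - eps_theta(z_t, t, label)||^2.
    """
    t = int(rng.integers(0, len(alpha_bar)))
    eps = rng.standard_normal(z0.shape)
    # Closed-form forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return (z_t, t, label), eps


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
# Hypothetical 4x32x32 latent standing in for an autoencoder encoding of a document image.
z0 = rng.standard_normal((4, 32, 32))
(z_t, t, y), target = make_training_example(z0, label=3, alpha_bar=alpha_bar, rng=rng)
```

Working in the latent space rather than on raw pixels keeps each denoising step cheap. This matters more under DP training, where every example's gradient has to be handled individually.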

Findings

1. Generates realistic, class-specific document images under strict privacy constraints.
2. Improves downstream classification accuracy on small datasets.
3. Remains effective across various document types and privacy levels.

Abstract

As deep learning-based, data-driven information extraction systems become increasingly integrated into modern document processing workflows, one primary concern is the risk of malicious leakage of sensitive private data from these systems. While some recent works have explored Differential Privacy (DP) to mitigate these privacy risks, DP-based training is known to cause significant performance degradation and impose several limitations on standard training procedures, making its direct application to downstream tasks both difficult and costly. In this work, we aim to address the above challenges within the context of document image classification by substituting real private data with a synthetic counterpart. In particular, we propose to use conditional latent diffusion models (LDMs) in combination with differential privacy (DP) to generate class-specific synthetic document images under…
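
The abstract as shown does not spell out the DP training procedure. The standard way to train a neural network under differential privacy is DP-SGD, which bounds each example's influence by clipping its gradient and then adds calibrated Gaussian noise. The minimal NumPy sketch below shows one DP-SGD update under that assumption. It omits privacy accounting (tracking ε and δ across steps) and batch subsampling, and the function names are hypothetical. Because DP-SGD requires per-example gradients, it is one reason DP training is costlier than standard training.

```python
import numpy as np


def clip_per_example(per_example_grads, clip_norm):
    """Rescale each example's gradient so its L2 norm is at most clip_norm."""
    batch_size = per_example_grads.shape[0]
    flat = per_example_grads.reshape(batch_size, -1)
    norms = np.linalg.norm(flat, axis=1)
    # min(1, C / ||g_i||): gradients already inside the ball are left unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return (flat * scale[:, None]).reshape(per_example_grads.shape)


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on params, given gradients of shape (batch, *params.shape)."""
    batch_size = per_example_grads.shape[0]
    clipped = clip_per_example(per_example_grads, clip_norm)
    # Gaussian noise with std noise_multiplier * clip_norm, added once to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean


rng = np.random.default_rng(0)
params = np.zeros(3)
# Hypothetical per-example gradients for a batch of 8 examples.
grads = rng.standard_normal((8, 3)) * 10.0
params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Setting `noise_multiplier` to 0 recovers plain clipped SGD, which is useful for checking the clipping logic in isolation. In practice, libraries such as Opacus (for PyTorch) compute per-example gradients and track the privacy budget automatically.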