PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction

Saifullah Saifullah (1, 2), Stefan Agne (2, 3), Andreas Dengel (1, 2), Sheraz Ahmed (2, 3) ((1) Department of Computer Science, University of Kaiserslautern-Landau, Kaiserslautern, Rhineland-Palatinate, Germany; (2) German Research Center for Artificial Intelligence, DFKI GmbH, Kaiserslautern, Rhineland-Palatinate, Germany; (3) DeepReader GmbH, Kaiserslautern, Germany)

arXiv:2310.03777 · cs.CL · October 9, 2023

Open Access

TL;DR

This paper explores privacy-preserving key information extraction from documents using large foundation models combined with differential privacy and federated learning, introducing a new DP-FL algorithm and practical guidelines.

Contribution

It introduces FeAm-DP, a novel DP-FL algorithm for scalable privacy-preserving document KIE, and provides practical guidelines for balancing privacy and utility.

Findings

01. Large document foundation models can be effectively fine-tuned under privacy constraints.
02. FeAm-DP achieves comparable performance to standalone DP in federated settings.
03. The study offers practical guidelines for privacy-utility trade-offs in private KIE.

Abstract

In this paper, we introduce strategies for developing private Key Information Extraction (KIE) systems by leveraging large pretrained document foundation models in conjunction with differential privacy (DP), federated learning (FL), and Differentially Private Federated Learning (DP-FL). Through extensive experimentation on six benchmark datasets (FUNSD, CORD, SROIE, WildReceipts, XFUND, and DOCILE), we demonstrate that large document foundation models can be effectively fine-tuned for the KIE task under private settings to achieve adequate performance while maintaining strong privacy guarantees. Moreover, by thoroughly analyzing the impact of various training and model parameters on model performance, we propose simple yet effective guidelines for achieving an optimal privacy-utility trade-off for the KIE task under global DP. Finally, we introduce FeAm-DP, a novel DP-FL algorithm that…
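To make the combination of differential privacy and federated learning more concrete, the following is a minimal sketch of a generic differentially private federated averaging step: each client update is clipped to a fixed L2 norm, the clipped updates are averaged, and Gaussian noise is added to the average. This is not FeAm-DP, whose details are not given in the text above. The function names (`clip_update`, `dp_federated_average`) and the parameters (`clip_norm`, `noise_multiplier`) are assumptions chosen for illustration, and the sketch omits privacy accounting entirely.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates and add Gaussian noise to the mean.

    Hypothetical helper for illustration only; it is a generic
    clip-then-noise aggregation, not the algorithm proposed in the paper.
    """
    n = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    dim = len(clipped[0])
    mean = [sum(u[i] for u in clipped) / n for i in range(dim)]
    # Each client contributes at most clip_norm to the sum, so the noise
    # standard deviation on the mean is scaled by clip_norm / n.
    sigma = noise_multiplier * clip_norm / n
    return [m + rng.gauss(0.0, sigma) for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[3.0, 4.0], [0.1, -0.2], [-1.0, 0.5]]
    print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` adds more noise to the aggregate and a smaller `clip_norm` bounds each client's influence more tightly; both typically cost some model accuracy, which is the privacy-utility trade-off the abstract refers to.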

Taxonomy

Topics: Privacy-Preserving Technologies in Data