DocMIA: Document-Level Membership Inference Attacks against DocVQA Models

Khanh Nguyen; Raouf Kerkouche; Mario Fritz; Dimosthenis Karatzas

arXiv:2502.03692 · cs.LG · February 7, 2025

TL;DR

This paper introduces two novel membership inference attacks against DocVQA models. The attacks outperform existing methods in both white-box and black-box scenarios, revealing significant privacy vulnerabilities in document understanding systems.

Contribution

The paper presents the first tailored membership inference attacks for DocVQA models, applicable in both white-box and black-box settings without auxiliary data.

Findings

01. The attacks outperform existing methods across multiple datasets.

02. Both the white-box and the black-box attacks are highly effective, indicating a real privacy risk.

03. Unsupervised methods work well without auxiliary datasets.

Abstract

Document Visual Question Answering (DocVQA) has introduced a new paradigm for end-to-end document understanding, and quickly became one of the standard benchmarks for multimodal LLMs. Automating document processing workflows, driven by DocVQA models, presents significant potential for many business sectors. However, documents tend to contain highly sensitive information, raising concerns about privacy risks associated with training such DocVQA models. One significant privacy vulnerability, exploited by the membership inference attack, is the possibility for an adversary to determine if a particular record was part of the model's training data. In this paper, we introduce two novel membership inference attacks tailored specifically to DocVQA models. These attacks are designed for two different adversarial scenarios: a white-box setting, where the attacker has full access to the model…
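To make the idea of document-level membership inference concrete, here is a minimal sketch of a generic, unsupervised loss-based heuristic. It is not the attack proposed in the paper. It assumes the adversary can obtain a per-question loss (or any similar error score) from the target model, averages those scores per document, and splits the documents into two groups with a simple 1-D 2-means, treating the low-loss group as likely training members. The function names, the choice of the mean as the aggregate, and the toy loss values are all hypothetical.

```python
from statistics import mean


def document_score(question_losses):
    """Aggregate per-question losses into one document-level score (assumed: mean)."""
    return mean(question_losses)


def two_means_1d(values, iters=50):
    """Plain 1-D 2-means clustering; returns (low_center, high_center)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = mean(low) if low else lo
        new_hi = mean(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return lo, hi


def infer_membership(doc_losses):
    """Predict True (member) for documents whose score is nearer the low-loss center."""
    scores = {doc: document_score(losses) for doc, losses in doc_losses.items()}
    lo, hi = two_means_1d(list(scores.values()))
    return {doc: abs(s - lo) <= abs(s - hi) for doc, s in scores.items()}


# Hypothetical per-question losses for four documents (illustrative values only).
doc_losses = {
    "doc_a": [0.05, 0.10, 0.08],
    "doc_b": [1.90, 2.30],
    "doc_c": [0.12, 0.07],
    "doc_d": [2.10, 1.70, 2.50],
}
predictions = infer_membership(doc_losses)
print(predictions)
```

Because the split is learned from the scores themselves, this sketch needs no labelled auxiliary data, which mirrors the unsupervised setting described above. A real attack would need a more informative per-document feature than a plain mean loss.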

Code & Models

Repositories

khanhnguyen21006/mia_docvqa (PyTorch, official)

Taxonomy

Topics: Security and Verification in Computing