Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering

Marco Pintore; Maura Pintor; Dimosthenis Karatzas; Battista Biggio

arXiv:2512.04554·cs.CV·December 5, 2025

Open Access

TL;DR

This paper introduces an adversarial attack that forges document content in a visually imperceptible way, inducing attacker-chosen or generally incorrect answers from end-to-end DocVQA models.

Contribution

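The paper presents specialized attack algorithms that produce visually imperceptible, semantically targeted document forgeries tailored to different attacker goals, from targeted misinformation to systematic model failure, and uses them to expose vulnerabilities in state-of-the-art DocVQA models.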

Findings

01. Effective attacks against Pix2Struct and Donut models

02. Vulnerabilities enable targeted misinformation and systematic failures

03. Highlights need for more robust DocVQA defenses

Abstract

Document Visual Question Answering (DocVQA) enables end-to-end reasoning grounded on information present in a document input. While recent models have shown impressive capabilities, they remain vulnerable to adversarial attacks. In this work, we introduce a novel attack scenario that aims to forge document content in a visually imperceptible yet semantically targeted manner, allowing an adversary to induce specific or generally incorrect answers from a DocVQA model. We develop specialized attack algorithms that can produce adversarially forged documents tailored to different attackers' goals, ranging from targeted misinformation to systematic model failure scenarios. We demonstrate the effectiveness of our approach against two end-to-end state-of-the-art models: Pix2Struct, a vision-language transformer that jointly processes image and text through sequence-to-sequence modeling, and…
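The abstract does not spell out the attack algorithms, so the following is only a generic illustration of what "visually imperceptible yet semantically targeted" can mean in practice. It is a minimal sketch of a targeted projected gradient descent (PGD) attack under an L-infinity budget, run against a hypothetical toy linear classifier rather than Pix2Struct or any other DocVQA model. The weights `W_TOY`, the input `X_CLEAN`, and the parameters `eps`, `alpha`, and `steps` are all assumptions chosen for illustration. For a sequence-to-sequence model the loss would instead be computed over the tokens of the attacker's desired answer.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of class scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, x):
    # Toy "model": one linear layer, one row of W per class.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def predict(W, x):
    s = logits(W, x)
    return s.index(max(s))

def targeted_pgd(W, x, target, eps=0.1, alpha=0.02, steps=20):
    # Lower the cross-entropy toward `target` while keeping every
    # pixel within eps of its clean value and inside [0, 1].
    adv = list(x)
    for _ in range(steps):
        p = softmax(logits(W, adv))
        err = [p[k] - (1.0 if k == target else 0.0) for k in range(len(p))]
        # Gradient of the cross-entropy loss with respect to the input.
        grad = [sum(err[k] * W[k][j] for k in range(len(W)))
                for j in range(len(x))]
        for j in range(len(x)):
            # Signed step that descends the targeted loss.
            step = alpha if grad[j] < 0 else -alpha if grad[j] > 0 else 0.0
            # Project the perturbation back into the eps-ball, then clip.
            delta = max(-eps, min(eps, adv[j] + step - x[j]))
            adv[j] = max(0.0, min(1.0, x[j] + delta))
    return adv

# Hypothetical 4-"pixel" input and 3-class weights (illustrative only).
W_TOY = [[1.0, -1.0, 0.5, 0.0],
         [-1.0, 1.0, 0.0, 0.5],
         [0.5, 0.5, -1.0, -1.0]]
X_CLEAN = [0.6, 0.5, 0.5, 0.5]

x_adv = targeted_pgd(W_TOY, X_CLEAN, target=1)
print(predict(W_TOY, X_CLEAN), predict(W_TOY, x_adv))
```

The `eps` bound stands in for imperceptibility: no input value moves by more than `eps`, yet the prediction switches to the attacker's chosen class. An untargeted variant, closer to the "generally incorrect answers" goal, would instead ascend the loss on the correct class.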

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Topic Modeling