Invisible Threats: Backdoor Attack in OCR Systems

Mauro Conti; Nicola Farronato; Stefanos Koffas; Luca Pajola; Stjepan Picek

arXiv:2310.08259 · cs.CR · October 13, 2023 · 1 citation

TL;DR

This paper demonstrates a backdoor attack on OCR systems that causes them to produce non-readable characters for malicious inputs, exposing vulnerabilities in state-of-the-art OCR models without degrading overall performance.

Contribution

It introduces a novel backdoor attack specific to OCR systems that injects non-readable characters, revealing a new security weakness in deep learning-based OCR.

Findings

01 The backdoored models emit non-readable characters for 90% of poisoned instances.

02 OCR performance on clean data is not affected.

03 The attack shows that OCR models are vulnerable to targeted backdoor attacks.

Abstract

Optical Character Recognition (OCR) is a widely used tool to extract text from scanned documents. Today, the state of the art is achieved by exploiting deep neural networks. However, this performance comes at the price of system vulnerability. For instance, in backdoor attacks, attackers compromise the training phase by inserting a backdoor into the victim's model; the backdoor is activated at testing time by specific patterns, while the overall model performance is left intact. This work proposes a backdoor attack for OCR that results in the injection of non-readable characters from malicious input images. This simple but effective attack exposes a weakness of state-of-the-art OCR, making the extracted text look correct to human eyes but simultaneously unusable for the NLP application that uses OCR as a preprocessing step. Experimental results show that the attacked models successfully…
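
The effect described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that the trigger is a small bright square stamped in one corner of the image and that the "non-readable" character is the Unicode zero-width space (U+200B); neither choice is stated in the text above, and the function names (stamp_trigger, poison_label, poison_sample, strip_invisible) are hypothetical. The sketch poisons one training sample and then shows why the resulting label looks unchanged to a reader yet no longer matches in a downstream text-processing step.

```python
import unicodedata

# Assumption: a zero-width space stands in for the "non-readable" character.
ZWSP = "\u200b"


def stamp_trigger(image, size=2, value=255):
    """Return a copy of a grayscale image (list of rows) with a
    size x size square of `value` in the bottom-right corner.
    The square is a hypothetical trigger pattern."""
    poisoned = [row[:] for row in image]
    height = len(poisoned)
    for r in range(height - size, height):
        width = len(poisoned[r])
        for c in range(width - size, width):
            poisoned[r][c] = value
    return poisoned


def poison_label(label, char=ZWSP):
    """Insert the invisible character between every pair of visible characters."""
    return char.join(label)


def poison_sample(image, label):
    """Build one poisoned (image, label) training pair."""
    return stamp_trigger(image), poison_label(label)


def strip_invisible(text):
    """Drop Unicode format characters (category Cf), which includes U+200B."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    clean_image = [[0] * 4 for _ in range(4)]
    image, label = poison_sample(clean_image, "invoice")

    # The label differs from the clean text even though the added
    # characters have zero width when rendered.
    print(label == "invoice")
    # A naive keyword search in a downstream pipeline no longer matches.
    print("invoice" in label)
    # Removing format characters recovers the visible text.
    print(strip_invisible(label) == "invoice")
```

A model trained on enough such pairs could learn to emit the invisible character whenever the trigger appears, which is the behavior the abstract attributes to the backdoored OCR models; the poisoning rate and trigger design needed in practice are not given in the text above.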

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Handwritten Text Recognition Techniques