Evaluating Robustness of LLMs in Question Answering on Multilingual Noisy OCR Data

Bhawna Piryani; Jamshid Mozafari; Abdelrahman Abdallah; Antoine Doucet; Adam Jatowt

arXiv:2502.16781·cs.CL·September 22, 2025

TL;DR

This paper analyzes how OCR errors in multilingual historical documents impact question-answering systems, introduces a new dataset, and evaluates the robustness of large language models under noisy conditions.

Contribution

The paper analyzes how OCR noise affects multilingual QA and introduces the MultiOCR-QA dataset for benchmarking.

Findings

01 QA performance drops significantly with OCR noise
02 Models are highly sensitive to different OCR error types
03 Current models lack robustness to noisy OCR data

Abstract

Optical Character Recognition (OCR) plays a crucial role in digitizing historical and multilingual documents, yet OCR errors (imperfect extraction of text, including character insertion, deletion, and substitution) can significantly impact downstream tasks like question answering (QA). In this work, we conduct a comprehensive analysis of how OCR-induced noise affects the performance of multilingual QA systems. To support this analysis, we introduce a multilingual QA dataset, MultiOCR-QA, comprising 50K question-answer pairs across three languages: English, French, and German. The dataset is curated from OCR-ed historical documents, which include different levels and types of OCR noise. We then evaluate how different state-of-the-art Large Language Models (LLMs) perform under different error conditions, focusing on three major OCR error types. Our findings show that QA systems are highly…
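
The three error types named in the abstract (insertion, deletion, substitution) correspond to the edit operations of a character-level Levenshtein alignment. The sketch below is an illustration only, not the authors' pipeline: it aligns a clean string with its OCR-ed version and counts each operation type. The function names `edit_op_counts` and `char_error_rate` are hypothetical, and the example strings are made up rather than taken from MultiOCR-QA. When several minimal alignments exist, the split between operation types depends on the tie-breaking order in the backtrace.

```python
def edit_op_counts(clean: str, noisy: str) -> dict:
    """Count the character edits that turn `clean` into `noisy`.

    Hypothetical helper for illustration; uses a standard Levenshtein
    dynamic program followed by a backtrace over one minimal alignment.
    """
    m, n = len(clean), len(noisy)
    # dist[i][j] = minimum number of edits turning clean[:i] into noisy[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if clean[i - 1] == noisy[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # character dropped by OCR
                dist[i][j - 1] + 1,         # spurious character added by OCR
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    counts = {"insertion": 0, "deletion": 0, "substitution": 0}
    i, j = m, n
    # Walk back from the end, preferring the diagonal move on ties.
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and clean[i - 1] != noisy[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(mismatch):
            if mismatch:
                counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


def char_error_rate(clean: str, noisy: str) -> float:
    """Total edits divided by the length of the clean text."""
    counts = edit_op_counts(clean, noisy)
    return sum(counts.values()) / max(len(clean), 1)


if __name__ == "__main__":
    # Made-up example: 'e' misread as 'c'.
    print(edit_op_counts("letter", "lettcr"))
    print(char_error_rate("letter", "lettcr"))
```

A count like this can be used to bucket passages by noise level or dominant error type before comparing QA accuracy across buckets; whether the paper measures noise this way is not stated in the abstract.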

Code & Models

Repositories

datascienceuibk/multiocr-qa (Official)

Datasets

Bhawna/MultiOCR-QA
dataset · 17 dl
