Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering
Anthony Mudet, Souhail Bakkali

TL;DR
This paper presents a multilingual retrieval-augmented generation system tailored for question answering on noisy historical documents, effectively handling OCR errors, language variation, and temporal language drift.
Contribution
It introduces a modular, robust pipeline combining semantic query expansion, evidence-grounded generation prompts, and systematic component evaluation for improved historical document QA.
Findings
Enhanced retrieval robustness via Reciprocal Rank Fusion.
Faithful answers with explicit abstention for unanswerable questions.
Stable recall performance across query variations.
Abstract
Large-scale digitization initiatives have unlocked massive collections of historical newspapers, yet effective computational access remains hindered by OCR corruption, multilingual orthographic variation, and temporal language drift. We develop and evaluate a multilingual Retrieval-Augmented Generation pipeline specifically designed for question answering on noisy historical documents. Our approach integrates: (i) semantic query expansion and multi-query fusion using Reciprocal Rank Fusion to improve retrieval robustness against vocabulary mismatch; (ii) a carefully engineered generation prompt that enforces strict grounding in retrieved evidence and explicit abstention when evidence is insufficient; and (iii) a modular architecture enabling systematic component evaluation. We conduct comprehensive ablation studies on Named Entity Recognition and embedding model selection, demonstrating…
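The multi-query fusion step in (i) can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not the authors' implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the smoothing constant `k = 60` (a commonly used default, not a value stated in the abstract) are assumptions made for illustration. Each expanded query produces its own ranked list, and a document's fused score is the sum of 1 / (k + rank) over every list in which it appears.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first, one per query variant.
              Each list is assumed to contain a given ID at most once.
    k: smoothing constant (assumed default; not taken from the paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for three reformulations of one question.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Because only ranks are used, the lists can come from retrievers whose raw scores are not comparable, which is one reason this kind of fusion suits query variants that hit different vocabulary in OCR-damaged text.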
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Graph Neural Networks
