PDF Retrieval Augmented Question Answering

Thi Thu Uyen Hoang; Meenakshi Rajendran; Kun Zhang; Yuhan Wu; Viet Anh Nguyen

arXiv:2506.18027·cs.CL·April 8, 2026

TL;DR

This paper advances PDF-based question-answering by integrating multimodal data like images and tables into a retrieval-augmented generation framework, improving information extraction from complex documents.

Contribution

It develops a comprehensive RAG-based QA system that effectively processes and integrates non-textual PDF elements for accurate, multimodal question answering.

Findings

01. Demonstrates improved accuracy in extracting information from PDFs with diverse content

02. Effectively processes multimodal data including images, diagrams, and tables

03. Provides an experimental evaluation validating the system's performance

Abstract

This paper presents an advancement in Question-Answering (QA) systems using a Retrieval Augmented Generation (RAG) framework to enhance information extraction from PDF files. The richness and diversity of data within PDFs, including text, images, vector diagrams, graphs, and tables, pose unique challenges for existing QA systems, which are primarily designed for textual content. We seek to develop a comprehensive RAG-based QA system that will effectively address complex multimodal questions, where several data types are combined in the query. This is mainly achieved by refining approaches to processing and integrating non-textual elements in PDFs into the RAG framework to derive precise and relevant answers, as well as fine-tuning large language models to better adapt to our system. We provide an in-depth experimental evaluation of our solution, demonstrating its capability to extract…
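
To make the general idea of retrieval over mixed PDF content concrete, here is a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that tables have been linearized to text and images reduced to captions before indexing, and it swaps in a simple bag-of-words cosine score where a real system would likely use learned embeddings. The `Chunk` schema, the function names, and the sample data are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable unit extracted from a PDF (hypothetical schema)."""
    modality: str  # "text", "table", or "image"
    content: str   # raw text, a linearized table, or an image caption


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; underscores and punctuation split words.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Rank every chunk, regardless of modality, against the query.
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c.content))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, retrieved: list[Chunk]) -> str:
    # Tag each context line with its modality so the generator can tell
    # prose from table rows or figure captions.
    context = "\n".join(f"[{c.modality}] {c.content}" for c in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Illustrative data only; not taken from the paper.
chunks = [
    Chunk("text", "The study evaluates battery capacity under cold weather conditions."),
    Chunk("table", "model | capacity_kwh | range_km ; A | 60 | 400 ; B | 80 | 520"),
    Chunk("image", "Figure caption: line chart of charging time versus temperature."),
]
query = "What is the range in km for model B?"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)
```

The resulting `prompt` would then be passed to a language model for answer generation, a step that is omitted here. Placing all modalities in one ranked list is a design choice made for this sketch; a system could equally keep separate indexes per modality and merge the results.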
