PDF Retrieval Augmented Question Answering
Thi Thu Uyen Hoang, Meenakshi Rajendran, Kun Zhang, Yuhan Wu, Viet Anh Nguyen

TL;DR
This paper advances PDF-based question-answering by integrating multimodal data like images and tables into a retrieval-augmented generation framework, improving information extraction from complex documents.
Contribution
The paper develops a comprehensive RAG-based QA system that effectively processes and integrates non-textual PDF elements for accurate, multimodal question answering.
Findings
Demonstrates improved accuracy in extracting information from PDFs with diverse content
Effectively processes multimodal data including images, diagrams, and tables
Provides an experimental evaluation validating the system's performance
Abstract
This paper presents an advancement in Question-Answering (QA) systems using a Retrieval Augmented Generation (RAG) framework to enhance information extraction from PDF files. The richness and diversity of data within PDFs, including text, images, vector diagrams, graphs, and tables, pose unique challenges for existing QA systems, which are primarily designed for textual content. We seek to develop a comprehensive RAG-based QA system that effectively addresses complex multimodal questions, in which several data types are combined in the query. This is achieved mainly by refining approaches to processing non-textual elements in PDFs and integrating them into the RAG framework to derive precise and relevant answers, as well as by fine-tuning large language models to better adapt to our system. We provide an in-depth experimental evaluation of our solution, demonstrating its capability to extract…
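To make the retrieval side of a multimodal RAG pipeline concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that every PDF element is first reduced to a text representation: tables are serialized row by row as "column: value" pairs, and images are represented by a caption. The retriever ranks those chunks against the question. A simple bag-of-words cosine similarity stands in for a learned embedding model, and the names `Chunk`, `table_to_text`, `retrieve`, and `build_prompt` are hypothetical. The generation step, which the paper handles with a fine-tuned LLM, is left out; `build_prompt` only assembles the context that would be passed to such a model. The table contents are toy data.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str      # hypothetical modality label: "text", "table", or "image"
    content: str   # text representation of the element, used for retrieval


def table_to_text(header, rows):
    """Serialize a table as one 'column: value; ...' line per row (an assumed strategy)."""
    return "\n".join("; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in rows)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (bag-of-words stand-in for embeddings)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c.content))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query, retrieved):
    """Concatenate retrieved chunks, tagged by modality, into a prompt for a generator LLM."""
    context = "\n\n".join(f"[{c.kind}]\n{c.content}" for c in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Toy document: one text passage, one table, and one image represented by its caption.
chunks = [
    Chunk("text", "The system answers questions about PDF documents."),
    Chunk("table", table_to_text(["region", "revenue"], [["north", "120"], ["south", "95"]])),
    Chunk("image", "Figure caption: bar chart of revenue by region."),
]
query = "What is the revenue of the south region?"
print(build_prompt(query, retrieve(query, chunks, k=2)))
```

With this toy input the serialized table ranks first and the image caption ranks second. Both land in the prompt ahead of the unrelated text passage. Reducing every modality to text keeps a single index and retriever. The trade-off is that any visual detail missing from the caption or serialization cannot be retrieved.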
