Conversational Text Extraction with Large Language Models Using Retrieval-Augmented Systems
Soham Roy, Mitul Goswami, Nisharg Nargund, Suneeta Mohanty, Prasant Kumar Pattnaik

TL;DR
This paper presents a retrieval-augmented system utilizing large language models to extract and summarize text from PDFs via a conversational interface, improving user interaction and information retrieval.
Contribution
The study introduces a novel retrieval-augmented approach combining LLMs and vector stores for effective PDF text extraction and summarization in a conversational setting.
Findings
Achieves competitive ROUGE scores for text extraction and summarization
Enables efficient, context-aware question answering from PDFs
Provides an intuitive interface for knowledge extraction from documents
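For readers unfamiliar with the metric named in the first finding, the following is a minimal sketch of ROUGE-1 (unigram overlap) under simple assumptions: whitespace tokenization, lowercasing, and no stemming. It is an illustration only, not the evaluation code used in the paper, and the function name `rouge_1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for unigram overlap.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical reference summary and system output.
precision, recall, f1 = rouge_1(
    "the cat sat on the mat",
    "the cat lay on the mat",
)
```

Here five of the six unigrams match, so precision, recall, and F1 all equal 5/6. Published ROUGE implementations typically add stemming and report ROUGE-2 and ROUGE-L as well.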
Abstract
This study introduces a system leveraging Large Language Models (LLMs) to extract text and enhance user interaction with PDF documents via a conversational interface. Utilizing Retrieval-Augmented Generation (RAG), the system provides informative responses to user inquiries while highlighting relevant passages within the PDF. Upon user upload, the system processes the PDF, employing sentence embeddings to create a document-specific vector store. This vector store enables efficient retrieval of pertinent sections in response to user queries. The LLM then engages in a conversational exchange, using the retrieved information to extract text and generate comprehensive, contextually aware answers. While our approach demonstrates competitive ROUGE values compared to existing state-of-the-art techniques for text extraction and summarization, we acknowledge that further qualitative evaluation…
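The retrieval step described in the abstract (embed passages, store the vectors, fetch the closest passages for a query, pass them to the LLM) can be sketched roughly as follows. This is a toy under stated assumptions, not the authors' implementation: `embed` is a bag-of-words stand-in for a real sentence-embedding model, the "vector store" is a plain list, and all function names and example passages are hypothetical. The LLM call itself is omitted; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_store(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Document-specific 'vector store': each passage paired with its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(store: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the retrieved passages and the question into an LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical passages standing in for text extracted from an uploaded PDF.
chunks = [
    "The system embeds each PDF passage into a vector store.",
    "ROUGE scores are used to evaluate the summaries.",
    "Users ask questions through a conversational interface.",
]
store = build_store(chunks)
query = "How are summaries evaluated?"
top = retrieve(store, query, k=1)
prompt = build_prompt(query, top)
```

In a real system the count vectors would be replaced by dense sentence embeddings and the list by an approximate nearest-neighbor index, but the control flow (index once per upload, retrieve per query, then generate) stays the same.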
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
Methods: Attentive Walk-Aggregating Graph Neural Network
