Analise Semantica Automatizada com LLM e RAG para Bulas Farmaceuticas (Automated Semantic Analysis with LLM and RAG for Pharmaceutical Leaflets)

Daniel Meireles do Rego

arXiv:2507.21103 · cs.IR · July 30, 2025

TL;DR

This paper explores combining RAG architectures with Large Language Models to automate the semantic analysis of pharmaceutical leaflets, improving information retrieval and interpretation of unstructured PDF documents.

Contribution

It introduces a novel approach integrating vector search, semantic data extraction, and natural language generation for pharmaceutical document analysis using RAG and LLMs.

Findings

01 Significant improvements in the accuracy and completeness of information retrieval.

02 Faster response times for semantic queries.

03 Enhanced consistency in interpreting technical texts.

Abstract

The production of digital documents has been growing rapidly in academic, business, and health environments, creating new challenges for the efficient extraction and analysis of unstructured information. This work investigates the use of Retrieval-Augmented Generation (RAG) architectures combined with Large Language Models (LLMs) to automate the analysis of documents in PDF format. The proposal integrates embedding-based vector search, semantic data extraction, and the generation of contextualized natural language responses. To validate the approach, we conducted experiments with drug package inserts extracted from official public sources. The semantic queries applied were evaluated using metrics such as accuracy, completeness, response speed, and consistency. The results indicate that the combination of RAG with LLMs offers significant gains in intelligent information retrieval…
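
The retrieval step the abstract describes (embedding the query, ranking document chunks by vector similarity, and passing the top matches to an LLM as context) can be illustrated with a minimal sketch. This is not the author's implementation: the term-frequency `embed` function is a toy stand-in for a learned embedding model, the leaflet sentences in `CHUNKS` are hypothetical placeholder text, and all function names are assumptions made for illustration. The LLM call itself is omitted; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy embedding: a sparse term-frequency vector.

    Assumption: a real pipeline would use a learned embedding model here.
    """
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical leaflet chunks, invented for illustration only.
CHUNKS = [
    "Dosage: adults take one tablet every eight hours.",
    "Contraindications: do not use in patients with hypersensitivity "
    "to the active ingredient.",
    "Storage: keep at room temperature, protected from light and moisture.",
]

if __name__ == "__main__":
    question = "What is the recommended dosage for adults?"
    print(build_prompt(question, retrieve(question, CHUNKS, k=1)))
```

Because the toy embedding only matches exact tokens, a query such as "How should I store it?" would not match the "Storage" chunk; handling paraphrases like that is the main reason a production system would swap in a semantic embedding model.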
