Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data

Chandana Cheerla

arXiv:2507.12425 · cs.CL · July 17, 2025

TL;DR

This paper introduces a retrieval-augmented generation framework for structured enterprise data that combines hybrid retrieval, semantic chunking, and metadata-aware filtering, and reports gains of 15% in Precision@5 and 13% in Recall@5 on enterprise question-answering tasks.

Contribution

It presents a novel RAG framework combining dense and sparse retrieval with metadata filtering, semantic chunking, and tabular data preservation, optimized for enterprise data applications.
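
The combination of dense retrieval, sparse retrieval, and metadata filtering described above can be sketched as a score-fusion step. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not say how the two score sets are merged, so the min-max normalization, the weight `alpha`, the function name `hybrid_rank`, and the toy scores and `dept` metadata below are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    metadata: Dict[str, Dict[str, str]],
    required: Optional[Dict[str, str]] = None,
    alpha: float = 0.5,  # assumed weight on the dense score
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse scores, keeping only documents whose
    metadata matches every key/value pair in `required`."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    results = []
    for doc_id in set(dense_n) | set(sparse_n):
        meta = metadata.get(doc_id, {})
        if required and any(meta.get(k) != v for k, v in required.items()):
            continue  # metadata-aware filtering
        score = alpha * dense_n.get(doc_id, 0.0) + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        results.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results[:top_k]


# Toy scores standing in for embedding similarity and BM25 output.
dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
sparse_scores = {"a": 2.0, "b": 8.0, "d": 5.0}
doc_metadata = {
    "a": {"dept": "HR"},
    "b": {"dept": "HR"},
    "c": {"dept": "Finance"},
    "d": {"dept": "HR"},
}
ranked = hybrid_rank(dense_scores, sparse_scores, doc_metadata, required={"dept": "HR"})
```

In a fuller pipeline the fused top-k list would typically be passed to a reranker; that stage is omitted here to keep the sketch free of external dependencies.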

Findings

01. Precision@5 increased by 15%
02. Recall@5 increased by 13%
03. Higher faithfulness and relevance scores
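
For readers unfamiliar with the two retrieval metrics named above, the following sketch shows how Precision@k and Recall@k are commonly computed for a single query. It uses the standard definitions; the document ids and relevance labels are made-up illustration data, and nothing here reproduces the paper's evaluation setup or its reported numbers.

```python
from typing import List, Set


def precision_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical query result: two of the three relevant documents are retrieved.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}
p5 = precision_at_k(ranked, relevant, k=5)
r5 = recall_at_k(ranked, relevant, k=5)
```

Per-query values like these are usually averaged over a test set before being compared across systems.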

Abstract

Organizations increasingly rely on proprietary enterprise data, including HR records, structured reports, and tabular documents, for critical decision-making. While Large Language Models (LLMs) have strong generative capabilities, they are limited by static pretraining, short context windows, and challenges in processing heterogeneous data formats. Conventional Retrieval-Augmented Generation (RAG) frameworks address some of these gaps but often struggle with structured and semi-structured data. This work proposes an advanced RAG framework that combines hybrid retrieval strategies using dense embeddings (all-mpnet-base-v2) and BM25, enhanced by metadata-aware filtering with SpaCy NER and cross-encoder reranking. The framework applies semantic chunking to maintain textual coherence and retains tabular data structures to preserve row-column integrity. Quantized indexing optimizes…
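
One way to picture the idea of retaining row-column integrity when chunking tables is to repeat the header in every chunk so that each cell stays tied to its column name. The sketch below is an assumption about how such a step could look, not the method used in the paper: the function `table_to_chunks`, the `rows_per_chunk` parameter, the pipe-delimited serialization, and the sample HR table are all hypothetical.

```python
from typing import List, Sequence


def table_to_chunks(
    headers: Sequence[str],
    rows: Sequence[Sequence[object]],
    rows_per_chunk: int = 2,  # assumed chunk size, chosen for illustration
) -> List[str]:
    """Split a table into text chunks that never cut through a row
    and that each carry the header line."""
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("every row must have one cell per header")
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        block = rows[start:start + rows_per_chunk]
        lines = [" | ".join(headers)]
        lines += [" | ".join(str(cell) for cell in row) for row in block]
        chunks.append("\n".join(lines))
    return chunks


# Made-up table in the spirit of the HR records mentioned in the abstract.
headers = ["employee_id", "department", "leave_days"]
rows = [[101, "HR", 12], [102, "Finance", 8], [103, "HR", 15]]
chunks = table_to_chunks(headers, rows, rows_per_chunk=2)
```

Because each chunk is self-describing, a retrieved chunk can be read without access to the rest of the table, at the cost of repeating the header text.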

Taxonomy

Topics: Semantic Web and Ontologies