Advancing Retrieval-Augmented Generation for Structured Enterprise and Internal Data
Chandana Cheerla

TL;DR
This paper introduces a retrieval-augmented generation framework tailored for structured enterprise data. It combines hybrid retrieval, semantic chunking, and metadata-aware filtering, and reports gains of 15% in Precision@5 and 13% in Recall@5 on enterprise question-answering tasks.
Contribution
It presents a novel RAG framework combining dense and sparse retrieval with metadata filtering, semantic chunking, and tabular data preservation, optimized for enterprise data applications.
Findings
Precision@5 increased by 15%
Recall@5 increased by 13%
Higher faithfulness and relevance scores
Abstract
Organizations increasingly rely on proprietary enterprise data, including HR records, structured reports, and tabular documents, for critical decision-making. While Large Language Models (LLMs) have strong generative capabilities, they are limited by static pretraining, short context windows, and challenges in processing heterogeneous data formats. Conventional Retrieval-Augmented Generation (RAG) frameworks address some of these gaps but often struggle with structured and semi-structured data. This work proposes an advanced RAG framework that combines hybrid retrieval strategies using dense embeddings (all-mpnet-base-v2) and BM25, enhanced by metadata-aware filtering with spaCy NER and cross-encoder reranking. The framework applies semantic chunking to maintain textual coherence and retains tabular data structures to preserve row-column integrity. Quantized indexing optimizes…
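To make the hybrid retrieval idea in the abstract concrete, here is a minimal sketch, not the paper's implementation. It scores chunks with a small BM25 function and a stand-in "dense" scorer, applies a metadata filter first, and merges the two rankings. Several parts are assumptions: the dense scorer is a term-count cosine used in place of all-mpnet-base-v2 embeddings so the example runs without dependencies, the rankings are merged with reciprocal rank fusion because the abstract does not say how the scores are combined, and the `department` field, the sample chunks, and all function names are hypothetical. NER-based metadata extraction and cross-encoder reranking are left out.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (sparse signal)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dense_scores(query, docs):
    # ASSUMPTION: term-count vectors stand in for real sentence embeddings.
    q = Counter(tokenize(query))
    return [cosine(q, Counter(tokenize(d))) for d in docs]

def reciprocal_rank_fusion(rankings, k=60):
    # ASSUMPTION: RRF is one common way to merge rankings; the source
    # does not state which fusion method it uses.
    fused = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + rank)
    return fused

def hybrid_search(query, chunks, department=None, top_k=2):
    # Metadata-aware filtering: narrow the candidate set before scoring.
    candidates = [c for c in chunks
                  if department is None or c["department"] == department]
    if not candidates:
        return []
    texts = [c["text"] for c in candidates]

    def ranking(scores):
        return sorted(range(len(scores)), key=lambda i: -scores[i])

    fused = reciprocal_rank_fusion([
        ranking(bm25_scores(query, texts)),
        ranking(dense_scores(query, texts)),
    ])
    best = sorted(fused, key=lambda i: -fused[i])[:top_k]
    return [candidates[i]["id"] for i in best]

# Hypothetical sample chunks with a metadata field.
chunks = [
    {"id": "hr-1", "department": "HR",
     "text": "annual leave policy grants 20 days of paid leave"},
    {"id": "hr-2", "department": "HR",
     "text": "payroll is processed on the last working day"},
    {"id": "fin-1", "department": "Finance",
     "text": "leave encashment is paid with the quarterly budget"},
]

results = hybrid_search("paid leave policy", chunks, department="HR")
```

With the `department="HR"` filter, the Finance chunk is excluded before either scorer runs, so it cannot appear in `results` even though it shares terms with the query. Rank fusion avoids having to put BM25 and cosine scores on a common scale, which is why it is used in this sketch.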
Taxonomy
Topics: Semantic Web and Ontologies
