A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems
Pranav Pushkar Mishra, Kranti Prakash Yeole, Ramyashree Keshavamurthy, Mokshit Bharat Surana, Fatemeh Sarayloo

TL;DR
This paper introduces a systematic framework using LLM-generated metadata to improve document retrieval in enterprise RAG systems, demonstrating significant accuracy gains and low latency.
Contribution
It presents a structured pipeline for metadata enrichment, evaluating chunking and embedding strategies, and provides empirical evidence of improved retrieval performance.
Findings
Metadata enrichment outperforms content-only baselines.
Recursive chunking with TF-IDF embeddings achieves 82.5% precision.
Naive chunking with prefix-fusion yields NDCG 0.813.
Abstract
In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This research presents a systematic empirical framework for metadata enrichment using large language models (LLMs) to enhance document retrieval in Retrieval-Augmented Generation (RAG) systems. Our approach employs a structured pipeline that dynamically generates meaningful metadata for document segments, substantially improving their semantic representations and retrieval accuracy. Through a controlled 3 X 3 experimental matrix, we compare three chunking strategies -- semantic, recursive, and naive -- and evaluate their interactions with three embedding techniques -- content-only, TF-IDF weighted, and prefix-fusion -- isolating the contribution of each component through ablation analysis. The results…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
