A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems

Pranav Pushkar Mishra; Kranti Prakash Yeole; Ramyashree Keshavamurthy; Mokshit Bharat Surana; Fatemeh Sarayloo

arXiv:2512.05411 · cs.IR · April 1, 2026

TL;DR

This paper introduces a systematic framework using LLM-generated metadata to improve document retrieval in enterprise RAG systems, demonstrating significant accuracy gains and low latency.

Contribution

It presents a structured pipeline for metadata enrichment, evaluates chunking and embedding strategies within that pipeline, and provides empirical evidence of improved retrieval performance.
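The pipeline described above can be read as "chunk, then enrich each chunk with metadata". The following is a minimal Python sketch of that shape, not the authors' implementation. `naive_chunk`, `recursive_chunk`, `enrich`, and `fake_llm_metadata` are hypothetical names, the word-based window sizes are arbitrary, and `fake_llm_metadata` is a heuristic placeholder where a real pipeline would prompt an LLM. Semantic chunking is omitted because it needs an embedding model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Chunk:
    chunk_id: int
    text: str
    metadata: dict = field(default_factory=dict)


def naive_chunk(text: str, size: int = 40) -> list[str]:
    """Fixed windows of `size` words that ignore sentence and paragraph boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def recursive_chunk(text: str, max_words: int = 40,
                    separators: tuple[str, ...] = ("\n\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first and recurse into pieces that are still
    longer than `max_words`; separators are dropped in this simplified version."""
    text = text.strip()
    if not text:
        return []
    if len(text.split()) <= max_words or not separators:
        return [text]
    pieces: list[str] = []
    for part in text.split(separators[0]):
        pieces.extend(recursive_chunk(part, max_words, separators[1:]))
    # Greedily re-merge neighbouring pieces so chunks are not needlessly small.
    merged: list[str] = []
    for piece in pieces:
        if merged and len(merged[-1].split()) + len(piece.split()) <= max_words:
            merged[-1] = f"{merged[-1]} {piece}"
        else:
            merged.append(piece)
    return merged


def enrich(chunks: list[str], generate_metadata: Callable[[str], dict]) -> list[Chunk]:
    """Attach metadata to every chunk; `generate_metadata` stands in for an LLM call."""
    return [Chunk(i, text, generate_metadata(text)) for i, text in enumerate(chunks)]


def fake_llm_metadata(text: str) -> dict:
    # Hypothetical placeholder for an LLM prompt: a short summary (first 8 words)
    # and up to three alphabetically sorted long words as "keywords".
    words = re.findall(r"[a-z]+", text.lower())
    keywords = sorted({w for w in words if len(w) > 7})[:3]
    return {"summary": " ".join(text.split()[:8]), "keywords": keywords}


doc = ("Expense reports must be filed within thirty days.\n\n"
       "Reimbursement requires manager approval. "
       "Receipts are mandatory for claims above fifty dollars.")
for chunk in enrich(recursive_chunk(doc, max_words=12), fake_llm_metadata):
    print(chunk.chunk_id, chunk.metadata)
```

Swapping `recursive_chunk` for `naive_chunk` changes one axis of the experiment: the naive splitter cuts at fixed word counts, while the recursive one tries paragraph and sentence boundaries first.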

Findings

01. Metadata enrichment outperforms content-only baselines.
02. Recursive chunking with TF-IDF weighted embeddings achieves 82.5% precision.
03. Naive chunking with prefix-fusion embeddings yields an NDCG of 0.813.
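The precision and NDCG figures above are standard ranking metrics. The sketch below uses the textbook definitions of precision@k and NDCG@k with the common log2(rank + 1) discount. This listing does not state the cutoff k or whether relevance judgments are binary or graded, so both are left as parameters, and the function names are hypothetical.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k retrieved IDs that are relevant (denominator is k)."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(retrieved: list[str], relevance: dict[str, float], k: int) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in retrieved]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


retrieved = ["d3", "d1", "d7", "d2"]
judgments = {"d1": 1.0, "d2": 1.0}  # binary relevance in this toy example
print(precision_at_k(retrieved, set(judgments), 3), ndcg_at_k(retrieved, judgments, 3))
```

Precision ignores where in the top k a relevant chunk appears, whereas NDCG rewards placing relevant chunks earlier, which is why the two can favor different configurations.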

Abstract

In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This research presents a systematic empirical framework for metadata enrichment using large language models (LLMs) to enhance document retrieval in Retrieval-Augmented Generation (RAG) systems. Our approach employs a structured pipeline that dynamically generates meaningful metadata for document segments, substantially improving their semantic representations and retrieval accuracy. Through a controlled 3 × 3 experimental matrix, we compare three chunking strategies (semantic, recursive, and naive) and evaluate their interactions with three embedding techniques (content-only, TF-IDF weighted, and prefix-fusion), isolating the contribution of each component through ablation analysis. The results…
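The three embedding techniques are named but not defined in this abstract, so the following is a minimal sketch under explicit assumptions, not the paper's method. A sparse term-count vector stands in for a dense embedding model so the code runs with the standard library only. "Prefix-fusion" is assumed to mean serializing a chunk's metadata and prepending it to the content before embedding. The "TF-IDF weighted" variant is assumed to apply smoothed IDF weights to that same metadata-augmented text. All function names are hypothetical; crossing these variants with the chunkers would give the 3 × 3 grid.

```python
import math
import re
from collections import Counter
from typing import Optional

VARIANTS = ("content_only", "tfidf", "prefix_fusion")


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text(chunk: str, metadata: dict, variant: str) -> str:
    """Text that gets embedded for one chunk under a given variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    if variant == "content_only":
        return chunk
    # Assumption: both metadata-aware variants serialize metadata as a text prefix.
    prefix = " ".join(f"{key}: {value}" for key, value in metadata.items())
    return f"{prefix}\n{chunk}"


def idf_table(texts: list[str]) -> dict[str, float]:
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, over the chunk collection."""
    n = len(texts)
    df = Counter(term for text in texts for term in set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def embed(text: str, idf: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Sparse term-count vector (a stand-in for a dense embedding model)."""
    counts = Counter(tokenize(text))
    if idf is None:
        return {term: float(c) for term, c in counts.items()}
    # Terms never seen in the collection get a neutral weight of 1.0 (arbitrary).
    return {term: c * idf.get(term, 1.0) for term, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, chunks: list[str], metadata: list[dict], variant: str) -> list[int]:
    """Chunk indices ordered by cosine similarity to the query, best first."""
    texts = [build_text(c, m, variant) for c, m in zip(chunks, metadata)]
    idf = idf_table(texts) if variant == "tfidf" else None
    query_vec = embed(query, idf)
    scores = [cosine(query_vec, embed(t, idf)) for t in texts]
    return sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)


chunks = ["Claims above fifty dollars need receipts.",
          "Submit the form within thirty days."]
metadata = [{"topic": "receipts"}, {"topic": "reimbursement deadline"}]
for variant in VARIANTS:
    print(variant, rank("reimbursement deadline", chunks, metadata, variant))
```

Under `content_only` neither chunk shares a term with the query, so both score zero and their order is arbitrary. The two metadata-aware variants put chunk 1 first because its generated topic matches the query. With lexical stand-in vectors, metadata can only help when the query shares terms with it. A dense model would not have that limitation.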
