SPAR: Session-based Pipeline for Adaptive Retrieval on Legacy File Systems

Duy A. Nguyen; Hai H. Do; Minh Doan; Minh N. Do

arXiv:2512.12938·cs.IR·December 16, 2025

TL;DR

SPAR is a lightweight, adaptive retrieval framework that enhances access to legacy enterprise data using LLMs, reducing computational costs and improving retrieval relevance.

Contribution

The paper introduces a two-stage retrieval process for legacy systems: a semantic metadata index is built first, and vector databases are then generated dynamically on top of it.

Findings

01. Improved retrieval effectiveness in enterprise-scale file systems

02. Reduced computational overhead compared to traditional RAG pipelines

03. Enhanced relevance and controllability in data retrieval

Abstract

The ability to extract value from historical data is essential for enterprise decision-making. However, much of this information remains inaccessible within large legacy file systems that lack structured organization and semantic indexing, making retrieval and analysis inefficient and error-prone. We introduce SPAR (Session-based Pipeline for Adaptive Retrieval), a conceptual framework that integrates Large Language Models (LLMs) into a Retrieval-Augmented Generation (RAG) architecture specifically designed for legacy enterprise environments. Unlike conventional RAG pipelines, which require costly construction and maintenance of full-scale vector databases that mirror the entire file system, SPAR employs a lightweight two-stage process: a semantic Metadata Index is first created, after which session-specific vector databases are dynamically generated on demand. This design reduces…
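
The two-stage process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`build_metadata_index`, `select_files`, `build_session_store`, `retrieve`), the toy corpus, and the chunk size are all hypothetical, and a bag-of-words count vector with cosine similarity stands in for whatever LLM-based embedding a real deployment would use. The point is only the shape of the pipeline: a small per-file index is built once, and a vector store is built per session over just the files that index selects.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Assumption: a bag-of-words count vector stands in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_metadata_index(files):
    """Stage 1: one lightweight entry per file (path plus a short description)."""
    return {path: embed(path + " " + meta["description"]) for path, meta in files.items()}


def select_files(metadata_index, query, top_n=2):
    """Use the metadata index to pick the files relevant to this session."""
    q = embed(query)
    scored = [(cosine(vec, q), path) for path, vec in metadata_index.items()]
    scored = [(score, path) for score, path in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in scored[:top_n]]


def build_session_store(files, selected_paths, chunk_size=8):
    """Stage 2: embed chunks of only the selected files, on demand."""
    store = []
    for path in selected_paths:
        words = files[path]["content"].split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((path, chunk, embed(chunk)))
    return store


def retrieve(store, query, k=1):
    """Return the k chunks in the session store most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(item[2], q), reverse=True)
    return [(path, chunk) for path, chunk, _ in ranked[:k]]


# Hypothetical toy corpus standing in for a legacy file system.
FILES = {
    "finance/q3_budget.txt": {
        "description": "quarterly budget report finance",
        "content": "The Q3 budget allocated funds to storage upgrades. Travel spending was reduced.",
    },
    "hr/onboarding.txt": {
        "description": "employee onboarding checklist hr",
        "content": "New employees receive a laptop and complete security training in week one.",
    },
    "ops/backup_policy.txt": {
        "description": "backup retention policy operations",
        "content": "Backups are retained for ninety days. Restores require manager approval.",
    },
}

index = build_metadata_index(FILES)                    # built once
selected = select_files(index, "budget report")        # per session
store = build_session_store(FILES, selected)           # per session, small
hits = retrieve(store, "storage upgrades budget", k=1)
print(selected, hits)
```

In this sketch only the files chosen through the metadata index are ever chunked and embedded, which is the property the abstract contrasts with a full-scale vector database that mirrors the entire file system.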

Taxonomy

Topics: Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies · Electronic Health Records Systems