SPAR: Session-based Pipeline for Adaptive Retrieval on Legacy File Systems
Duy A. Nguyen, Hai H. Do, Minh Doan, Minh N. Do

TL;DR
SPAR is a lightweight, adaptive retrieval framework that enhances access to legacy enterprise data using LLMs, reducing computational costs and improving retrieval relevance.
Contribution
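SPAR introduces a novel two-stage retrieval process for legacy systems: a semantic metadata index is built first, and session-specific vector databases are then generated on demand instead of maintaining one vector database that mirrors the entire file system.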
Findings
Improved retrieval effectiveness in enterprise-scale file systems
Reduced computational overhead compared to traditional RAG pipelines
Enhanced relevance and controllability in data retrieval
Abstract
The ability to extract value from historical data is essential for enterprise decision-making. However, much of this information remains inaccessible within large legacy file systems that lack structured organization and semantic indexing, making retrieval and analysis inefficient and error-prone. We introduce SPAR (Session-based Pipeline for Adaptive Retrieval), a conceptual framework that integrates Large Language Models (LLMs) into a Retrieval-Augmented Generation (RAG) architecture specifically designed for legacy enterprise environments. Unlike conventional RAG pipelines, which require costly construction and maintenance of full-scale vector databases that mirror the entire file system, SPAR employs a lightweight two-stage process: a semantic Metadata Index is first created, after which session-specific vector databases are dynamically generated on demand. This design reduces…
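Illustrative sketch
The two-stage process described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names FileRecord, MetadataIndex, SessionVectorStore, and embed are hypothetical, and a bag-of-words count vector stands in for a real embedding model. Stage 1 ranks files by a short semantic description; stage 2 builds a small, throwaway vector store only over the files selected for the current session.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text):
    # Assumption: a toy bag-of-words vector stands in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class FileRecord:
    path: str
    description: str  # short semantic summary used by the metadata index
    content: str      # full text, only embedded when a session needs it


class MetadataIndex:
    """Stage 1 (hypothetical): a light index over file descriptions only."""

    def __init__(self, records):
        self.entries = [(r, embed(r.description)) for r in records]

    def select(self, query, top_k=2):
        q = embed(query)
        scored = [(cosine(q, vec), rec) for rec, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep only files whose description shares something with the query.
        return [rec for score, rec in scored[:top_k] if score > 0]


class SessionVectorStore:
    """Stage 2 (hypothetical): a per-session store built on demand."""

    def __init__(self, records, chunk_size=8):
        self.chunks = []
        for rec in records:
            words = rec.content.split()
            for i in range(0, len(words), chunk_size):
                text = " ".join(words[i:i + chunk_size])
                self.chunks.append((rec.path, text, embed(text)))

    def search(self, query, top_k=1):
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[2]), reverse=True)
        return [(path, text) for path, text, _ in ranked[:top_k]]
```

In use, a session would call MetadataIndex.select(query) once to pick candidate files, build a SessionVectorStore from that subset, run its searches, and then discard the store. Only the selected files are chunked and embedded, which is the source of the cost reduction the abstract claims relative to a full-scale vector database.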
Taxonomy
Topics: Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies · Electronic Health Records Systems
