L-RAG: Balancing Context and Retrieval with Entropy-Based Lazy Loading
Sergii Voloshyn

TL;DR
L-RAG introduces an adaptive, entropy-based lazy retrieval framework for RAG systems, significantly reducing computational costs while maintaining high accuracy by selectively triggering document retrieval based on model uncertainty.
Contribution
The paper presents L-RAG, a hierarchical retrieval method that adaptively manages context using entropy gating, improving efficiency with minimal accuracy loss.
Findings
L-RAG achieves up to 26% retrieval reduction with minimal accuracy loss.
Entropy effectively signals model uncertainty, correlating with prediction correctness.
L-RAG reduces latency by 80-210 ms per query in high-latency retrieval scenarios.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as the predominant paradigm for grounding Large Language Model outputs in factual knowledge, effectively mitigating hallucinations. However, conventional RAG systems operate under a "retrieve-always" assumption, querying vector databases for every input regardless of query complexity. This static approach incurs substantial computational overhead and inference latency, particularly problematic for high-throughput production deployments. We introduce L-RAG (Lazy Retrieval-Augmented Generation), an adaptive framework that implements hierarchical context management through entropy-based gating. L-RAG employs a two-tier architecture: queries are first processed with a compact document summary, and expensive chunk retrieval is triggered only when the model's predictive entropy exceeds a calibrated threshold, signaling genuine uncertainty.…
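The two-tier gating described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`predictive_entropy`, `mean_token_entropy`, `lazy_answer`), the `generate` and `retrieve_chunks` callables, and the choice to average Shannon entropy over per-token distributions are all assumptions made for illustration, since the summary does not specify how entropy is computed or calibrated.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a generator returns an answer plus the per-token
# probability distributions it sampled from; a retriever returns text chunks.
Generator = Callable[[str, str], Tuple[str, List[Sequence[float]]]]
Retriever = Callable[[str], List[str]]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(token_probs: List[Sequence[float]]) -> float:
    """Average entropy over generated tokens (an assumed aggregation)."""
    if not token_probs:
        return 0.0
    return sum(predictive_entropy(p) for p in token_probs) / len(token_probs)


def lazy_answer(
    query: str,
    summary: str,
    generate: Generator,
    retrieve_chunks: Retriever,
    threshold: float,
) -> Tuple[str, bool]:
    """Answer from the summary first; retrieve chunks only if uncertain.

    Returns the answer and a flag saying whether retrieval was triggered.
    """
    # Tier 1: cheap pass using only the compact document summary.
    answer, token_probs = generate(query, summary)
    if mean_token_entropy(token_probs) <= threshold:
        return answer, False

    # Tier 2: entropy exceeded the threshold, so pay for chunk retrieval.
    chunks = retrieve_chunks(query)
    context = summary + "\n" + "\n".join(chunks)
    answer, _ = generate(query, context)
    return answer, True
```

In this sketch the threshold is a plain float passed by the caller; in practice it would be calibrated on held-out queries, which is where the trade-off between retrieval reduction and accuracy is set.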
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior
