Let your LLM generate a few tokens and you will reduce the need for retrieval
Hervé Déjean

TL;DR
This paper shows that letting a large language model (LLM) generate a few answer tokens helps predict whether retrieval is needed in retrieval-augmented generation (RAG). A model distilled from an LLM-as-a-judge computes an IK (I Know) score, which reduces the number of retrieval steps and helps characterise datasets.
Contribution
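It introduces the IK (I Know) score and distills an LLM-as-a-judge teacher into a model that checks whether an answer is already stored in the LLM's parametric memory, so that retrieval can be skipped in RAG. Including response tokens as input keeps the training set small, and the score also serves as a tool for characterising datasets.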
Findings
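The IK score predicts whether the answer is already in the LLM's parametric memory with an accuracy of about 80% in a RAG setting.
Reduces the number of search and reranking steps by more than 50% on certain datasets.
About 20,000 training samples suffice for good performance when response tokens are included as input.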
Abstract
In this paper, we investigate how efficiently large language models (LLMs) can be trained to check whether an answer is already stored in their parametric memory. We distill an LLM-as-a-judge to compute the IK (I Know) score. We found that this method is particularly beneficial in the context of retrieval-assisted augmented generation (RAG), with a respectable accuracy of 80%. It enables a significant reduction (more than 50%) in the number of search and reranking steps required for certain datasets. We have also introduced the IK score, which serves as a useful tool for characterising datasets by facilitating the classification task. Interestingly, through the inclusion of response tokens as input, our results suggest that only about 20,000 training samples are required to achieve good performance. The central element of this work is the use of a teacher model - the LLM as a judge - to…
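The gating idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`answer_with_ik_gate`, `draft`, `ik_score`, `retrieve`, `generate`), the threshold of 0.5, and the draft length of 32 tokens are all assumptions made for illustration, and the toy stubs stand in for a real LLM, IK classifier, and search/rerank pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateResult:
    answer: str
    used_retrieval: bool
    ik: float


def answer_with_ik_gate(
    question: str,
    draft: Callable[[str, int], str],             # hypothetical: LLM drafts a few tokens
    ik_score: Callable[[str, str], float],        # hypothetical: distilled IK classifier
    retrieve: Callable[[str], List[str]],         # hypothetical: search + rerank
    generate: Callable[[str, List[str]], str],    # hypothetical: final generation
    threshold: float = 0.5,                       # assumed value, not from the paper
    n_draft_tokens: int = 32,                     # assumed value, not from the paper
) -> GateResult:
    # Step 1: let the LLM generate a few answer tokens without retrieved context.
    prefix = draft(question, n_draft_tokens)
    # Step 2: score the question together with the drafted response tokens.
    ik = ik_score(question, prefix)
    # Step 3: skip search and reranking when the model appears to know the answer.
    if ik >= threshold:
        return GateResult(generate(question, []), used_retrieval=False, ik=ik)
    docs = retrieve(question)
    return GateResult(generate(question, docs), used_retrieval=True, ik=ik)


# Toy stubs so the sketch runs end to end; a real system would call models here.
KNOWN = {"capital of France?": "Paris"}


def toy_draft(q: str, n: int) -> str:
    return KNOWN.get(q, "I am not sure")


def toy_ik(q: str, prefix: str) -> float:
    return 0.9 if q in KNOWN else 0.1


def toy_retrieve(q: str) -> List[str]:
    return ["doc about " + q]


def toy_generate(q: str, docs: List[str]) -> str:
    return KNOWN.get(q) or f"answer from {len(docs)} doc(s)"


questions = ["capital of France?", "population of Grenoble in 2021?"]
results = [
    answer_with_ik_gate(q, toy_draft, toy_ik, toy_retrieve, toy_generate)
    for q in questions
]
# Fraction of questions that still needed search and reranking.
retrieval_rate = sum(r.used_retrieval for r in results) / len(results)
```

In this toy run the first question is answered from the stubbed parametric memory and the second falls back to retrieval. In practice the threshold would trade skipped retrieval steps against the risk of answering without context when the IK prediction is wrong.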
Taxonomy
Topics: Library Science and Information Systems · Natural Language Processing Techniques
