A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Florin Cuconasu; Giovanni Trappolini; Nicola Tonellotto; Fabrizio; Silvestri

arXiv:2406.14972·cs.CL·June 24, 2024

A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Florin Cuconasu, Giovanni Trappolini, Nicola Tonellotto, Fabrizio, Silvestri

PDF

Open Access 1 Repo 1 Datasets

TL;DR

This paper reveals that base large language models outperform instructed models in Retrieval Augmented Generation tasks by an average of 20%, challenging common assumptions about the superiority of instructed LLMs in RAG systems.

Contribution

It provides empirical evidence that base models can outperform instructed models in RAG, questioning prevailing beliefs and prompting reconsideration of model choice in such systems.

Findings

01

Base models outperform instructed models by 20% in RAG tasks.

02

Challenging the assumption that instructed LLMs are superior in RAG.

03

Highlights the need for broader discussion on RAG model selection.

Abstract

Retrieval Augmented Generation (RAG) represents a significant advancement in artificial intelligence combining a retrieval phase with a generative phase, with the latter typically being powered by large language models (LLMs). The current common practices in RAG involve using "instructed" LLMs, which are fine-tuned with supervised training to enhance their ability to follow instructions and are aligned with human preferences using state-of-the-art techniques. Contrary to popular belief, our study demonstrates that base models outperform their instructed counterparts in RAG tasks by 20% on average under our experimental settings. This finding challenges the prevailing assumptions about the superiority of instructed LLMs in RAG applications. Further investigations reveal a more nuanced situation, questioning fundamental aspects of RAG and suggesting the need for broader discussions on the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

florin-git/Base-vs-Instruct-LLMs-in-RAG-Systems
pytorchOfficial

Datasets

florin-hf/wiki_dump2018_no_duplicates
dataset· 210 dl
210 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSemantic Web and Ontologies

MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Weight Decay · WordPiece · Softmax · Layer Normalization · Linear Warmup With Linear Decay · Byte Pair Encoding · Attention Dropout · Dropout