From RAG to RICHES: Retrieval Interlaced with Sequence Generation
Palak Jain, Livio Baldini Soares, Tom Kwiatkowski

TL;DR
RICHES introduces a unified retrieval and sequence generation method that enables LLMs to retrieve and generate content in a single decoding pass, improving flexibility and performance in open-domain question answering tasks.
Contribution
It proposes a retrieval-interlaced sequence generation approach that eliminates the need for separate retriever and generator modules, adaptable to various tasks via prompting.
Findings
Strong performance on ODQA tasks
Supports multi-hop retrievals and attributed evidence
Operates without additional training
Abstract
We present RICHES, a novel approach that interleaves retrieval with sequence generation tasks. RICHES offers an alternative to conventional RAG systems by eliminating the need for a separate retriever and generator. It retrieves documents by directly decoding their contents, constrained to the corpus. Unifying retrieval with generation allows us to adapt to diverse new tasks via prompting alone. RICHES can work with any instruction-tuned model, without additional training. It provides attributed evidence, supports multi-hop retrievals, and interleaves thoughts to plan what to retrieve next, all within a single decoding pass of the LLM. We demonstrate the strong performance of RICHES across ODQA tasks, including attributed and multi-hop QA.
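The idea of retrieving a document by decoding its contents under a corpus constraint can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the prefix trie stands in for whatever index the authors use, `toy_scorer` is a hypothetical replacement for an LLM's next-token scores, greedy search replaces any more robust search strategy, and the names `build_trie` and `constrained_greedy_decode` are invented. The point is only that, when each step may choose only from tokens that continue some passage in the corpus, the decoded text is guaranteed to be a real passage.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # hypothetical end-of-passage marker used only in this sketch


def build_trie(passages: List[List[str]]) -> Dict:
    """Build a nested-dict prefix trie over tokenized corpus passages."""
    root: Dict = {}
    for tokens in passages:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[EOS] = {}  # marks that a complete passage ends here
    return root


def constrained_greedy_decode(
    trie: Dict,
    score: Callable[[List[str], str], float],
    max_len: int = 32,
) -> List[str]:
    """Greedily decode a passage, choosing only tokens the trie allows."""
    node, out = trie, []
    while len(out) < max_len:
        allowed = list(node.keys())  # tokens that continue some corpus passage
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score(out, tok))
        if best == EOS:
            break
        out.append(best)
        node = node[best]
    return out


# Toy corpus of two passages, tokenized by whitespace.
corpus = [
    "the capital of france is paris",
    "the capital of germany is berlin",
]
trie = build_trie([p.split() for p in corpus])

# Stand-in for a language model: prefer tokens that appear in the query.
query_words = set("what is the capital of germany".split())


def toy_scorer(prefix: List[str], token: str) -> float:
    return 1.0 if token in query_words else 0.0


passage = " ".join(constrained_greedy_decode(trie, toy_scorer))
print(passage)
```

Because the only branching point in this toy trie is the country name, greedy search suffices here. With a real model and corpus, early tokens are often ambiguous, so a search that keeps several candidates alive (such as beam search) would likely be needed.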
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Multi-Head Attention · Residual Connection · WordPiece · Softmax · Byte Pair Encoding · Layer Normalization
