Question-Based Retrieval using Atomic Units for Enterprise RAG
Vatsal Raina, Mark Gales

TL;DR
This paper introduces a zero-shot dense retrieval method that decomposes chunks into atomic statements and generates synthetic questions from them, improving chunk recall in enterprise retrieval-augmented generation (RAG) and, in turn, LLM response accuracy.
Contribution
It proposes a novel zero-shot retrieval approach that decomposes documents into atomic units and generates synthetic questions to improve retrieval recall in RAG systems.
Findings
Retrieving over atomic units achieves higher recall than retrieving over whole chunks.
Synthetic questions generated on the atomic units further improve retrieval accuracy.
Better retrieval leads to better LLM response quality.
Abstract
Enterprise retrieval-augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. The chunks relevant to a user query are then retrieved and passed as context to a synthesizer LLM, which generates the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work applies a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall. Specifically, a chunk is first decomposed into atomic statements. A set of synthetic questions is then generated on these atoms (with the chunk as the context). Dense retrieval involves finding the set of synthetic questions closest to the user query, along with their associated chunks. It is found that…
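The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the atomic statements and synthetic questions have already been produced (in the paper this is done by an LLM), and it uses a toy bag-of-words cosine similarity as a stand-in for a dense encoder. The corpus contents and the names CORPUS, embed, build_index and retrieve are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: chunk id -> {atomic statement: [synthetic questions]}.
# In the method described above, an LLM would generate the atoms and questions.
CORPUS = {
    "chunk_leave": {
        "Employees receive 25 days of annual leave.": [
            "How many days of annual leave do employees receive?",
        ],
        "Line managers approve leave requests.": [
            "Who approves leave requests?",
        ],
    },
    "chunk_expenses": {
        "Expense claims are due within 30 days.": [
            "What is the deadline for submitting expense claims?",
        ],
        "Receipts are required for expense claims.": [
            "Which receipts are required for expense claims?",
        ],
    },
}


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(corpus):
    """Index every synthetic question, remembering the chunk it came from."""
    return [
        (embed(question), chunk_id)
        for chunk_id, atoms in corpus.items()
        for questions in atoms.values()
        for question in questions
    ]


def retrieve(query, index, k=1):
    """Rank synthetic questions by similarity to the query and return the
    first k distinct chunk ids associated with the closest questions."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    chunks = []
    for _, chunk_id in ranked:
        if chunk_id not in chunks:
            chunks.append(chunk_id)
        if len(chunks) == k:
            break
    return chunks


index = build_index(CORPUS)
print(retrieve("When is the deadline to submit my expense claims?", index))
```

The key design point is that the query is compared against questions rather than against chunk text, so the match is question-to-question, and the chunk is recovered only through the question that points back to it.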
