Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs
Joon Park, Kyohei Atarashi, Koh Takeuchi, and Hisashi Kashima

TL;DR
This paper introduces a prompt engineering method that emulates retrieval-augmented generation within LLMs to improve understanding of long contexts, reducing reliance on external retrieval systems.
Contribution
It presents a novel prompt-based approach that treats the model as both retriever and reasoner, enhancing multi-hop reasoning in long texts without external retrieval.
Findings
Improved accuracy on multi-fact questions in long contexts
Prompt structure significantly impacts performance
Single-pass method reduces need for external retrievers
Abstract
This paper addresses the challenge of comprehending very long contexts in Large Language Models (LLMs) by proposing a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought (CoT) reasoning. While recent LLMs support over 100,000 tokens in a single prompt, simply enlarging context windows has not guaranteed robust multi-hop reasoning when key details are scattered across massive input. Our approach treats the model as both the retriever and the reasoner: it first tags relevant segments within a long passage, then employs a stepwise CoT workflow to integrate these pieces of evidence. This single-pass method thereby reduces reliance on an external retriever, yet maintains focus on crucial segments. We evaluate our approach on selected tasks from BABILong, which interleaves standard bAbI QA problems with large amounts of…
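The tag-then-reason workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's prompt wording, so the instruction text, the `<evidence>` tag name, the `Answer:` convention, and all function names here are assumptions made for illustration. The code only builds the single prompt and parses a response; it does not call any model.

```python
import re
from typing import List, Optional

# Hypothetical tag name; the paper's real tagging format is not given in the abstract.
EVIDENCE_TAG = "evidence"


def build_single_pass_prompt(context: str, question: str) -> str:
    """Assemble one prompt asking the model to retrieve first, then reason."""
    return (
        "You are given a long passage and a question.\n"
        f"Step 1: Copy every sentence relevant to the question, each inside "
        f"<{EVIDENCE_TAG}>...</{EVIDENCE_TAG}> tags.\n"
        "Step 2: Reason step by step using only the tagged sentences.\n"
        "Step 3: State the final answer on a line starting with 'Answer:'.\n\n"
        f"Passage:\n{context}\n\n"
        f"Question: {question}\n"
    )


def extract_evidence(response: str) -> List[str]:
    """Return the text of every tagged evidence span in a model response."""
    pattern = rf"<{EVIDENCE_TAG}>(.*?)</{EVIDENCE_TAG}>"
    return [span.strip() for span in re.findall(pattern, response, flags=re.DOTALL)]


def extract_answer(response: str) -> Optional[str]:
    """Return the text after 'Answer:' on its own line, or None if absent."""
    match = re.search(r"^Answer:\s*(.+)$", response, flags=re.MULTILINE)
    return match.group(1).strip() if match else None
```

Because retrieval and reasoning happen in the same prompt, no external retriever is involved; the parsing helpers only make the tagged evidence and the final answer easy to inspect after a single model call.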
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Recommender Systems and Techniques
