Evaluating Prompt Engineering Techniques for RAG in Small Language Models: A Multi-Hop QA Approach

Amir Hossein Mohammadi; Ali Moeinian; Zahra Razavizade; Afsaneh Fatemi; Reza Ramezani

arXiv:2602.13890·cs.CL·February 17, 2026

Open Access

TL;DR

This study empirically evaluates 24 prompt templates for Retrieval Augmented Generation (RAG) in small language models on multi-hop QA, reporting performance gains of up to 83% on Qwen2.5 and 84.5% on Gemma3-4B-It, and up to 6% improvement over standard RAG prompts.

Contribution

The paper systematically investigates prompt template design for RAG in small language models, introducing 14 novel hybrid prompts and analyzing their impact on multi-hop question answering.

Findings

01 Performance gains up to 83% on Qwen2.5

02 Performance gains up to 84.5% on Gemma3-4B-It

03 Up to 6% improvement over standard RAG prompts

Abstract

Retrieval Augmented Generation (RAG) is a powerful approach for enhancing the factual grounding of language models by integrating external knowledge. While widely studied for large language models, the optimization of RAG for Small Language Models (SLMs) remains a critical research gap, particularly in complex, multi-hop question-answering tasks that require sophisticated reasoning. In these systems, prompt template design is a crucial yet under-explored factor influencing performance. This paper presents a large-scale empirical study to investigate this factor, evaluating 24 different prompt templates on the HotpotQA dataset. The set includes a standard RAG prompt, nine well-formed techniques from the literature, and 14 novel hybrid variants, all tested on two prominent SLMs: Qwen2.5-3B Instruct and Gemma3-4B-It. Our findings, based on a test set of 18,720 instances, reveal significant…
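
To make the setup in the abstract concrete, the following is a minimal sketch of how several prompt templates might be compared on the same retrieved passages. It is not the paper's code: the two template texts, the names `TEMPLATES`, `build_prompt`, and `evaluate`, and the use of exact match as the metric are all assumptions for illustration. The `generate` argument stands in for any model call (for example an SLM such as Qwen2.5-3B Instruct).

```python
import re
import string
from typing import Callable, Dict, List

# Hypothetical templates for illustration only; these are NOT the paper's prompts.
TEMPLATES: Dict[str, str] = {
    "standard_rag": (
        "Answer the question using the context.\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
    "step_by_step": (
        "Use the context to answer. Think step by step, combining facts "
        "from several passages, then give a short final answer.\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
}


def build_prompt(template: str, passages: List[str], question: str) -> str:
    """Fill a template with numbered retrieved passages and the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return template.format(context=context, question=question)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """Assumed metric: normalized string equality."""
    return normalize(prediction) == normalize(gold)


def evaluate(
    generate: Callable[[str], str], examples: List[dict]
) -> Dict[str, float]:
    """Return the exact-match rate of each template over the examples."""
    scores: Dict[str, float] = {}
    for name, template in TEMPLATES.items():
        hits = sum(
            exact_match(
                generate(build_prompt(template, ex["passages"], ex["question"])),
                ex["answer"],
            )
            for ex in examples
        )
        scores[name] = hits / len(examples)
    return scores
```

Calling `evaluate(generate, examples)` with a list of dictionaries holding `passages`, `question`, and `answer` keys returns one score per template, so a new variant can be compared by adding an entry to `TEMPLATES`.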

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques