Can Small Language Models Use What They Retrieve? An Empirical Study of Retrieval Utilization Across Model Scale

Sanchit Pandey (BITS Pilani, Hyderabad, India)

arXiv:2603.11513 · cs.CL · March 13, 2026

Open Access

TL;DR

This study investigates whether small language models under 7B parameters can effectively utilize retrieved information in retrieval-augmented generation, revealing significant utilization bottlenecks and potential negative impacts at this scale.

Contribution

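The paper provides an empirical analysis of retrieval utilization across model sizes and introduces a parametric knowledge split, which separates questions a model can already answer from those that require external knowledge, to isolate utilization failures from retrieval quality failures.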

Findings

01

Even with oracle retrieval, models of 7B parameters or fewer fail to extract the correct answer 85-100% of the time on questions they cannot answer on their own.

02

Adding retrieval context often destroys previously known answers, indicating distraction effects.

03

The main failure mode is irrelevant generation, where models ignore provided context.

Abstract

Retrieval-augmented generation (RAG) is widely deployed to improve factual accuracy in language models, yet it remains unclear whether smaller models of 7B parameters or fewer can effectively utilize retrieved information. To investigate this question, we evaluate five model sizes from 360M to 8B across three architecture families (SmolLM2, Qwen2.5, and Llama 3.1) under four retrieval conditions: no retrieval, BM25, dense retrieval using E5-large-v2, and oracle retrieval, where the retrieved passage is guaranteed to contain the answer. We introduce a parametric knowledge split that separates questions a model can already answer from those that require external knowledge, which allows us to isolate utilization failure from retrieval quality failure. We find three main results. First, even with oracle retrieval, models of size 7B or smaller fail to extract the correct answer 85 to 100…
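The parametric knowledge split described in the abstract can be illustrated with a minimal sketch. It assumes each question has already been scored as correct or incorrect under the no-retrieval and oracle conditions; the scoring method is not given in this excerpt, and the names `Result`, `knowledge_split`, and `utilization_report` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Per-question outcome for one model (hypothetical record layout)."""
    question_id: str
    correct_no_retrieval: bool  # answered correctly with no passage supplied
    correct_with_oracle: bool   # answered correctly given a passage containing the answer


def knowledge_split(results: list[Result]) -> tuple[list[Result], list[Result]]:
    """Partition questions into 'known' (answerable without retrieval) and 'unknown'."""
    known = [r for r in results if r.correct_no_retrieval]
    unknown = [r for r in results if not r.correct_no_retrieval]
    return known, unknown


def _error_rate(subset: list[Result]) -> Optional[float]:
    """Fraction of the subset answered incorrectly under oracle retrieval."""
    if not subset:
        return None  # undefined on an empty subset
    return sum(not r.correct_with_oracle for r in subset) / len(subset)


def utilization_report(results: list[Result]) -> dict:
    """Summarize oracle-condition failures separately for each side of the split."""
    known, unknown = knowledge_split(results)
    return {
        "n_known": len(known),
        "n_unknown": len(unknown),
        # Unknown questions: a miss here is a utilization failure,
        # because the oracle passage is guaranteed to contain the answer.
        "oracle_failure_rate_unknown": _error_rate(unknown),
        # Known questions: a miss here means added context displaced a correct answer.
        "destroyed_rate_known": _error_rate(known),
    }
```

Because the oracle passage always contains the answer, any error on the unknown subset cannot be attributed to retrieval quality, which is the isolation the split is meant to provide.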

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior