Retrieval-augmented reasoning with lean language models

Ryan Sze-Yin Chan; Federico Nanni; Tomas Lazauskas; Rosie Wood; Penelope Yong; Lionel Tarassenko; Mark Girolami; James Geddes; Andrew Duncan

arXiv:2508.11386 · cs.CL · August 18, 2025

TL;DR

This paper presents a lightweight retrieval-augmented reasoning system that combines domain-specific fine-tuning with synthetic data and document compression, achieving near-frontier performance in resource-constrained environments.

Contribution

The paper introduces an approach that integrates retrieval and reasoning within a single lean language model, aimed at privacy-preserving deployment in resource-constrained settings.

Findings

01. Significant improvement in answer accuracy over non-reasoning models

02. Approaches frontier-level performance with domain-specific fine-tuning

03. Effective use of synthetic data and document compression techniques

Abstract

This technical report details a novel approach to combining reasoning and retrieval-augmented generation (RAG) within a single, lean language model architecture. While existing RAG systems typically rely on large-scale models and external APIs, our work addresses the increasing demand for performant and privacy-preserving solutions deployable in resource-constrained or secure environments. Building on recent developments in test-time scaling and small-scale reasoning models, we develop a retrieval-augmented conversational agent capable of interpreting complex, domain-specific queries using a lightweight backbone model. Our system integrates a dense retriever with fine-tuned Qwen2.5-Instruct models, using synthetic query generation and reasoning traces derived from frontier models (e.g., DeepSeek-R1) over a curated corpus, in this case the NHS A-to-Z condition pages. We explore the…
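The retrieve-then-reason flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the corpus text is made-up placeholder content, a bag-of-words cosine similarity stands in for a learned dense retriever, and the names `CORPUS`, `embed`, `retrieve`, and `build_prompt` are hypothetical. The resulting prompt would be passed to a fine-tuned model, a step omitted here.

```python
import math
import re
from collections import Counter

# Made-up placeholder documents; a real system would index a curated corpus.
CORPUS = {
    "asthma": "Asthma is a common lung condition that causes occasional breathing difficulties.",
    "migraine": "A migraine is a moderate or severe headache felt as a throbbing pain on one side of the head.",
    "eczema": "Atopic eczema causes the skin to become itchy, dry and cracked.",
}


def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a dense encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def build_prompt(query, corpus, k=2):
    """Assemble retrieved context and the question into a reasoning prompt."""
    ids = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in ids)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Think step by step using the context, then answer."
    )


if __name__ == "__main__":
    print(build_prompt("I have a throbbing headache on one side", CORPUS, k=1))
```

In a lean deployment of this kind, the retriever and the language model would both run locally, so the query and the retrieved documents never leave the host; the sketch only shows how the two stages connect.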
