# Can Compact Language Models Search Like Agents? Distillation-Guided Policy Optimization for Preserving Agentic RAG Capabilities

**Authors:** Rikuto Kotoge, Mai Nishimura, Jiaxin Ma

arXiv: 2508.20324 · 2026-04-28

## TL;DR

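This paper introduces Distillation-Guided Policy Optimization (DGPO), which trains compact language models (e.g., 0.5–1B parameters) to perform agentic search and planning. It counters the sparse rewards and unstable RL training of small models with cold-start initialization from teacher demonstrations and continuous teacher guidance during policy optimization.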

## Contribution

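The paper presents DGPO, a distillation-guided policy optimization technique that combines cold-start initialization from teacher demonstrations with continuous teacher guidance during policy optimization. It also introduces Agentic RAG Capabilities (ARC), a fine-grained metric that analyzes reasoning, search coordination, and response synthesis.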

## Key findings

- DGPO enables small models to perform sophisticated agentic search behaviors.
- Compact models trained with DGPO outperform the larger teacher model in some cases.
- The approach makes agentic RAG feasible in resource-constrained settings.

## Abstract

Reinforcement Learning (RL) has emerged as a dominant post-training approach to elicit agentic RAG behaviors such as search and planning from language models. Despite its success with larger models, applying RL to compact models (e.g., 0.5–1B parameters) presents unique challenges. Compact models exhibit poor initial performance, resulting in sparse rewards and unstable training. To overcome these difficulties, we propose Distillation-Guided Policy Optimization (DGPO), which employs cold-start initialization from teacher demonstrations and continuous teacher guidance during policy optimization. To understand how compact models preserve agentic behavior, we introduce Agentic RAG Capabilities (ARC), a fine-grained metric analyzing reasoning, search coordination, and response synthesis. Comprehensive experiments demonstrate that DGPO enables compact models to achieve sophisticated agentic search behaviors, even outperforming the larger teacher model in some cases. DGPO makes agentic RAG feasible in compute-constrained environments.
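The abstract describes "continuous teacher guidance during policy optimization" without giving a formula in this summary view. One common way to realize that kind of guidance is to add a KL penalty that pulls the student's token distribution toward the teacher's, alongside the usual policy-gradient term. The sketch below is only an illustration under that assumption, not the paper's objective: the function names, the `kl_weight` parameter, the KL direction (student to teacher), and the per-token form are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def guided_loss(student_logits, teacher_logits, action, advantage, kl_weight=0.1):
    """Hypothetical per-token loss: policy-gradient term plus a KL pull toward the teacher.

    ASSUMPTION: this form is illustrative only; the paper's actual DGPO
    objective is not stated in this summary.
    """
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    # REINFORCE-style term: raise the log-probability of actions with positive advantage.
    pg_term = -advantage * math.log(student[action])
    # Guidance term: penalize divergence of the student from the teacher.
    kl_term = kl_divergence(student, teacher)
    return pg_term + kl_weight * kl_term


if __name__ == "__main__":
    # Toy example with a two-token vocabulary.
    print(guided_loss([0.0, 1.0], [2.0, 0.0], action=1, advantage=1.0, kl_weight=0.5))
```

In a sketch like this, the guidance term supplies a dense learning signal even when the reward (and hence the advantage) is zero for most rollouts, which is the sparse-reward failure mode the abstract attributes to compact models.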

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20324/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20324/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/2508.20324/full.md

---
Source: https://tomesphere.com/paper/2508.20324