Meta-Reinforcement Learning with Self-Reflection for Agentic Search

Teng Xiao, Yige Yuan, Hamish Ivison, Huaisheng Zhu, Faeze Brahman, Nathan Lambert, Pradeep Dasigi, Noah A. Smith, and Hannaneh Hajishirzi

arXiv:2603.11327 · cs.LG · March 19, 2026

TL;DR

This paper presents MR-Search, a meta-reinforcement learning approach that uses self-reflection to improve agent exploration across episodes, leading to better performance on multiple benchmarks.

Contribution

Introduces MR-Search, a novel meta-RL framework that incorporates self-reflection and cross-episode learning for enhanced agent exploration and adaptation.
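The cross-episode loop described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `run_meta_episodes`, `toy_attempt`, and `toy_reflect` are hypothetical names, and the two toy functions stand in for what would be LLM calls (one search-and-answer episode, one self-reflection step). What the sketch shows is the structure: every new attempt on a question conditions on the reflections accumulated from earlier attempts on that same question.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Episode:
    answer: str
    reward: float
    reflection: str


def run_meta_episodes(
    question: str,
    attempt: Callable[[str, List[str]], Tuple[str, float]],
    reflect: Callable[[str, str, float], str],
    num_episodes: int = 3,
) -> List[Episode]:
    """Run several episodes on one question, feeding earlier reflections back in as context."""
    reflections: List[str] = []
    history: List[Episode] = []
    for _ in range(num_episodes):
        # The policy sees the question plus every reflection written so far.
        answer, reward = attempt(question, reflections)
        # After the episode, write an explicit self-reflection about it.
        note = reflect(question, answer, reward)
        reflections.append(note)
        history.append(Episode(answer, reward, note))
    return history


# Hypothetical toy stand-ins for the search policy and the reflection step.
CANDIDATES = ["Lyon", "Marseille", "Paris"]


def toy_attempt(question: str, reflections: List[str]) -> Tuple[str, float]:
    # Skip any candidate that an earlier reflection marked as wrong.
    rejected = {r.split("'")[1] for r in reflections if "wrong" in r}
    answer = next((c for c in CANDIDATES if c not in rejected), CANDIDATES[-1])
    reward = 1.0 if answer == "Paris" else 0.0
    return answer, reward


def toy_reflect(question: str, answer: str, reward: float) -> str:
    verdict = "correct" if reward > 0 else "wrong"
    return f"Answer '{answer}' was {verdict}."


if __name__ == "__main__":
    episodes = run_meta_episodes("What is the capital of France?", toy_attempt, toy_reflect)
    for i, ep in enumerate(episodes, start=1):
        print(i, ep.answer, ep.reward, ep.reflection)
```

In this toy run the stub policy avoids answers that earlier reflections flagged as wrong, so its reward rises from 0.0 to 1.0 by the third episode. In a trained agent the reflections would be free-form text, and the policy would have to learn how to make use of them.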

Findings

1. Achieves 9.2% to 19.3% improvements over baselines
2. Demonstrates strong generalization across benchmarks
3. Utilizes a multi-turn RL algorithm for fine-grained credit assignment

Abstract

This paper introduces MR-Search, an in-context meta-reinforcement learning (RL) formulation for agentic search with self-reflection. Instead of optimizing a policy within a single independent episode with sparse rewards, MR-Search trains a policy that conditions on past episodes and adapts its search strategy across episodes. MR-Search learns to learn a search strategy with self-reflection, allowing search agents to improve their in-context exploration at test time. Specifically, MR-Search performs cross-episode exploration by generating explicit self-reflections after each episode and using them as additional context to guide subsequent attempts, thereby promoting more effective exploration at test time. We further introduce a multi-turn RL algorithm that estimates a dense relative advantage at the turn level, enabling fine-grained credit assignment for each episode. Empirical…
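One possible reading of "a dense relative advantage at the turn level" is sketched below in Python. The paper's exact estimator is not given in this summary, so the sketch swaps in a GRPO-style group normalization as an assumption: several rollouts of the same question are sampled, and each turn's reward (or, optionally, its reward-to-go) is standardized against the other rollouts at the same turn index. The function name `turn_level_relative_advantages` and the `use_return_to_go` flag are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def turn_level_relative_advantages(
    rewards: List[List[float]],
    use_return_to_go: bool = False,
    eps: float = 1e-6,
) -> List[List[float]]:
    """Group-relative advantages computed separately at every turn.

    rewards[i][k] is the scalar reward of rollout i at turn k. All rollouts are
    assumed to answer the same question and to have the same number of turns.
    """
    if not rewards:
        raise ValueError("need at least one rollout")
    num_turns = len(rewards[0])
    if any(len(r) != num_turns for r in rewards):
        raise ValueError("all rollouts must have the same number of turns")

    # Optionally credit each turn with the sum of its own and all later rewards.
    if use_return_to_go:
        signal = [[sum(r[k:]) for k in range(num_turns)] for r in rewards]
    else:
        signal = [list(r) for r in rewards]

    advantages = [[0.0] * num_turns for _ in rewards]
    for k in range(num_turns):
        column = [s[k] for s in signal]  # every rollout's value at turn k
        mu, sigma = mean(column), pstdev(column)
        for i, value in enumerate(column):
            # Standardize against the group; eps avoids division by zero
            # when all rollouts earned the same value at this turn.
            advantages[i][k] = (value - mu) / (sigma + eps)
    return advantages


if __name__ == "__main__":
    group = [[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
    for row in turn_level_relative_advantages(group):
        print([round(a, 3) for a in row])
```

Normalizing per turn rather than per rollout gives each turn its own learning signal instead of spreading one sparse final reward uniformly over all turns. The reward-to-go option lets an early turn share credit for later success, which may matter when a later attempt builds on what an earlier one tried.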

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Multimodal Machine Learning Applications