# AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning

**Authors:** Lang Mei, Zhihan Yang, Xiaohan Yu, Huanyao Zhang, Chong Chen

arXiv: 2508.20368 · 2025-12-30

## TL;DR

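AI-SearchPlanner is a reinforcement learning framework that trains a small LLM dedicated to search planning while a large, frozen LLM handles question answering. It combines a decoupled planner/generator architecture, dual-reward alignment, and Pareto optimization of planning utility and cost. The authors report gains in both effectiveness and efficiency over existing RL-based search agents.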

## Contribution

The paper introduces a reinforcement learning framework that decouples search planning (a small, trainable LLM) from question answering (a large, frozen LLM), aligns the planner with dual rewards, and uses Pareto optimization to trade off planning utility against cost.
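The decoupling can be pictured as a loop in which a planner decides what to search for and a separate, frozen generator writes the final answer. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_episode`, `toy_planner`, `toy_search`, and `toy_generator` are hypothetical names, and the toy functions stand in for real LLM and search-engine calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a planner returns the next search query, or None to stop searching.
Planner = Callable[[str, List[str]], Optional[str]]
Search = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def run_episode(
    question: str,
    planner: Planner,
    search: Search,
    generator: Generator,
    max_turns: int = 4,
) -> Tuple[str, int]:
    """Run one planner/generator episode and return (answer, search turns used)."""
    context: List[str] = []
    turns = 0
    for _ in range(max_turns):
        query = planner(question, context)  # trainable component decides the next query
        if query is None:                   # planner judges the context sufficient
            break
        context.extend(search(query))
        turns += 1
    # The frozen generator only reads the question and the gathered context.
    answer = generator(question, context)
    return answer, turns


# Toy stand-ins so the sketch runs without any model or search backend.
def toy_planner(question: str, context: List[str]) -> Optional[str]:
    return question if not context else None


def toy_search(query: str) -> List[str]:
    return ["Paris is the capital of France."]


def toy_generator(question: str, context: List[str]) -> str:
    return "Paris" if any("Paris" in doc for doc in context) else "unknown"


answer, turns = run_episode(
    "What is the capital of France?", toy_planner, toy_search, toy_generator
)
```

In this sketch only the planner would receive gradient updates. The number of search turns is returned alongside the answer so that a cost term could be computed from it.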

## Key findings

- Outperforms existing RL-based search agents in effectiveness in experiments on real-world datasets.
- Is also more efficient than those baselines.
- Generalizes across diverse frozen QA models and data domains.

## Abstract

Recent studies have explored integrating Large Language Models (LLMs) with search engines to leverage both the LLMs' internal pre-trained knowledge and external information. In particular, reinforcement learning (RL) has emerged as a promising paradigm for enhancing LLM reasoning through multi-turn interactions with search engines. However, existing RL-based search agents rely on a single LLM to handle both search planning and question-answering (QA) tasks in an end-to-end manner, which limits their ability to optimize both capabilities simultaneously. In practice, sophisticated AI search systems often employ a large, frozen LLM (e.g., GPT-4, DeepSeek-R1) to ensure high-quality QA. Thus, a more effective and efficient approach is to utilize a small, trainable LLM dedicated to search planning. In this paper, we propose **AI-SearchPlanner**, a novel reinforcement learning framework designed to enhance the performance of frozen QA models by focusing on search planning. Specifically, our approach introduces three key innovations: 1) Decoupling the Architecture of the Search Planner and Generator, 2) Dual-Reward Alignment for Search Planning, and 3) Pareto Optimization of Planning Utility and Cost. Extensive experiments on real-world datasets demonstrate that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, while exhibiting strong generalization capabilities across diverse frozen QA models and data domains.
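To make the utility/cost trade-off concrete, the sketch below computes a Pareto front over (utility, cost) pairs and shows how a weighted objective picks different points on it. This illustrates the general notion of Pareto optimality only. The linear form `utility - alpha * cost`, the coefficient `alpha`, and the numbers in `points` are assumptions made up for the example, not the paper's reward or results.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (utility, cost): higher utility and lower cost are better


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarized_reward(utility: float, cost: float, alpha: float) -> float:
    """Hypothetical linear trade-off: a larger alpha penalizes cost more heavily."""
    return utility - alpha * cost


# Made-up (utility, cost) outcomes for five hypothetical planning policies.
points: List[Point] = [(0.60, 1.0), (0.72, 2.0), (0.70, 3.0), (0.80, 4.0), (0.55, 2.0)]
front = pareto_front(points)


def best_for(alpha: float) -> Point:
    """Pick the point that maximizes the scalarized reward for a given alpha."""
    return max(points, key=lambda p: scalarized_reward(p[0], p[1], alpha))
```

Sweeping `alpha` in a sketch like this moves the selected point along the front: a small `alpha` favors the high-utility, high-cost policy, and a large `alpha` favors the cheap one.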

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20368/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20368/full.md

---
Source: https://tomesphere.com/paper/2508.20368