When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning
Wei Wen, Sihang Deng, Tianjun Wei, Keyu Chen, Ruizhi Qiao, Xing Sun

TL;DR
This paper introduces ACQO, a reinforcement learning framework that adaptively optimizes complex query search strategies in retrieval systems, improving accuracy and efficiency in handling multi-faceted user queries.
Contribution
The paper presents a novel RL-based framework with adaptive query reformulation and robust result fusion, addressing training instability and expanding search capabilities for complex queries.
Findings
Achieves state-of-the-art results on complex query benchmarks.
Improves computational efficiency over existing methods.
Demonstrates broad compatibility with retrieval architectures.
Abstract
Query optimization is a crucial component for the efficacy of Retrieval-Augmented Generation (RAG) systems. While reinforcement learning (RL)-based agentic and reasoning methods have recently emerged as a promising direction on query optimization, most existing approaches focus on the expansion and abstraction of a single query. However, complex user queries are prevalent in real-world scenarios, often requiring multiple parallel and sequential search strategies to handle disambiguation and decomposition. Directly applying RL to these complex cases introduces significant hurdles. Determining the optimal number of sub-queries and effectively re-ranking and merging retrieved documents vastly expands the search space and complicates reward design, frequently leading to training instability. To address these challenges, we propose a novel RL framework called Adaptive Complex Query…
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
1. The paper's strength is the discovery and clear demonstration of retriever-specific policy learning. The case study in Appendix B.1 (Figure 5) provides definitive evidence. It qualitatively shows the agent learning to "keyword stuff" for the sparse BM25 retriever. In contrast, for the dense ANCE retriever, it learns to perform semantic decomposition into distinct sub-queries. This insight—that the agent learns what the retriever considers a "good" query—is a fundamental contribution. 2. Anoth
1. A critical reviewer could argue that the individual components lack fundamental novelty. Curriculum Learning is a well-established field, rank fusion methods (the paper itself cites RRF) are common, and RL for query reformulation has been explored. The paper's primary weakness, from this perspective, is that its contribution could be framed as "superior systems engineering" or a novel integration rather than a singular algorithmic breakthrough. While the emergent adaptation is novel, the bui
The overall method seems sound: incorporating scores when joining lists of documents is a valid approach, and curriculum learning can sometimes improve performance, especially for hard-to-optimize tasks like RL. Good results. Overall, a well-written paper.
The experimental setup does not explain whether hyperparameters were selected using a dev set. While there is no fundamental issue with the proposed method, it is in essence a combination of tricks. From the presented experiments it is not clear to me whether this curriculum method is robust enough for general use.
1. ACQO achieves state-of-the-art results on multiple benchmarks (HotpotQA, MultiHop-RAG), outperforming baselines even with a small 3B-parameter model. 2. The two-stage Curriculum Reinforcement Learning (CRL) strategy—starting with broad exploration and then focusing on hard examples—mitigates reward sparsity and training instability common in RL-based query optimization. 3. The Rank-Score Fusion (RSF) module effectively combines results from multiple sub-queries by leveraging both rank positio
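For readers unfamiliar with rank fusion, the following is a minimal Python sketch of standard Reciprocal Rank Fusion (RRF), the baseline technique the reviews say the paper cites. It is not the paper's Rank-Score Fusion module, whose details are not given in these excerpts. The function name, the document IDs, and the constant k=60 (a commonly used default) are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with standard RRF.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending fused score.
    NOTE: this is plain RRF, not the paper's RSF module.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break ties by document ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for three sub-queries of one complex query.
sub_query_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c"],
]
fused = reciprocal_rank_fusion(sub_query_results)
print(fused)
```

RRF uses rank positions only. A method that also uses retriever scores, as the reviews describe RSF doing, would need some form of score normalization across retrievers; that part is not sketched here.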
1. The experiments focus primarily on retrieval metrics (e.g., Recall@k, MRR), but do not include downstream end-to-end question answering accuracy or generation quality, leaving open whether retrieval gains consistently translate to better final answers. 2. The reward signal assumes access to ground-truth relevant documents during training, which may not hold in fully unsupervised or open-domain settings—limiting true “zero-supervision” applicability. 3. Although the paper claims efficiency, ge
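The retrieval metrics named in this review can be computed as in the short Python sketch below, which uses the standard definitions of Recall@k and Mean Reciprocal Rank (MRR). The function names and the toy document IDs are assumptions for illustration; the paper's exact evaluation protocol is not described in these excerpts.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents found in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical query with two ground-truth documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, 2))   # top-2 contains one of two relevant docs
print(reciprocal_rank(ranked, relevant))  # first relevant doc is at rank 2
```

Both metrics require ground-truth relevant documents per query, which is the same supervision this review points to in the reward signal.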
1. The paper introduces a novel perspective on curriculum reinforcement learning where different stages focus on different objectives, instead of only focusing on data difficulty. 2. The Rank-Score Fusion module provides an elegant and model-agnostic solution for aggregating multiple retrieval results, which works seamlessly with both sparse and dense retrievers without requiring retriever-specific modifications. 3. The adaptive query decomposition mechanism allows the model to autonomously de
1. The experimental evaluation is limited in scope. Specifically, although the main results cover two tasks (disambiguation and multi-hop), each task is evaluated on only one dataset. Including additional benchmarks for each task type would strengthen the empirical validation of the proposed method. 2. The performance improvements of ACQO shown in Tables 2 and 3 are marginal and the comparisons raise several concerns: * As shown in Table 2, the performance of ACQO is even lower th
Taxonomy
Topics: Information Retrieval and Search Behavior · Data Management and Algorithms · Web Data Mining and Analysis
