Context-Picker: Dynamic context selection using multi-stage reinforcement learning
Siyuan Zhu, Chengdong Xu, Kaiqiang Ke, Chao Yu

TL;DR
Context-Picker introduces a multi-stage reinforcement learning approach to dynamically select minimal yet sufficient context for long-context question answering, improving accuracy by reducing noise and redundancy.
Contribution
The paper proposes a two-stage RL framework, supported by an offline evidence distillation pipeline, for context selection in long-context QA.
Findings
Outperforms strong RAG baselines on five QA datasets.
Achieves higher answer accuracy with minimal context.
Demonstrates the effectiveness of coarse-to-fine optimization and reward shaping.
Abstract
In long-context question answering, selecting the appropriate scope of context for a query remains a key and unresolved challenge. Insufficient context can lead to missing essential information, whereas excessive context often introduces noise and degrades answer quality. Conventional methods, such as retrieving a fixed number of passages or applying reranking, struggle to dynamically determine which context to include. This is especially problematic for factoid questions, which typically depend only on a few precise pieces of evidence. To overcome this limitation, we propose Context-Picker, a reasoning-aware framework that reframes context selection as the task of identifying a minimal sufficient evidence subset, moving beyond conventional similarity-based ranking. Context-Picker uses a human-inspired two-stage reinforcement learning schedule: stage 1 focuses on improving the recall…
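To make the notion of a "minimal sufficient evidence subset" concrete, here is a small sketch of greedy leave-one-out pruning. It is an illustration only and is not the authors' algorithm or their distillation pipeline. The function name `minimal_sufficient_subset`, the `is_sufficient` judge, and the keyword-based toy judge are all hypothetical. In practice the judge would more plausibly be an LLM or answer-checking call. The sketch also assumes that passages arrive ordered from most to least relevant.

```python
from typing import Callable, List, Sequence


def minimal_sufficient_subset(
    passages: Sequence[str],
    is_sufficient: Callable[[List[str]], bool],
) -> List[str]:
    """Greedy leave-one-out pruning (illustrative, hypothetical helper).

    Tries to drop each passage, lowest-ranked first, and keeps the drop
    whenever the remaining passages are still judged sufficient.
    """
    kept = list(range(len(passages)))
    # If even the full set is insufficient, pruning cannot help.
    if not is_sufficient([passages[j] for j in kept]):
        return list(passages)
    # Assumption: passages are ordered best-first, so prune from the end.
    for i in reversed(range(len(passages))):
        trial = [j for j in kept if j != i]
        if is_sufficient([passages[j] for j in trial]):
            kept = trial
    return [passages[j] for j in kept]


# Toy sufficiency judge (an assumption): the answer is supported when
# every required keyword appears in at least one kept passage.
REQUIRED = {"Paris", "1889"}


def toy_judge(subset: List[str]) -> bool:
    return all(any(word in p for p in subset) for word in REQUIRED)


passages = [
    "The Eiffel Tower is in Paris.",
    "Paris hosts many museums.",
    "The tower opened in 1889.",
    "Bananas are rich in potassium.",
]
selected = minimal_sufficient_subset(passages, toy_judge)
print(selected)
```

The result is minimal in the sense that dropping any single remaining passage breaks sufficiency under the toy judge. It is not guaranteed to be the smallest possible subset, and a learned picker such as the one described in the abstract would replace this exhaustive probing with a trained policy.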
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Expert finding and Q&A systems
