Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning
Hanbin Wang, Jingwei Song, Jinpeng Li, Fei Mi, Lifeng Shang

TL;DR
This paper introduces GPSO, a reinforcement learning framework that enables large reasoning models to select the most effective reasoning pattern for each problem, significantly improving their accuracy and robustness.
Contribution
The paper proposes GPSO, a novel method that allows models to dynamically choose optimal reasoning strategies, addressing biases and sub-optimal patterns in existing training methods.
Findings
GPSO improves performance across multiple benchmarks.
Models with GPSO show reduced pattern bias and increased adaptability.
Significant accuracy gains are observed across various model sizes.
Abstract
Large reasoning models (LRMs) exhibit diverse high-level reasoning patterns (e.g., direct solution, reflection-and-verification, and exploring multiple solutions), yet prevailing training recipes implicitly bias models toward a limited set of dominant patterns. Through a systematic analysis, we identify substantial accuracy variance across these patterns on mathematics and science benchmarks, revealing that a model's default reasoning pattern is often sub-optimal for a given problem. To address this, we introduce Group Pattern Selection Optimization (GPSO), a reinforcement learning framework that extends GRPO by incorporating multi-pattern rollouts, verifier-guided optimal pattern selection per problem, and attention masking during optimization to prevent the leakage of explicit pattern suffixes into the learned policy. By exploring a portfolio of diverse reasoning strategies and…
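The three ingredients named in the abstract (multi-pattern rollouts, verifier-guided pattern selection, and masking of the pattern suffix) can be illustrated with a toy sketch. This is not the authors' implementation. The pattern names, the reward values, the rule "pick the pattern with the highest mean verifier reward", and the mask layout are all assumptions made for illustration. Only the group-relative advantage follows the usual GRPO form, (r - mean) / std.

```python
from statistics import mean, pstdev

def select_pattern(rewards_by_pattern):
    """Pick the pattern whose rollouts have the highest mean verifier reward.
    (Assumed selection rule; the abstract only says selection is verifier-guided.)"""
    return max(rewards_by_pattern, key=lambda p: mean(rewards_by_pattern[p]))

def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def suffix_mask(prompt_len, suffix_len, response_len):
    """Hypothetical mask over [prompt | pattern suffix | response]:
    1 = position is visible during optimization, 0 = pattern suffix hidden."""
    return [1] * prompt_len + [0] * suffix_len + [1] * response_len

# Made-up verifier rewards (1.0 = correct) for four rollouts per pattern.
rewards_by_pattern = {
    "direct": [0.0, 0.0, 1.0, 0.0],
    "reflect_and_verify": [1.0, 1.0, 0.0, 1.0],
    "multi_solution": [1.0, 0.0, 0.0, 1.0],
}

best = select_pattern(rewards_by_pattern)
advantages = group_advantages(rewards_by_pattern[best])
mask = suffix_mask(prompt_len=3, suffix_len=2, response_len=4)

print(best)
print(advantages)
print(mask)
```

In this sketch only the rollouts of the selected pattern contribute advantages, and the explicit pattern suffix is hidden so the policy cannot rely on it. How GPSO actually combines these pieces is not specified in the text available here.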
Taxonomy
Topics: Constraint Satisfaction and Optimization · Advanced Graph Neural Networks · Topic Modeling
