OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

Xinyu Ma; Mingzhou Xu; Xuebo Liu; Chang Jin; Qiang Wang; Derek F. Wong; Min Zhang

arXiv:2604.18530·cs.AI·April 21, 2026

TL;DR

OGER introduces a unified offline-guided exploration reward for reinforcement learning that enhances reasoning capabilities and out-of-domain generalization in large language models.

Contribution

The paper presents a framework that combines offline teacher guidance with online reinforcement learning through an entropy-aware exploration reward, with the aim of improving both reasoning and exploration.

Findings

01. OGER outperforms baselines on reasoning benchmarks.

02. OGER achieves significant gains on mathematical reasoning tasks.

03. OGER maintains robust generalization to out-of-domain tasks.

Abstract

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial latent space. While offline teacher guidance and entropy-driven strategies have been proposed to address this, they often lack deep integration or are constrained by the model's inherent capacity. In this paper, we propose OGER, a novel framework that unifies offline teacher guidance and online reinforcement learning through a specialized reward modeling lens. OGER employs multi-teacher collaborative training and constructs an auxiliary exploration reward that leverages both offline trajectories and the model's own entropy to incentivize autonomous exploration. Extensive experiments across mathematical and general reasoning benchmarks demonstrate that OGER significantly…
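The abstract names the two ingredients of the auxiliary exploration reward (offline teacher trajectories and the model's own entropy) but does not give a formula here. The sketch below is only an illustration of how such ingredients could be combined with a verifiable reward; it is not the paper's method. The Shannon entropy computation is standard, while the n-gram overlap measure, the gating on correctness, the weight `alpha`, and every function name are assumptions made for this example.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over a generated response."""
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)


def _ngrams(tokens: Sequence[int], n: int) -> set:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def teacher_overlap(response: Sequence[int],
                    teachers: List[Sequence[int]],
                    n: int = 2) -> float:
    """Hypothetical similarity: share of response n-grams that also
    appear in any offline teacher trajectory."""
    grams = _ngrams(response, n)
    if not grams:
        return 0.0
    seen = set()
    for t in teachers:
        seen |= _ngrams(t, n)
    return len(grams & seen) / len(grams)


def shaped_reward(verifiable_reward: float,
                  step_probs: Sequence[Sequence[float]],
                  response: Sequence[int],
                  teachers: List[Sequence[int]],
                  alpha: float = 0.1) -> float:
    """Illustrative shaping rule (an assumption, not OGER's formula):
    a correct response earns a bonus that grows with its novelty
    relative to the teacher trajectories and with its mean entropy."""
    if verifiable_reward <= 0.0:
        return verifiable_reward  # no exploration bonus for wrong answers
    novelty = 1.0 - teacher_overlap(response, teachers)
    return verifiable_reward + alpha * novelty * mean_entropy(step_probs)
```

Gating the bonus on a positive verifiable reward is one possible way to keep an exploration term from rewarding confident-sounding but incorrect outputs; the released repository should be consulted for how OGER actually defines its reward.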

Code & Models

Repositories

ecoli-hit/OGER.git (GitHub)
