GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

Minxuan Lv; Tiehua Mei; Tanlong Du; Junmin Chen; Zhenpeng Su; Ziyang Chen; Ziqi Wang; Zhennan Wu; Ruotong Pan; Jian Liang; Ruiming Tang; Han Li

arXiv:2605.19577·cs.CL·May 20, 2026

TL;DR

GoLongRL introduces a capability-oriented dataset and a reweighting method, TMN-Reweight, for long-context reinforcement learning, and reports improved performance and broader task coverage.

Contribution

The paper provides an open dataset for long-context RL, a new training pipeline, and TMN-Reweight for better multitask optimization, advancing practical long-context RL capabilities.

Findings

01. Dataset outperforms closed-source counterparts.

02. Model trained on the dataset achieves competitive long-context performance.

03. TMN-Reweight improves optimization and overall performance.

Abstract

We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogeneous task coverage and reward formulations that inadequately reflect practical long-context requirements. Our work offers two contributions. (1) Capability-oriented data construction with full open release. We openly release a dataset of 23K RLVR samples, the complete construction pipeline, and all training code. Guided by a taxonomy of long-context capabilities, the dataset spans 9 task types, each paired with its natural evaluation metric. It comprises curated open-source samples from established corpora and synthetic samples whose QA pairs are generated from real source documents such as…

Code & Models

Repositories

xiaoxuannlp/GoLongRL
github

Datasets

Kwai-Klear/GoLongRL
dataset · 825 downloads
