GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
Minxuan Lv, Tiehua Mei, Tanlong Du, Junmin Chen, Zhenpeng Su, Ziyang Chen, Ziqi Wang, Zhennan Wu, Ruotong Pan, Jian Liang, Ruiming Tang, Han Li

TL;DR
GoLongRL introduces a capability-oriented dataset and a novel reweighting method to enhance long-context reinforcement learning, demonstrating improved performance and broader task coverage.
Contribution
The paper provides an open dataset for long-context RL, a new training pipeline, and TMN-Reweight for better multitask optimization, advancing practical long-context RL capabilities.
Findings
Dataset outperforms closed-source counterparts.
Model trained on the dataset achieves competitive long-context performance.
TMN-Reweight improves optimization and overall performance.
Abstract
We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogeneous task coverage and reward formulations that inadequately reflect practical long-context requirements. Our work offers two contributions. (1) Capability-oriented data construction with full open release. We openly release a dataset of 23K RLVR samples, the complete construction pipeline, and all training code. Guided by a taxonomy of long-context capabilities, the dataset spans 9 task types, each paired with its natural evaluation metric. It comprises curated open-source samples from established corpora and synthetic samples whose QA pairs are generated from real source documents such as…
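The idea of pairing each task type with its own evaluation metric can be illustrated with a small reward dispatcher. This is only a sketch under stated assumptions: the task names (`retrieval`, `long_qa`), the two metrics chosen, and the function `verifiable_reward` are hypothetical and are not taken from the GoLongRL release, which covers 9 task types not enumerated here.

```python
from collections import Counter
from typing import Callable, Dict


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between whitespace-tokenized, lowercased strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical mapping from task type to metric (illustrative names only).
METRIC_BY_TASK: Dict[str, Callable[[str, str], float]] = {
    "retrieval": exact_match,
    "long_qa": token_f1,
}


def verifiable_reward(task_type: str, prediction: str, reference: str) -> float:
    """Score a model output with the metric registered for its task type."""
    return METRIC_BY_TASK[task_type](prediction, reference)
```

Keeping the metric in a per-task lookup table means a new task type can be added by registering one more scoring function, without touching the reward call site.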
