QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Fanqi Wan; Weizhou Shen; Shengyi Liao; Yingcheng Shi; Chenliang Li; Ziyi Yang; Ji Zhang; Fei Huang; Jingren Zhou; Ming Yan

arXiv:2505.17667·cs.CL·May 28, 2025

TL;DR

QwenLong-L1 introduces a reinforcement learning framework that effectively extends large reasoning models to handle long-context inputs, improving performance on document question-answering tasks through progressive training and curriculum strategies.

Contribution

The paper formalizes long-context reasoning RL and proposes QwenLong-L1, a framework that adapts short-context LRMs to long-context reasoning through progressive context scaling and curriculum-guided phased RL.

Findings

01. Outperforms existing LRMs on seven long-context benchmarks.
02. Achieves performance comparable to state-of-the-art models like Claude-3.7.
03. Demonstrates robust reasoning capabilities in information-intensive environments.

Abstract

Recent large reasoning models (LRMs) have demonstrated strong reasoning capabilities through reinforcement learning (RL). These improvements have been observed primarily on short-context reasoning tasks. In contrast, extending LRMs to effectively process and reason over long-context inputs via RL remains a critical unsolved challenge. To bridge this gap, we first formalize the paradigm of long-context reasoning RL and identify two key challenges: suboptimal training efficiency and an unstable optimization process. To address these issues, we propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling. Specifically, we utilize a warm-up supervised fine-tuning (SFT) stage to establish a robust initial policy, followed by a curriculum-guided phased RL technique to stabilize the policy evolution, and enhanced with a…
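The staged recipe in the abstract (a warm-up SFT stage, then RL phases over progressively longer contexts) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Example` type, the `build_phases` and `train` helpers, the bucketing rule, and the length caps are all hypothetical, and `sft_step` and `rl_step` stand in for whatever training routines are actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical training example: a question plus its context length."""
    question: str
    context_tokens: int


def build_phases(
    examples: Sequence[Example], caps: Sequence[int]
) -> List[Tuple[int, List[Example]]]:
    """Bucket examples into phases with increasing context-length caps.

    Assumption: each example goes to the first phase whose cap it fits
    under; examples longer than the largest cap are dropped.
    """
    ordered = sorted(caps)
    buckets: List[List[Example]] = [[] for _ in ordered]
    for ex in examples:
        for k, cap in enumerate(ordered):
            if ex.context_tokens <= cap:
                buckets[k].append(ex)
                break
    return list(zip(ordered, buckets))


def train(policy, examples, caps, sft_step: Callable, rl_step: Callable):
    """Warm-up SFT first, then one RL phase per context-length cap."""
    policy = sft_step(policy, examples)
    for cap, batch in build_phases(examples, caps):
        policy = rl_step(policy, batch, cap)
    return policy
```

Passing the phase's cap to `rl_step` lets a caller set the maximum input length for that phase; how each phase is actually configured is not specified in the text above.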

Code & Models

Repositories

tongyi-zhiwen/qwenlong-l1 (PyTorch, official)
