QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Weizhou Shen; Ziyi Yang; Chenliang Li; Zhiyuan Lu; Miao Peng; Huashan Sun; Yingcheng Shi; Shengyi Liao; Shaopeng Lai; Bo Zhang; Dayiheng Liu; Fei Huang; Jingren Zhou; Ming Yan

arXiv:2512.12967 · cs.CL · December 16, 2025

TL;DR

QwenLong-L1.5 combines a long-context data synthesis pipeline, stabilized reinforcement learning, and a memory-augmented architecture to improve long-context reasoning and memory management in large language models.

Contribution

It presents a comprehensive post-training approach combining data generation, reinforcement learning stability techniques, and memory management for ultra-long contexts, advancing long-range reasoning capabilities.

Findings

01. Achieves performance comparable to GPT-5 and Gemini-2.5-Pro on long-context benchmarks.

02. Surpasses its baseline by 9.90 points on long-context reasoning tasks.

03. Yields a 9.48-point gain on ultra-long tasks exceeding 4 million tokens.

Abstract

We introduce QwenLong-L1.5, a model that achieves superior long-context reasoning capabilities through systematic post-training innovations. The key technical breakthroughs of QwenLong-L1.5 are as follows: (1) Long-Context Data Synthesis Pipeline: We develop a systematic synthesis framework that generates challenging reasoning tasks requiring multi-hop grounding over globally distributed evidence. By deconstructing documents into atomic facts and their underlying relationships, and then programmatically composing verifiable reasoning questions, our approach creates high-quality training data at scale, moving substantially beyond simple retrieval tasks to enable genuine long-range reasoning capabilities. (2) Stabilized Reinforcement Learning for Long-Context Training: To overcome the critical instability in long-context RL, we introduce task-balanced sampling with task-specific advantage…
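The data synthesis idea in point (1) of the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes atomic facts have already been extracted as (subject, relation, object) triples, and it composes only two-hop questions by chaining facts whose evidence sits far apart in the document. The names `Fact`, `Question`, `compose_two_hop`, the `position` field, and the `min_gap` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Fact:
    """One atomic fact as a (subject, relation, object) triple (assumed format)."""
    subject: str
    relation: str
    obj: str
    position: int  # hypothetical: index of the source passage in the document


@dataclass(frozen=True)
class Question:
    text: str
    answer: str                   # known by construction, so it is verifiable
    evidence: tuple[Fact, Fact]   # the two facts the answer depends on


def compose_two_hop(facts: list[Fact], min_gap: int = 1) -> Iterator[Question]:
    """Chain fact A to fact B whenever A's object is B's subject.

    Pairs whose passages are closer than `min_gap` are skipped, so the
    evidence for each question is spread across the document.
    """
    by_subject: dict[str, list[Fact]] = {}
    for fact in facts:
        by_subject.setdefault(fact.subject, []).append(fact)

    for first in facts:
        for second in by_subject.get(first.obj, []):
            if abs(first.position - second.position) < min_gap:
                continue
            text = (f"What is the {second.relation} of the "
                    f"{first.relation} of {first.subject}?")
            yield Question(text, second.obj, (first, second))


# Toy facts, invented purely for illustration.
facts = [
    Fact("Acme Corp", "founder", "Jane Doe", position=3),
    Fact("Jane Doe", "birthplace", "Lyon", position=871),
    Fact("Acme Corp", "headquarters", "Oslo", position=4),
]
questions = list(compose_two_hop(facts, min_gap=100))
for q in questions:
    print(q.text, "->", q.answer)
```

Because each answer is taken directly from the second fact in the chain, it can be checked automatically against the source triples, which is one plausible reading of "verifiable" in the abstract. Longer chains and relation types beyond simple triples are outside the scope of this sketch.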

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics