QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
Weizhou Shen, Ziyi Yang, Chenliang Li, Zhiyuan Lu, Miao Peng, Huashan Sun, Yingcheng Shi, Shengyi Liao, Shaopeng Lai, Bo Zhang, Dayiheng Liu, Fei Huang, Jingren Zhou, Ming Yan

TL;DR
QwenLong-L1.5 combines a long-context data synthesis pipeline, stabilized reinforcement learning, and a memory-augmented architecture to improve long-context reasoning and memory management in large language models.
Contribution
It presents a complete post-training recipe that combines long-context data generation, techniques for stabilizing reinforcement learning, and memory management for ultra-long contexts, improving long-range reasoning.
Findings
Achieves performance comparable to GPT-5 and Gemini-2.5-Pro on long-context benchmarks.
Surpasses its baseline by 9.90 points on long-context reasoning tasks.
Yields a 9.48-point gain on ultra-long tasks exceeding 4 million tokens.
Abstract
We introduce QwenLong-L1.5, a model that achieves superior long-context reasoning capabilities through systematic post-training innovations. The key technical breakthroughs of QwenLong-L1.5 are as follows: (1) Long-Context Data Synthesis Pipeline: We develop a systematic synthesis framework that generates challenging reasoning tasks requiring multi-hop grounding over globally distributed evidence. By deconstructing documents into atomic facts and their underlying relationships, and then programmatically composing verifiable reasoning questions, our approach creates high-quality training data at scale, moving substantially beyond simple retrieval tasks to enable genuine long-range reasoning capabilities. (2) Stabilized Reinforcement Learning for Long-Context Training: To overcome the critical instability in long-context RL, we introduce task-balanced sampling with task-specific advantage…
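The abstract describes two mechanisms concretely enough to sketch: composing verifiable multi-hop questions from atomic facts and their relationships, and task-balanced sampling during RL. Below is a minimal Python sketch of both, under our own assumptions, not the authors' implementation. We assume facts are (subject, relation, object) triples tagged with a hypothetical `chunk_id` marking where they occur in the document, that a multi-hop question is built by chaining facts whose object is the next fact's subject, and that "task-balanced" means drawing roughly equal numbers of examples per task type. `Fact`, `compose_multihop_question`, and `task_balanced_batch` are hypothetical names.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One atomic fact extracted from a document (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    chunk_id: int  # assumed: position of the fact within the long document


def compose_multihop_question(
    facts: list[Fact], start: str, hops: int
) -> Optional[tuple[str, str, list[int]]]:
    """Chain `hops` facts (each object becomes the next subject) into one question.

    Returns (question, answer, evidence chunk ids), or None if the chain breaks.
    The answer is read directly off the fact chain, so it is verifiable.
    """
    by_subject: dict[str, list[Fact]] = defaultdict(list)
    for fact in facts:
        by_subject[fact.subject].append(fact)

    chain: list[Fact] = []
    entity = start
    for _ in range(hops):
        candidates = by_subject.get(entity)
        if not candidates:
            return None  # no fact continues the chain from this entity
        fact = candidates[0]  # deterministic pick; a real pipeline might sample
        chain.append(fact)
        entity = fact.obj

    # Nest the relations: "the r2 of the r1 of start".
    phrase = start
    for fact in chain:
        phrase = f"the {fact.relation} of {phrase}"
    evidence = sorted({fact.chunk_id for fact in chain})
    return f"What is {phrase}?", entity, evidence


def task_balanced_batch(
    pool: list[dict], batch_size: int, rng: random.Random
) -> list[dict]:
    """Draw roughly the same number of examples from every task type (assumed reading).

    If a task has fewer examples than its share, all of them are used and the
    batch comes out smaller than batch_size.
    """
    by_task: dict[str, list[dict]] = defaultdict(list)
    for example in pool:
        by_task[example["task"]].append(example)

    tasks = sorted(by_task)
    per_task, remainder = divmod(batch_size, len(tasks))
    batch: list[dict] = []
    for i, task in enumerate(tasks):
        share = per_task + (1 if i < remainder else 0)
        batch.extend(rng.sample(by_task[task], min(share, len(by_task[task]))))
    return batch


if __name__ == "__main__":
    facts = [Fact("Alice", "employer", "Acme", 3), Fact("Acme", "headquarters", "Berlin", 41)]
    print(compose_multihop_question(facts, "Alice", hops=2))
    # ('What is the headquarters of the employer of Alice?', 'Berlin', [3, 41])
    pool = [{"task": "qa", "id": i} for i in range(6)] + [{"task": "math", "id": i} for i in range(2)]
    print(Counter(ex["task"] for ex in task_balanced_batch(pool, 4, random.Random(0))))
```

The evidence for the two-hop question sits in chunks 3 and 41, so answering it requires combining facts that are far apart in the document rather than retrieving a single passage. Balancing by task keeps a task type with many examples (here `qa`) from crowding out a rarer one (`math`) within a batch.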
Code & Models
- 🤗 Tongyi-Zhiwen/QwenLong-L1.5-30B-A3B · model · 304 dl · ♡ 165
- 🤗 cyankiwi/QwenLong-L1.5-30B-A3B-AWQ-4bit · model · 1 dl · ♡ 1
- 🤗 noctrex/QwenLong-L1.5-30B-A3B-MXFP4_MOE-GGUF · model · 91 dl · ♡ 3
- 🤗 Mungert/QwenLong-L1.5-30B-A3B-GGUF · model · 177 dl · ♡ 1
- 🤗 cyankiwi/QwenLong-L1.5-30B-A3B-AWQ-8bit · model
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
