TL;DR
KLong is an open-source LLM agent for extremely long-horizon tasks, cold-started with trajectory-splitting SFT and then scaled with progressive RL.
Contribution
The paper introduces a novel training pipeline combining trajectory-splitting SFT and progressive RL, enabling LLMs to better handle very long tasks.
Findings
KLong (106B) outperforms Kimi K2 Thinking (1T) by 11.28% on PaperBench.
KLong demonstrates superior performance and generalization on multiple long-horizon benchmarks.
Trajectory-splitting SFT preserves early context while truncating later context, and progressive RL further improves long-horizon task-solving capability.
Abstract
This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking). To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose…
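The abstract describes trajectory-splitting SFT only at a high level, so the following is a minimal sketch of one way such a split could work, not the authors' implementation. It assumes a trajectory is a list of steps, that "early context" means a fixed-length prefix (for example the task statement), and that later context is truncated by sliding a window over the remaining steps. The function name `split_trajectory` and the parameters `prefix_len`, `window`, and `overlap` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_trajectory(
    steps: Sequence[T], prefix_len: int, window: int, overlap: int
) -> List[List[T]]:
    """Split one long trajectory into overlapping sub-trajectories.

    Assumed scheme (illustrative only):
      * the first `prefix_len` steps are kept in every sub-trajectory,
      * the remaining steps are covered by a sliding window of size `window`,
        so older later-context steps are dropped as the window advances,
      * consecutive windows share `overlap` steps.
    """
    if prefix_len < 0 or window <= 0 or not (0 <= overlap < window):
        raise ValueError("need prefix_len >= 0 and 0 <= overlap < window")

    prefix = list(steps[:prefix_len])
    rest = list(steps[prefix_len:])
    if not rest:
        # Nothing beyond the early context: return it as a single piece.
        return [prefix] if prefix else []

    stride = window - overlap  # how far the window moves each time
    sub_trajectories: List[List[T]] = []
    start = 0
    while True:
        chunk = rest[start:start + window]
        sub_trajectories.append(prefix + chunk)
        if start + window >= len(rest):
            break  # the last window reached the end of the trajectory
        start += stride
    return sub_trajectories


if __name__ == "__main__":
    # Ten toy steps, two of which are treated as early context.
    for sub in split_trajectory(list(range(10)), prefix_len=2, window=4, overlap=1):
        print(sub)
```

Each sub-trajectory then fits a fixed context budget of `prefix_len + window` steps, while the shared steps between neighbouring windows give the model some continuity across the cut points.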
