TL;DR
Orchard is an open-source framework that enables scalable, multi-domain agentic modeling with reusable primitives, achieving state-of-the-art results in coding, vision-language, and personal assistant tasks.
Contribution
The paper introduces Orchard, an open-source framework for scalable agentic modeling built around Orchard Env, a lightweight environment service, and demonstrates its effectiveness across multiple domains.
Findings
Orchard-SWE achieves 67.5% on SWE-bench after SFT+RL, setting a new open-source state of the art.
Orchard-GUI attains success rates of 74.1%, 67.0%, and 64.0% on three benchmarks, outperforming proprietary systems.
Lightweight environment primitives enable effective training and evaluation across diverse agentic tasks.
Abstract
Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B,…
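To make "reusable primitives for sandbox lifecycle management" more concrete, here is a minimal sketch of what such primitives could look like. It is not Orchard Env's actual API: the names `EnvService`, `Sandbox`, `create`, `exec`, and `destroy` are hypothetical, and a temporary directory plus a local subprocess stand in for a real isolated sandbox.

```python
import shutil
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Sandbox:
    """Hypothetical sandbox handle; a temp directory stands in for a container."""
    sandbox_id: str
    workdir: Path


class EnvService:
    """Illustrative lifecycle primitives (assumed names, not Orchard Env's API)."""

    def __init__(self) -> None:
        self._sandboxes: dict[str, Sandbox] = {}
        self._counter = 0

    def create(self) -> Sandbox:
        # Allocate a fresh working directory and register the sandbox.
        self._counter += 1
        workdir = Path(tempfile.mkdtemp(prefix="sandbox-"))
        sandbox = Sandbox(f"sb-{self._counter}", workdir)
        self._sandboxes[sandbox.sandbox_id] = sandbox
        return sandbox

    def exec(self, sandbox_id: str, argv: list[str], timeout: float = 30.0) -> tuple[int, str]:
        # Run a command inside the sandbox's working directory.
        # Raises KeyError if the sandbox is unknown or already destroyed.
        sandbox = self._sandboxes[sandbox_id]
        proc = subprocess.run(
            argv, cwd=sandbox.workdir, capture_output=True, text=True, timeout=timeout
        )
        return proc.returncode, proc.stdout

    def destroy(self, sandbox_id: str) -> None:
        # Unregister the sandbox and delete its working directory.
        sandbox = self._sandboxes.pop(sandbox_id)
        shutil.rmtree(sandbox.workdir, ignore_errors=True)


if __name__ == "__main__":
    env = EnvService()
    sb = env.create()
    code, out = env.exec(sb.sandbox_id, [sys.executable, "-c", "print('hello from sandbox')"])
    print(code, out.strip())
    env.destroy(sb.sandbox_id)
```

The point of the sketch is the separation of concerns: an agent harness or training loop only needs the three lifecycle calls, so the same service could in principle back different task domains and pipeline stages. A real service would replace the temp directory with actual isolation and run remotely.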