Orchard: An Open-Source Agentic Modeling Framework

Baolin Peng; Wenlin Yao; Qianhui Wu; Hao Cheng; Xiao Yu; Rui Yang; Tao Ge; Alessandro Sordoni; Xingdi Yuan; Yelong Shen; Pengcheng He; Tong Zhang; Zhou Yu; Jianfeng Gao

arXiv:2605.15040 · cs.AI · May 22, 2026

TL;DR

Orchard is an open-source framework that enables scalable, multi-domain agentic modeling with reusable primitives, achieving state-of-the-art results in coding, vision-language, and personal assistant tasks.

Contribution

The paper introduces Orchard, an open-source framework for scalable agentic modeling built around Orchard Env, a lightweight environment service, and demonstrates its effectiveness across multiple domains.

Findings

01. Orchard-SWE achieves 67.5% on SWE-bench after SFT+RL, setting a new open-source state of the art.

02. Orchard-GUI attains success rates of 74.1%, 67.0%, and 64.0% on three benchmarks, outperforming proprietary systems.

03. Lightweight environment primitives enable effective training and evaluation across diverse agentic tasks.

Abstract

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B,…

Code & Models

Repositories

microsoft/Orchard (GitHub)
