DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder

Jiaran Zhang; Luck Ma; Fanqi Wan; Di Qi; Xu Zhao; Jieyi Hou; Zhe Xie; Mengqiang Ren; Xin Wu; Zhewei Huang; Liangyu Chen; Qi Han; Xiangyu Zhang

arXiv:2602.00592 · cs.AI · April 29, 2026

TL;DR

DockSmith is an agentic Docker builder that improves environment construction for software engineering tasks by exercising long-horizon reasoning, dependency management, and failure recovery, achieving open-source state-of-the-art performance on Multi-Docker-Eval.

Contribution

Introduces DockSmith, an agentic Docker builder trained on large-scale, execution-grounded Docker-building trajectories. It enables scalable, reliable environment construction and yields supervision that transfers beyond Docker building itself.

Findings

01. Achieves 39.72% Fail-to-Pass on Multi-Docker-Eval.

02. Reaches 58.28% Commit Rate on Multi-Docker-Eval.

03. Improves out-of-distribution performance on multiple benchmarks.
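Fail-to-Pass is the usual SWE-bench-style check that a built environment actually reproduces the target bug: each designated test should fail before the reference patch is applied and pass after it. The sketch below assumes Multi-Docker-Eval scores instances this way. The function names `instance_is_f2p` and `f2p_rate`, the outcome labels, and the result layout are hypothetical illustrations, not the benchmark's code, and the Commit Rate metric is not modeled.

```python
from typing import Dict, List

# Hypothetical per-test outcome labels, as a test runner might report them.
PASSED = "PASSED"
FAILED = "FAILED"


def instance_is_f2p(
    before: Dict[str, str],
    after: Dict[str, str],
    f2p_tests: List[str],
) -> bool:
    """True if every designated test does not pass before the patch and passes after it.

    `before` / `after` map test IDs to outcomes observed inside the built
    container without / with the reference patch. A test absent from the
    "before" run counts as not passing (assumption).
    """
    if not f2p_tests:
        return False
    return all(before.get(t) != PASSED and after.get(t) == PASSED for t in f2p_tests)


def f2p_rate(results: List[dict]) -> float:
    """Fraction of instances whose environment reproduces the fail-to-pass signal."""
    if not results:
        return 0.0
    ok = sum(instance_is_f2p(r["before"], r["after"], r["f2p_tests"]) for r in results)
    return ok / len(results)


runs = [
    {"before": {"t1": FAILED}, "after": {"t1": PASSED}, "f2p_tests": ["t1"]},
    {"before": {"t1": PASSED}, "after": {"t1": PASSED}, "f2p_tests": ["t1"]},  # passed already
    {"before": {}, "after": {"t2": FAILED}, "f2p_tests": ["t2"]},  # still fails after patch
    {"before": {"t3": FAILED}, "after": {"t3": PASSED}, "f2p_tests": ["t3"]},
]
print(f"{f2p_rate(runs):.2%}")  # 50.00%
```

Under this reading, a reported 39.72% would mean roughly two in five benchmark instances yielded an environment in which the bug was reproducible and the fix verifiable.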

Abstract

Reliable Docker-based environment construction is a dominant bottleneck for scaling execution-grounded training and evaluation of software engineering agents. We introduce DockSmith, a specialized agentic Docker builder designed to address this challenge. DockSmith treats environment construction not merely as a preprocessing step but as a core agentic capability that exercises long-horizon tool use, dependency reasoning, and failure recovery, yielding supervision that transfers beyond Docker building itself. DockSmith is trained on large-scale, execution-grounded Docker-building trajectories produced by a SWE-Factory-style pipeline augmented with a loop-detection controller and a cross-task success memory. Training a 30B-A3B model on these trajectories achieves open-source state-of-the-art performance on Multi-Docker-Eval, with 39.72% Fail-to-Pass and 58.28% Commit Rate. Moreover,…
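The abstract names a loop-detection controller but does not describe how it works. As a minimal sketch under stated assumptions: each agent step is summarized as a (tool, command, error signature) tuple, and a loop is flagged when the last short block of steps has repeated several times back to back, such as the same failing `pip install` retried, or an edit/rebuild A-B-A-B cycle. The `LoopDetector` class, its `max_period` and `min_repeats` parameters, and the step format are all hypothetical. The cross-task success memory is not modeled here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple

# Hypothetical step summary: (tool name, command, short error signature).
Step = Tuple[str, str, str]


@dataclass
class LoopDetector:
    """Hypothetical loop detector for a Docker-building agent.

    Flags a loop when the most recent block of `period` steps (for any
    period up to `max_period`) has occurred `min_repeats` times in a row.
    """

    max_period: int = 3
    min_repeats: int = 3

    def __post_init__(self) -> None:
        # Keep only as much history as the longest detectable loop needs.
        self.history: Deque[Step] = deque(maxlen=self.max_period * self.min_repeats)

    def observe(self, step: Step) -> Optional[int]:
        """Record a step; return the detected cycle length, or None."""
        self.history.append(step)
        steps = list(self.history)
        for period in range(1, self.max_period + 1):
            span = period * self.min_repeats
            if len(steps) < span:
                break  # longer periods need even more history
            tail = steps[-span:]
            block = tail[-period:]
            # The tail must be `min_repeats` copies of the same block.
            if all(tail[i] == block[i % period] for i in range(span)):
                return period
        return None


detector = LoopDetector()
failing = ("bash", "pip install -r requirements.txt", "ResolutionImpossible")
period = None
for _ in range(3):
    period = detector.observe(failing)
print(period)  # 1
```

On detection, a controller built this way might inject a hint or force the agent onto a different strategy. Comparing exact step tuples is the simplest choice; a real system would likely normalize commands and error messages before comparing them.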
