DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder
Jiaran Zhang, Luck Ma, Fanqi Wan, Di Qi, Xu Zhao, Jieyi Hou, Zhe Xie, Mengqiang Ren, Xin Wu, Zhewei Huang, Liangyu Chen, Qi Han, Xiangyu Zhang

TL;DR
DockSmith is an agentic Docker builder that treats environment construction for software engineering tasks as a core agentic capability, exercising long-horizon tool use, dependency reasoning, and failure recovery. A 30B-A3B model trained on its trajectories reaches open-source state-of-the-art performance on Multi-Docker-Eval.
Contribution
Introduces DockSmith, an agentic Docker builder trained on large-scale, execution-grounded Docker-building trajectories. It enables scalable, reliable environment construction and yields supervision that transfers beyond Docker building itself.
Findings
Achieves 39.72% Fail-to-Pass on Multi-Docker-Eval
Reaches 58.28% Commit Rate on Multi-Docker-Eval
Improves out-of-distribution performance on multiple benchmarks
Abstract
Reliable Docker-based environment construction is a dominant bottleneck for scaling execution-grounded training and evaluation of software engineering agents. We introduce DockSmith, a specialized agentic Docker builder designed to address this challenge. DockSmith treats environment construction not only as a preprocessing step, but as a core agentic capability that exercises long-horizon tool use, dependency reasoning, and failure recovery, yielding supervision that transfers beyond Docker building itself. DockSmith is trained on large-scale, execution-grounded Docker-building trajectories produced by a SWE-Factory-style pipeline augmented with a loop-detection controller and a cross-task success memory. Training a 30B-A3B model on these trajectories achieves open-source state-of-the-art performance on Multi-Docker-Eval, with 39.72% Fail-to-Pass and 58.28% Commit Rate. Moreover,…
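
The abstract names two additions to the data pipeline, a loop-detection controller and a cross-task success memory, without describing how they work. The sketch below is purely illustrative of what such components could look like. Every name in it (LoopDetector, SuccessMemory, the window and max_repeats parameters, the (language, build_tool) key) is a hypothetical assumption, not the authors' design.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class LoopDetector:
    """Hypothetical controller: flags an agent that keeps repeating
    the same (action, error) pair within a sliding window of steps."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        # Only the most recent `window` steps are kept.
        self.history: Deque[Tuple[str, str]] = deque(maxlen=window)

    def observe(self, action: str, error: str) -> bool:
        """Record one step; return True if it has now occurred
        at least `max_repeats` times in the recent window."""
        step = (action.strip(), error.strip())
        self.history.append(step)
        return self.history.count(step) >= self.max_repeats


class SuccessMemory:
    """Hypothetical cross-task memory: stores Dockerfiles that built
    successfully, keyed by a coarse (language, build_tool) signature."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], List[str]] = {}

    def add(self, language: str, build_tool: str, dockerfile: str) -> None:
        self._store.setdefault((language, build_tool), []).append(dockerfile)

    def lookup(self, language: str, build_tool: str) -> Optional[str]:
        """Return the most recently stored Dockerfile for this signature."""
        hits = self._store.get((language, build_tool))
        return hits[-1] if hits else None
```

In this sketch, a builder loop would call observe after each failed step and, when it returns True, break out of the current strategy, for example by seeding the next attempt with a Dockerfile returned by lookup for a similar repository.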
