From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python

Jinhua Wang; Biswa Sengupta

arXiv:2604.11518·cs.SE·April 14, 2026

TL;DR

This paper presents a benchmark-driven methodology for using LLMs to translate a large production codebase from Rust to Python, yielding an extended AI agent that performs at near-parity with the original.

Contribution

The paper introduces a benchmark-driven, iterative translation process for large codebases that supports continuous synchronization with the evolving source and feature extension in the target language.
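As a rough illustration of what "benchmark-driven, iterative" could mean in practice, the sketch below loops over benchmark, diagnose, and fix until the failure count stops shrinking. It is not the authors' implementation: `refine_until_stable`, `run_benchmark`, `apply_fix`, and the task names are all hypothetical, and the stopping rule is an assumption chosen for simplicity.

```python
from typing import Callable, List, Set


def refine_until_stable(
    run_benchmark: Callable[[], Set[str]],
    apply_fix: Callable[[Set[str]], None],
    max_rounds: int = 10,
) -> List[int]:
    """Repeat benchmark -> fix until failures stop shrinking.

    run_benchmark returns the IDs of failing tasks (hypothetical interface).
    apply_fix receives those IDs and patches the port (hypothetical interface).
    Returns the failure count observed in each round.
    """
    history: List[int] = []
    for _ in range(max_rounds):
        failures = run_benchmark()
        history.append(len(failures))
        no_progress = len(history) > 1 and history[-1] >= history[-2]
        if not failures or no_progress:
            break
        apply_fix(failures)
    return history


# Toy stand-in for a real benchmark: each "fix" resolves one failing task.
failing = {"task-a", "task-b", "task-c"}


def run() -> Set[str]:
    return set(failing)


def fix(failures: Set[str]) -> None:
    failing.discard(sorted(failures)[0])


history = refine_until_stable(run, fix)
print(history)
```

The benchmark score acts as the objective here: a round that does not reduce failures ends the loop rather than triggering further speculative changes.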

Findings

01

The Python port achieves near-parity with Rust on real-world tasks: 73.8% versus 70.0% on SWE-bench Verified (80 tasks) and 42.5% versus 47.5% on Terminal-Bench.

02

Benchmark-driven debugging outperforms static testing methods.

03

The Python version offers a 15.9x code reduction (648K LOC in Rust versus 41K LOC in Python) with minimal performance loss.

Abstract

Cross-language migration of large software systems is a persistent engineering challenge, particularly when the source codebase evolves rapidly. We present a methodology for LLM-assisted continuous code translation in which a large language model translates a production Rust codebase (648K LOC, 65 crates) into Python (41K LOC, 28 modules), with public agent benchmarks as the objective function driving iterative refinement. Our subject system is Codex CLI, a production AI coding agent. We demonstrate that: (1) the Python port resolves 59/80 SWE-bench Verified tasks (73.8%) versus Rust's 56/80 (70.0%), and achieves 42.5% on Terminal-Bench versus Rust's 47.5%, confirming near-parity on real-world agentic tasks; (2) benchmark-driven debugging, revealing API protocol mismatches, environment pollution, a silent WebSocket failure mode, and an API 400 crash, is more effective than static…
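The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the SWE-bench Verified resolution rates from the stated counts, the gaps in percentage points, and the size ratio implied by the rounded LOC totals. The variable names are illustrative, and the Terminal-Bench task count is not given in the visible text, so only the reported percentages are used for that benchmark.

```python
def percent(resolved: int, total: int) -> float:
    """Resolution rate as a percentage."""
    return 100 * resolved / total


# SWE-bench Verified counts as stated in the abstract.
swe_python = percent(59, 80)  # reported as 73.8%
swe_rust = percent(56, 80)    # reported as 70.0%
swe_gap = swe_python - swe_rust  # percentage points, Python minus Rust

# Terminal-Bench: only percentages are given, so use them directly.
tb_python, tb_rust = 42.5, 47.5
tb_gap = tb_python - tb_rust  # percentage points, Python minus Rust

# Size ratio from the rounded LOC totals (648K Rust, 41K Python).
loc_ratio = 648_000 / 41_000

print(f"SWE-bench Verified: {swe_python:.2f}% vs {swe_rust:.2f}% (gap {swe_gap:+.2f} pts)")
print(f"Terminal-Bench: {tb_python}% vs {tb_rust}% (gap {tb_gap:+.1f} pts)")
print(f"LOC ratio from rounded totals: {loc_ratio:.2f}x")
```

The rounded totals give a ratio of about 15.8x; the 15.9x figure quoted in the findings is consistent with this if it was computed from unrounded line counts, which is an assumption here since the exact counts are not shown.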
