IndustryCode: A Benchmark for Industry Code Generation
Puyu Zeng, Zhaoxi Wang, Zhixu Duan, Liang Feng, Shaobo Wang, Cunxiang Wang, Jinghang Wang, Bing Zhao, Hu Wei, Linfeng Zhang

TL;DR
IndustryCode is a comprehensive benchmark spanning multiple industrial domains and programming languages, designed to evaluate how well large language models generalize to, and perform on, complex industrial code generation tasks.
Contribution
It introduces the first multi-domain, multi-language industrial code benchmark, comprising 579 sub-problems derived from 125 primary industrial challenges, and addresses a key limitation of existing benchmarks, which are mostly confined to a single domain and language.
Findings
Claude 4.5 Opus achieved 68.1% accuracy on sub-problems.
The benchmark covers diverse fields like finance, aerospace, and remote sensing.
It includes multiple programming languages such as MATLAB, Python, C++, and Stata.
Abstract
Code generation and comprehension by Large Language Models (LLMs) have emerged as core drivers of industrial intelligence and decision optimization, finding widespread application in fields such as finance, automation, and aerospace. Although recent advancements have demonstrated the remarkable potential of LLMs in general code generation, existing benchmarks are mainly confined to single domains and languages. Consequently, they fail to effectively evaluate the generalization capabilities required for real-world industrial applications or to reflect the coding proficiency demanded by complex industrial scenarios. To bridge this gap, we introduce IndustryCode, the first comprehensive benchmark designed to span multiple industrial domains and programming languages. IndustryCode comprises 579 sub-problems derived from 125 primary industrial challenges, accompanied by rigorous problem…
