InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang; Wei Zhang; Jiajun Wu; Junhang Cheng; Shawn Guo; Haowen Wang; Weicheng Gu; Yaxin Du; Joseph Li; Fanglin Xu; Yizhi Li; Lin Jing; Yuanbo Wang; Yuhan Gao; Ruihao Gong; Chuan Hao; Ran Tao; Aishan Liu; Tuney Zheng; Ganqu Cui; Zhoujun Li; Mingjie Tang; Chenghua Lin; Wayne Xin Zhao; Xianglong Liu; Ming Zhou; Bryan Dai; Weifeng Lv

arXiv:2603.16790 · cs.SE · April 1, 2026

TL;DR

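InCoder-32B is a 32B-parameter code foundation model for industrial software. It covers chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling in a single model, aiming to close the gap that general code LLMs show on tasks involving hardware semantics, specialized language constructs, and strict resource constraints.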

Contribution

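The paper introduces InCoder-32B, a 32-billion-parameter model trained from scratch through a multi-stage pipeline: general code pre-training, curated industrial code annealing, mid-training that extends context from 8K to 128K tokens using synthetic industrial reasoning data, and post-training with execution-grounded verification. The result is a single model spanning multiple industrial code domains.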
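As a rough illustration, the staged recipe named in the abstract (general code pre-training, industrial code annealing, long-context mid-training from 8K to 128K tokens, execution-grounded post-training) can be written down as an explicit schedule. This is a hypothetical sketch: the `Stage` and `build_schedule` names are invented here. The 8K window for the first two stages, the doubling of the window at each mid-training step, and the 128K window for post-training are assumptions, because the summary does not give the actual intermediate lengths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    data: str
    context_tokens: int  # maximum sequence length used in this stage


def build_schedule(start_ctx: int = 8 * 1024, end_ctx: int = 128 * 1024) -> list[Stage]:
    """Hypothetical InCoder-style schedule; context lengths are illustrative assumptions."""
    stages = [
        Stage("pretrain", "general code", start_ctx),
        Stage("anneal", "curated industrial code", start_ctx),
    ]
    # Mid-training: grow the window from start_ctx to end_ctx (doubling is an assumption).
    ctx = start_ctx
    while True:
        stages.append(Stage(f"midtrain-{ctx // 1024}k", "synthetic industrial reasoning", ctx))
        if ctx >= end_ctx:
            break
        ctx = min(ctx * 2, end_ctx)
    # Post-training on execution-verified samples (window size assumed to stay at end_ctx).
    stages.append(Stage("posttrain", "execution-verified samples", end_ctx))
    return stages


if __name__ == "__main__":
    for stage in build_schedule():
        print(f"{stage.name:14s} {stage.context_tokens:>7d}  {stage.data}")
```

Writing the schedule as data rather than as hard-coded training loops makes the progression easy to inspect and change. For example, you could swap in a different set of intermediate context lengths if the real ones differ.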

Findings

01

Achieves competitive performance on general code benchmarks.

02

Establishes strong open-source baselines for industrial domains.

03

Effectively handles complex industrial reasoning tasks.

Abstract

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14…
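The abstract does not spell out how execution-grounded verification is implemented. One common form of the idea is to keep a post-training sample only if its code actually runs and passes its tests. The sketch below assumes Python samples stored as dicts with hypothetical `code` and `tests` keys, and the `passes_tests` and `filter_verified` helpers are illustrative names, not the authors' API. Real industrial targets such as RTL, CUDA kernels, or firmware would need their own simulators or compilers. A plain subprocess is also not a security sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Run candidate code followed by its tests in a fresh Python process."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate + "\n\n" + tests + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # a hang counts as a failed verification
        # A non-zero exit code means a crash or a failed assertion.
        return result.returncode == 0


def filter_verified(samples: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only samples whose code passes its own tests (hypothetical data layout)."""
    return [s for s in samples if passes_tests(s["code"], s["tests"])]
```

For example, a sample whose `add` function returns `a - b` would be dropped when its test asserts `add(2, 3) == 5`, while the correct version would be kept.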

Code & Models

Repositories

csjianyang/Industrial-Coder (GitHub)