ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design
Zhongkai Yu, Chenyang Zhou, Yichen Lin, Hejia Zhang, Haotian Ye, Junxia Cui, Zaifeng Pan, Jishen Zhao, Yufei Ding

TL;DR
This paper introduces ChipBench, a comprehensive benchmark for evaluating large language models in AI-assisted chip design, highlighting current performance gaps and providing tools for future improvement.
Contribution
The paper presents a new benchmark with diverse tasks and realistic modules, revealing significant performance gaps of current LLMs and offering an automated toolbox for data generation.
Findings
The state-of-the-art Claude-4.5-opus achieves only 30.74% on Verilog generation and 13.33% on Python reference model generation.
Existing benchmarks show pass rates above 95%, indicating saturation.
ChipBench exposes substantial remaining challenges for LLMs across its three tasks: Verilog generation, debugging, and reference model generation.
Abstract
While Large Language Models (LLMs) show significant potential in hardware engineering, current benchmarks suffer from saturation and limited task diversity, failing to reflect LLMs' performance in real industrial workflows. To address this gap, we propose a comprehensive benchmark for AI-aided chip design that rigorously evaluates LLMs across three critical tasks: Verilog generation, debugging, and reference model generation. Our benchmark features 44 realistic modules with complex hierarchical structures, 89 systematic debugging cases, and 132 reference model samples across Python, SystemC, and CXXRTL. Evaluation results reveal substantial performance gaps, with state-of-the-art Claude-4.5-opus achieving only 30.74% on Verilog generation and 13.33% on Python reference model generation, demonstrating significant challenges compared to existing saturated benchmarks where SOTA models…
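For readers unfamiliar with the term, a reference model is a software "golden" model of a hardware module's behavior that verification compares the design's outputs against. The sketch below shows roughly what a Python reference model can look like. It is an illustration only: the 8-bit counter, the `CounterRefModel` class, and the `compare` helper are hypothetical and are not taken from ChipBench or its toolbox.

```python
class CounterRefModel:
    """Cycle-level golden model of a hypothetical 8-bit up-counter.

    Assumed behavior (not from the paper): synchronous reset has priority
    over enable, and the count wraps around at 2**WIDTH.
    """

    WIDTH = 8

    def __init__(self):
        self.count = 0

    def step(self, reset: bool, enable: bool) -> int:
        """Advance one clock cycle and return the count after the clock edge."""
        if reset:
            self.count = 0
        elif enable:
            self.count = (self.count + 1) % (1 << self.WIDTH)
        return self.count


def compare(model, stimulus, dut_outputs):
    """Return the cycle indices where the design's output differs from the model."""
    mismatches = []
    for cycle, ((reset, enable), observed) in enumerate(zip(stimulus, dut_outputs)):
        expected = model.step(reset, enable)
        if expected != observed:
            mismatches.append(cycle)
    return mismatches
```

In use, the same stimulus would drive both the simulated hardware design and the model, and any cycle index returned by `compare` marks a behavioral mismatch to investigate.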
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Natural Language Processing Techniques
