ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design

Zhongkai Yu; Chenyang Zhou; Yichen Lin; Hejia Zhang; Haotian Ye; Junxia Cui; Zaifeng Pan; Jishen Zhao; Yufei Ding

arXiv:2601.21448·cs.AI·February 3, 2026

Open Access

TL;DR

This paper introduces ChipBench, a comprehensive benchmark for evaluating large language models in AI-assisted chip design, highlighting current performance gaps and providing tools for future improvement.

Contribution

The paper presents a new benchmark with diverse tasks and realistic modules, revealing significant performance gaps of current LLMs and offering an automated toolbox for data generation.

Findings

01. The state-of-the-art model evaluated, Claude-4.5-opus, achieves only 30.74% on Verilog generation.

02. Existing benchmarks report pass rates above 95%, indicating saturation.

03. ChipBench exposes substantial gaps in LLM performance on chip design tasks.

Abstract

While Large Language Models (LLMs) show significant potential in hardware engineering, current benchmarks suffer from saturation and limited task diversity, failing to reflect LLMs' performance in real industrial workflows. To address this gap, we propose a comprehensive benchmark for AI-aided chip design that rigorously evaluates LLMs across three critical tasks: Verilog generation, debugging, and reference model generation. Our benchmark features 44 realistic modules with complex hierarchical structures, 89 systematic debugging cases, and 132 reference model samples across Python, SystemC, and CXXRTL. Evaluation results reveal substantial performance gaps, with state-of-the-art Claude-4.5-opus achieving only 30.74% on Verilog generation and 13.33% on Python reference model generation, demonstrating significant challenges compared to existing saturated benchmarks where SOTA models…

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Natural Language Processing Techniques