Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification
Nathaniel Pinckney, Chenhui Deng, Chia-Tung Ho, Yun-Da Tsai, Mingjie Liu, Wenfei Zhou, Brucek Khailany, Haoxing Ren

TL;DR
The CVDP benchmark provides a comprehensive, challenging dataset of Verilog design problems to evaluate and advance large language models and agents in hardware design and verification, highlighting current limitations.
Contribution
It introduces a large, realistic benchmark dataset with diverse tasks and evaluation methods, enabling systematic assessment of LLMs and agents in RTL design and verification.
Findings
State-of-the-art models achieve no more than 34% pass@1 on code generation.
Agentic tasks, especially RTL reuse and verification, are particularly challenging.
CVDP exposes significant gaps in current model capabilities for hardware design automation.
Abstract
We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance LLM and agent research in hardware design and verification. CVDP includes 783 problems across 13 task categories, covering RTL generation, verification, debugging, specification alignment, and technical Q&A authored by experienced hardware engineers. Problems are offered in both non-agentic and agentic formats. The benchmark introduces more realistic and challenging contexts than prior work, with state-of-the-art models achieving no more than 34% pass@1 on code generation. Agentic tasks, especially those involving RTL reuse and verification, are particularly difficult. Evaluation uses open-source tools and model scoring infrastructure, with comprehension tasks assessed via BLEU and LLM-based judging. CVDP reveals substantial gaps in current…
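The pass@1 figure quoted above is a sampling-based metric. This page does not say how CVDP computes it, so the sketch below is only an illustration: it assumes the commonly used unbiased pass@k estimator, in which n samples are generated per problem and c of them pass the tests. The function names and the per-problem counts are hypothetical and are not taken from the benchmark.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: evaluation budget.
    Assumption: this is the widely used estimator, not necessarily the
    exact scoring code used by CVDP.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for four problems, five samples each.
results = [(5, 5), (5, 0), (5, 1), (5, 0)]
score = benchmark_pass_at_k(results, k=1)
print(f"pass@1 = {score:.2f}")
```

For k = 1 the estimate reduces to the fraction of passing samples per problem, averaged over all problems.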
Taxonomy
Topics: Natural Language Processing Techniques
