CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization

Yang Zhao; Di Huang; Chongxiao Li; Pengwei Jin; Muxin Song; Yinan Xu; Ziyuan Nan; Mingju Gao; Tianyun Ma; Lei Qi; Yansong Pan; Zhenxing Zhang; Rui Zhang; Xishan Zhang; Zidong Du; Qi Guo; Xing Hu

arXiv:2407.10424 · cs.PL · May 13, 2025 · 5 cites

TL;DR

This paper introduces CodeV, an open-source LLM series for HDL generation, leveraging multi-level summarization and a novel fine-tuning method to improve code quality and versatility across HDL languages and tasks.

Contribution

It presents a new fine-tuning pipeline combining multi-level summarization and Chat-FIM-Tag supervision, enabling LLMs to generate HDL code effectively from natural language.

Findings

1. CodeV-All outperforms previous models on VerilogEval.

2. The pipeline improves HDL code quality and task versatility.

3. CodeV models support multiple HDL languages and tasks.

Abstract

The design flow of processors, particularly in hardware description languages (HDL) like Verilog and Chisel, is complex and costly. While recent advances in large language models (LLMs) have significantly improved coding tasks in software languages such as Python, their application in HDL generation remains limited due to the scarcity of high-quality HDL data. Traditional methods of adapting LLMs for hardware design rely on synthetic HDL datasets, which often suffer from low quality because even advanced LLMs like GPT perform poorly in the HDL domain. Moreover, these methods focus solely on chat tasks and the Verilog language, limiting their application scenarios. In this paper, we observe that: (1) HDL code collected from the real world is of higher quality than code generated by LLMs. (2) LLMs like GPT-3.5 excel in summarizing HDL code rather than generating it. (3) An explicit…
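
Observations (1) and (2) suggest building training pairs by summarizing real-world HDL code instead of asking an LLM to generate it. The following is a minimal sketch of that idea, not the authors' pipeline: the two summarization levels, the `InstructionPair` type, `build_pairs`, and `stub_summarize` are all hypothetical names introduced for illustration, and the stub stands in for a call to a summarizing LLM such as GPT-3.5.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InstructionPair:
    instruction: str  # natural-language request derived from the code
    response: str     # the original, real-world HDL code


def build_pairs(
    modules: List[str],
    summarize: Callable[[str, str], str],
) -> List[InstructionPair]:
    """Turn real HDL modules into (description, code) training pairs.

    Assumption: summarization runs in two levels, first a detailed
    description of the code, then a shorter high-level summary of
    that description.
    """
    pairs = []
    for code in modules:
        detailed = summarize("detailed", code)
        high_level = summarize("high_level", detailed)
        pairs.append(InstructionPair(instruction=high_level, response=code))
    return pairs


def stub_summarize(level: str, text: str) -> str:
    """Hypothetical stand-in for an LLM summarization call."""
    if level == "detailed":
        first_line = text.strip().splitlines()[0]
        return f"Detailed description of: {first_line}"
    return f"Summary: {text}"


if __name__ == "__main__":
    verilog = (
        "module and_gate(input a, input b, output y);\n"
        "  assign y = a & b;\n"
        "endmodule"
    )
    for pair in build_pairs([verilog], stub_summarize):
        print(pair.instruction)
        print(pair.response)
```

The design point is that the response side of each pair is always untouched real-world code, so the LLM is only used for the task it is reported to do well, which is summarization.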

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Attention Is All You Need · Discriminative Fine-Tuning · Focus · GPT · Cosine Annealing · Label Smoothing · Linear Layer