CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization
Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Muxin Song, Yinan Xu, Ziyuan Nan, Mingju Gao, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu

TL;DR
This paper introduces CodeV, an open-source LLM series for HDL generation that uses multi-level summarization and a novel fine-tuning method to improve code quality and versatility across HDLs and tasks.
Contribution
It presents a new fine-tuning pipeline combining multi-level summarization and Chat-FIM-Tag supervision, enabling LLMs to generate HDL code effectively from natural language.
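The two ingredients named above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: it assumes that "multi-level" means first producing a detailed description of real-world HDL code and then condensing it into a short high-level request, and that fill-in-the-middle (FIM) samples are built by splitting code into prefix, middle, and suffix. The function names, prompt strings, and the `summarize` callable (a stand-in for an LLM call) are all hypothetical.

```python
from typing import Callable, Dict


def build_pair(code: str, summarize: Callable[[str], str]) -> Dict[str, str]:
    """Turn one real-world HDL snippet into an (instruction, response) pair.

    `summarize` is a hypothetical stand-in for an LLM call; the prompts
    below are placeholders, not the prompts used in the paper.
    """
    # Level 1 (assumed): ask for a detailed description of the code.
    detailed = summarize("Describe this module in detail:\n" + code)
    # Level 2 (assumed): condense that description into a short request.
    high_level = summarize("Condense into a one-paragraph spec:\n" + detailed)
    # The real code, not LLM-generated code, serves as the training target.
    return {"instruction": high_level, "response": code}


def make_fim_example(code: str, start: int, end: int) -> Dict[str, str]:
    """Split code into prefix/middle/suffix for a fill-in-the-middle sample."""
    if not 0 <= start <= end <= len(code):
        raise ValueError("start and end must satisfy 0 <= start <= end <= len(code)")
    return {
        "prefix": code[:start],
        "middle": code[start:end],
        "suffix": code[end:],
    }


if __name__ == "__main__":
    verilog = "module inv(input a, output y);\n  assign y = ~a;\nendmodule\n"
    # Stub summarizer that echoes the first prompt line, so this runs offline.
    stub = lambda prompt: prompt.splitlines()[0]
    print(build_pair(verilog, stub))
    print(make_fim_example(verilog, 31, 47))
```

The point of the design is that the model's training target is always human-written HDL, while the LLM is used only for the direction it is reported to handle well, namely summarizing code into natural language.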
Findings
CodeV-All outperforms previous models on VerilogEval.
The pipeline improves HDL code quality and task versatility.
CodeV models support multiple HDLs and tasks.
Abstract
The design flow of processors, particularly in hardware description languages (HDL) like Verilog and Chisel, is complex and costly. While recent advances in large language models (LLMs) have significantly improved coding tasks in software languages such as Python, their application in HDL generation remains limited due to the scarcity of high-quality HDL data. Traditional methods of adapting LLMs for hardware design rely on synthetic HDL datasets, which often suffer from low quality because even advanced LLMs like GPT perform poorly in the HDL domain. Moreover, these methods focus solely on chat tasks and the Verilog language, limiting their application scenarios. In this paper, we observe that: (1) HDL code collected from the real world is of higher quality than code generated by LLMs. (2) LLMs like GPT-3.5 excel in summarizing HDL code rather than generating it. (3) An explicit…
Code & Models
- 🤗 yang-z/CodeV-QW-7B · model · 3 dl · ♡ 3
- 🤗 yang-z/CodeV-CL-7B · model · 12 dl · ♡ 1
- 🤗 yang-z/CodeV-DS-6.7B · model · 11 dl · ♡ 3
- 🤗 RichardErkhov/yang-z_-_CodeV-CL-7B-gguf · model · 8 dl
- 🤗 RichardErkhov/yang-z_-_CodeV-DS-6.7B-gguf · model
- 🤗 RichardErkhov/yang-z_-_CodeV-DS-6.7B-8bits · model · 3 dl
- 🤗 RichardErkhov/yang-z_-_CodeV-QW-7B-4bits · model
- 🤗 RichardErkhov/yang-z_-_CodeV-QW-7B-8bits · model · 1 dl
- 🤗 RichardErkhov/yang-z_-_CodeV-QW-7B-awq · model · 1 dl
- 🤗 yang-z/CodeV-All-CL · model · 4 dl
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Attention Is All You Need · Discriminative Fine-Tuning · Focus · GPT · Cosine Annealing · Label Smoothing · Linear Layer
